Sampling and Binning - Engineering problem

Hi everyone!

Can you please help me with some shell scripting?

I have an input file, input.txt.

It has 3 columns (Time, Event, Value):

Time event Value
03:38:22 A 57
03:38:23 A 56
03:38:24 B 24
03:38:25 C 51
03:38:26 B 7
03:38:26 B 59
03:38:27 A 98
03:38:28 A 24
03:38:29 A 35
03:38:30 A 55

I need code that does the following steps.

1) Sort by column 1 (Time) in ascending order.
2) Sample every 5 seconds: within each 5-second block, count the events of type "A" and average their Value. Hence the output for the first 5-second block should be:

TBlocks Count_of_EVENT("A") Average Value

1 3 70.33

Then continue with the second 5-second block (which starts at T = 03:38:28), and so on until the end of the file.

Thanks for your support

#!/usr/bin/perl

use strict;
use warnings;

my ($t1, $sum, $count, $block);

while (<>) {
  chomp;
  my ($timestamp, $event, $value) = split (/ /);
  my ($h, $m, $s) = split (/:/, $timestamp);
  my $t = $s + 60*$m + 3600*$h;

  if (! defined $t1 || $t >= $t1) {
    if (defined $t1) {
      print ++$block, " ", $count, " ", $sum/$count, "\n";
    }
    $t1 = $t + 5;
    $sum = $count = 0;
  }
  if ($event eq "A") {
    ++$count;
    $sum += $value;
  }
}

if ($count) {
  print ++$block, " ", $count, " ", $sum/$count, "\n";
}

Assumes sorted input. I'm not entirely sure I correctly figured out what to count and average but I imagine you can straighten it out if it's not completely correct.

I assumed you really meant five-second blocks, so the first block runs from 03:38:22 through 03:38:26.999999 and the output is not precisely what you specified. Change the interval to six if you really want 03:38:22 through 03:38:27.999999 in the first block, which is what your expected first row (3 events, average 70.33) requires.
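
To see how the interval changes the result, here is a small sketch that bins the sample data from the question with an interval of 5 and then of 6. The helper names to_seconds and bin_a_events are made up for illustration, and reporting an average of 0 for a block without any "A" events is an assumption, not something the question specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert HH:MM:SS to seconds since midnight.
sub to_seconds {
  my ($h, $m, $s) = split /:/, shift;
  return $s + 60*$m + 3600*$h;
}

# Hypothetical helper: bin lines into blocks of $interval seconds and
# return one [count, average] pair per block for the "A" events.
# An empty block gets an average of 0 (an assumption).
sub bin_a_events {
  my ($interval, @lines) = @_;
  my ($end, @blocks);
  for my $line (@lines) {
    my ($time, $event, $value) = split ' ', $line;
    my $t = to_seconds($time);
    if (! defined $end || $t >= $end) {   # first line, or block is over
      push @blocks, { count => 0, sum => 0 };
      $end = $t + $interval;
    }
    next unless $event eq "A";
    $blocks[-1]{count}++;
    $blocks[-1]{sum} += $value;
  }
  return map { [ $_->{count}, $_->{count} ? $_->{sum} / $_->{count} : 0 ] } @blocks;
}

# Sample data from the question, already sorted by time.
my @sample = (
  "03:38:22 A 57", "03:38:23 A 56", "03:38:24 B 24", "03:38:25 C 51",
  "03:38:26 B 7",  "03:38:26 B 59", "03:38:27 A 98", "03:38:28 A 24",
  "03:38:29 A 35", "03:38:30 A 55",
);

for my $interval (5, 6) {
  my $n = 0;
  for my $block (bin_a_events($interval, @sample)) {
    printf "interval %d, block %d: count %d, average %.2f\n",
      $interval, ++$n, @$block;
  }
}
# interval 5, block 1: count 2, average 56.50
# interval 5, block 2: count 4, average 53.00
# interval 6, block 1: count 3, average 70.33
# interval 6, block 2: count 3, average 38.00
```

Only the interval of six reproduces the first row expected in the question (3 events, average 70.33).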

hi era,

I am using Cygwin. Can you please help me with how to execute this program in Cygwin?

Also, how do I give it the input file (Input.txt)?

This is my first time using Cygwin and my first time running a script.

Thank you so much for your support

I'm not very familiar with Cygwin, but if you store the script in sample.pl you would simply run it with

A:\> sort -t : -n Input.txt | perl sample.pl >Output.txt

where A:\> is my possibly uninformed guess about what the Cygwin prompt looks like. (Actually I guess it's more like you@wintendo$ really.)
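
Because the script reads its input with <>, it accepts file names on the command line as well as standard input, so for a file that is already sorted, perl sample.pl Input.txt >Output.txt should work too. If the sort command turns out to be a problem, the sorting step could also be done in Perl. Here is a minimal sketch, assuming every data line starts with an H:MM:SS timestamp; the helper names seconds_of and sort_by_time and the file name sorttime.pl are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sort key: seconds since midnight for a line starting with H:MM:SS.
# Returns undef (in scalar context) for lines without a timestamp.
sub seconds_of {
  my ($line) = @_;
  return unless $line =~ /^(\d+):(\d+):(\d+)/;
  return $3 + 60*$2 + 3600*$1;
}

# Keep only lines that start with a timestamp (this drops a header
# line and blank lines), then order them by time.
sub sort_by_time {
  my @timed = grep { defined seconds_of($_) } @_;
  return sort { seconds_of($a) <=> seconds_of($b) } @timed;
}

# <> reads every file named on the command line. The guard means the
# sketch does nothing when it is run without file names.
print sort_by_time(<>) if @ARGV;
```

Saved as sorttime.pl, it would be used like this: perl sorttime.pl Input.txt | perl sample.pl >Output.txt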

hi era,

Thanks for your quick response.

I am getting one error:

"Illegal division by zero at cp.pl line 16, <> line 2."

Line 16 is: print ++$block, " ", $count, " ", $sum/$count, "\n";

input file:

3:13:09 B 32
3:14:01 B 51
3:14:03 A 100
3:20:00 A 77
3:20:01 A 22
3:20:02 A 44
3:20:03 A 35
3:20:03 B 17
3:20:04 B 2
3:20:05 A 65
3:20:06 B 51
3:20:07 A 100
3:20:08 A 77
3:20:09 A 22
3:20:10 A 44
3:20:10 A 35
3:20:11 B 17
3:20:12 B 2

hi era,

I found the problem.

The division by zero error happens if the Event in the first row (after sorting) is not "A".

You also get an error if there are any blank lines at the end of the file.

Any suggestions on how to solve this?

thanks
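
More generally, the division fails whenever a block is closed without a single "A" event in it, because $count is still 0 at that point. In the input above the first block holds only the B at 3:13:09, so the error appears as soon as line 2 is read. Two small guards would cover both problems. This is only a sketch: the helper names safe_average and is_data_line are made up for illustration, and the "n/a" placeholder is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average that tolerates an empty block: returns the placeholder
# "n/a" instead of dividing by zero.
sub safe_average {
  my ($sum, $count) = @_;
  return $count ? $sum / $count : "n/a";
}

# True for lines with at least three whitespace-separated fields
# (timestamp, event, value); blank or short lines are rejected so
# that the caller can skip them.
sub is_data_line {
  my ($line) = @_;
  my @fields = split ' ', $line;
  return @fields >= 3;
}

printf "%.2f\n", safe_average(211, 3);   # prints 70.33
print safe_average(0, 0), "\n";          # prints n/a

for my $line ("3:13:09 B 32", "", "   ") {
  print is_data_line($line) ? "data" : "skip", "\n";
}
```

In the original loop this would amount to a "next unless is_data_line($_);" right after the chomp, and safe_average($sum, $count) in place of $sum/$count in both print statements.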

hi era,

I am trying to modify the code so that I get the count and sum for every event type in one row, still using the five-second blocks.

Wanted output:

Timestamps--CountA--CountB--SumA---SumB

Below is the modified code (but it is not working).

Can you please help?

thanks

#!/usr/bin/perl

use strict;
use warnings;

my ($t1, $sumA, $countA,$sumB, $countB, $block);

while (<>) {
chomp;
my ($timestamp, $event, $value) = split (/ /);
my ($h, $m, $s) = split (/:/, $timestamp);
my $t = $s + 60*$m + 3600*$h;

if (! defined $t1 || $t >= $t1) {
if (defined $t1) {
print ++$block, " ", $countA, " ", $sumA, " ", $countB, " ", $sumB, "\n";
}
$t1 = $t + 5;
$sumA = $countA = 0;
$sumB = $countB = 0;
}
if ($event eq "A") {
++$countA;
$sumA += $value;

if ($event eq "B") {
++$countB;
$sumB += $value;
}
}
}
if ($countA,$countB) {
print ++$block, " ", $countA, " ", $sumA, " ", $countB, " ", $sumB, "\n";

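Here's a slightly modified version which copes with empty lines and multiple event counts. It prints the event label (for legibility), sum, count, and average for each of the selected events. You can add more events like C => 1 if you want to. Note that the block test is now $t > $t1 rather than $t >= $t1, so each block covers its starting second plus the following five.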

#!/usr/bin/perl

use strict;
use warnings;

my %k = (A => 1, B => 1);   # events to report
my ($t1, %sum, %count, $block);

sub report {
  # Sort the labels so the columns come out in the same order on every run.
  print join (",", ++$block,
    map { $_, $sum{$_} || 0, $count{$_} || 0,
      $count{$_} ? $sum{$_} / $count{$_} : "" } sort keys %k), "\n";
}

while (<>) {
  chomp;
  my ($timestamp, $event, $value) = split (/ /);
  next unless $timestamp;
  my ($h, $m, $s) = split (/:/, $timestamp);
  my $t = $s + 60*$m + 3600*$h;

  if (! defined $t1 || $t > $t1) {
    report if defined $t1;
    $t1 = $t + 5;
    %sum = %count = ();
  }
  if ($k{$event}) {
    ++$count{$event};
    $sum{$event} += $value;
  }
}

report if %count;

The average field is left empty when the count is zero. Here's some sample output for the input you posted.

1,A,0,0,,B,32,1,32
2,A,100,1,100,B,51,1,51
3,A,243,5,48.6,B,19,2,9.5
4,A,278,5,55.6,B,68,2,34
5,A,0,0,,B,2,1,2