awk: looping through groups of lines

acsg · May 6, 2011, 7:16am

Hello,

I'm working with a file that has three columns. The first one represents a certain channel and the third one a timestamp (second one is not important). Example input is as follows:

2513   12   10.771
 2513   13   10.771
 2513   14   10.771
 2513   15   10.771
 2644    8    10.771
 2645   14    10.771
 2647     7    10.771
----------------------
 2513     0    10.772
 2513     1    10.772
 2513     2    10.772
 2513     3    10.772
 2513     4    10.772
 2513     5    10.772
 2513     6    10.772
----------------------
 2513     7    10.772
 2513     8    10.772
 2513     9    10.772
 2513     10    10.772
 2513     11  10.772
 2513     12   10.772
 2513     13   10.772

The input doesn't have the "----------------------" part, I just put it there so the groups of lines that I want to analyze become a bit clearer.

I want to analyze the lines in groups of 7, since 7 lines with the same timestamp represent 1 packet. The problem is that a timestamp can repeat across packets, so sometimes you might find 14 or 21 consecutive lines with the same timestamp (even though the values in the other two columns do vary). What I want is a count of how many packets each first-column value (channel) appears in, with a channel counted at most once per packet, that is, once per group of 7 lines.

Desired output:

The code I've tried so far doesn't consider the repeated fields (the groups of 7), so it only counts one time per timestamp (which means I get a value of 2 instead of 3 for channel 2513):

 awk '{ 
                          while (getline > 0 && NF > 0){
                           timec= $3;
                           pidc= $1;
                           if(timec == $3 && pidc != pidp){
                               pid[$1]++;
                             }
                           pidp=$1}
                           } 
                           END {for (i in pid){ print i, pid[i]}}'

Any help is much appreciated.
Thanks!

bartus11 · May 6, 2011, 7:36am

I think you want the first line to say:

2513 4

Also post desired output for the rest of that sample data (10.772 timestamp).

acsg · May 6, 2011, 7:43am

Hello,

The desired output is for the whole input: I want to count that, for example, channel 2513 appears in all 3 'packets' (groups of 7 lines).

Scrutinizer · May 6, 2011, 9:45am

Like this?

awk '{B[$1]} !(NR%7){for(i in B){delete B[i];A[i]++}} END{for(i in A)print i,A[i]}' infile
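
For anyone reading along, here is an expanded, commented sketch of the same idea. The function name count_channels, the longer array names and the trailing sort -n are my own additions for readability, not part of the one-liner. NR%7 is 0 on every 7th line, so !(NR%7) is true exactly at the end of each group of 7; the delete empties the per-packet array so the next packet starts fresh.

```shell
# count_channels is a hypothetical helper name used only for this sketch.
# It prints, for each channel, the number of 7-line packets it appears in.
count_channels() {
  awk '
    { seen[$1] }            # merely referencing seen[$1] creates the key for this packet
    NR % 7 == 0 {           # same test as !(NR%7): true on lines 7, 14, 21, ...
      for (ch in seen) {
        count[ch]++         # one count per channel per packet
        delete seen[ch]     # reset so the next packet starts with an empty array
      }
    }
    END { for (ch in count) print ch, count[ch] }
  ' "$@" | sort -n          # sort added here because awk for-in order is unspecified
}
```

Usage would be something like: count_channels infile. Note that, as in the one-liner, a trailing group of fewer than 7 lines is never counted, because the counting only happens on every 7th line.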

bartus11 · May 6, 2011, 9:55am

Try:

perl -lane '$x=int(($.-1)/7);$a{$x}{$F[0]}=1;END{for $i (keys %a){for $j (keys %{$a{$i}}){$b{$j}++}};for $i (keys %b){print "$i $b{$i}"}}' file
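
The Perl version numbers the packets with int(($.-1)/7) and records each channel once per packet number. A rough awk sketch of that same approach is below; the function name count_channels_by_index and the variable names are assumptions made for illustration, and the sort -n is only there to give a stable output order.

```shell
# count_channels_by_index is a hypothetical helper name used only for this sketch.
count_channels_by_index() {
  awk '
    {
      pkt = int((NR - 1) / 7)                  # 0 for lines 1-7, 1 for lines 8-14, ...
      if (!seen[pkt, $1]++) count[$1]++        # count a channel only the first time it shows up in a packet
    }
    END { for (ch in count) print ch, count[ch] }
  ' "$@" | sort -n
}
```

Unlike the NR%7 test, this variant also counts channels in a final group that has fewer than 7 lines, which may or may not be what is wanted.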

acsg · May 9, 2011, 2:40am

Thank you!! This seems to do the trick, but I don't quite understand how it works. Could you please explain what the !(NR%7) is for, and why you used the 'delete'?