Calculating average for every Nth line in the Nth column

ncwxpanther · April 12, 2012, 8:17am

Is there an awk script that can easily perform the following operation?

I have a data file that is in the format of

1944-12,5.6
1945-01,9.8
1945-02,6.7
1945-03,9.3
1945-04,5.9
1945-05,0.7
1945-06,0.0
1945-07,0.0
1945-08,0.0
1945-09,0.0
1945-10,0.2
1945-11,10.5
1945-12,22.3
1946-01,35.2
1946-02,13.4

I need to find the average of the values contained within -01, -02, and -03.

For instance the average of

1945-01,9.8
1945-02,6.7
1945-03,9.3

Would output

1945-03, 8.6

Any help would be appreciated.
Thanks!

CarloM · April 12, 2012, 8:32am

I'm not clear on whether your identifying column is '1945-05', '1945', or just '05'?

You could do something like:

awk -F, '{totals[$1]+=$2; counts[$1]++} END {for (i in totals) print i, totals[i]/counts[i]}' file

(This keys on the whole first field, e.g. 1945-05.)

Or:

awk -F"-|," '{totals[$2]+=$3; counts[$2]++} END {for (i in totals) print i, totals[i]/counts[i]}' file

(This keys on the month, e.g. 05. Use $1 instead of $2 to key on the year, e.g. 1945.)
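
To make the effect of the key choice concrete, here is a small sketch that keys on the month field, so each calendar month is averaged across years. The four input rows are a subset of the data in the question; the variable name `months` and the `%.2f` format are only illustrative choices, and the values are assumed to be non-negative (a leading minus sign would be swallowed by the `-` separator).

```shell
# Split on "-" or ",": $1 = year, $2 = month, $3 = value.
# Key the running sum and count on the month, then print one average per month.
# "for (m in sum)" visits keys in an unspecified order, so the output is sorted.
months=$(printf '%s\n' 1945-01,9.8 1945-02,6.7 1946-01,35.2 1946-02,13.4 |
    awk -F'[,-]' '
        { sum[$2] += $3; n[$2]++ }
        END { for (m in sum) printf "%s %.2f\n", m, sum[m] / n[m] }
    ' | sort)

printf '%s\n' "$months"
```

With these rows, month 01 averages the 1945 and 1946 January values together, which is a different question from averaging January to March within one year.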

zaxxon · April 12, 2012, 8:38am

I guess the year (1944 etc.) is the important identifier? Using -01, -02 and -03 just to filter the relevant lines:

awk -F"[,-]" '/-0[123],/ {a[$1]+=$NF; c[$1]++} END{for(e in a)print e", "a[e]/c[e]}' infile
1945, 8.6
1946, 24.3
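
If the result should carry the YYYY-03 label shown in the question, a variation along these lines might work. It is only a sketch under a few assumptions: values are non-negative (a minus sign would be lost to the `-` separator), one decimal place is wanted, and a year with fewer than three of the months present (1946 here) is still reported. The variable names `data` and `result` are illustrative.

```shell
# Sample data copied from the question.
data='1944-12,5.6
1945-01,9.8
1945-02,6.7
1945-03,9.3
1945-04,5.9
1945-05,0.7
1945-06,0.0
1945-07,0.0
1945-08,0.0
1945-09,0.0
1945-10,0.2
1945-11,10.5
1945-12,22.3
1946-01,35.2
1946-02,13.4'

# Fields after splitting on "-" or ",": $1 = year, $2 = month, $3 = value.
# Only months 01, 02 and 03 contribute; sums and counts are kept per year.
# The loop order of "for (y in sum)" is unspecified, so the output is sorted.
result=$(printf '%s\n' "$data" | awk -F'[,-]' '
    $2 ~ /^0[123]$/ { sum[$1] += $3; n[$1]++ }
    END { for (y in sum) printf "%s-03, %.1f\n", y, sum[y] / n[y] }
' | sort)

printf '%s\n' "$result"
```

On the sample data this prints 1945-03, 8.6 and 1946-03, 24.3. Note that the 1946 figure covers only January and February, because March is missing from the input.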

ncwxpanther · April 12, 2012, 8:56am

Thanks for the help!

zaxxon's script worked great.