Averaging data every 30 mins using AWK

A happy Monday to you all,

I have a .csv file which contains data taken every 5 seconds. I want to average these 5 second data points into 30 minute averages!

date                          co2

25/06/2011 08:04        8.31
25/06/2011 08:04        8.32
25/06/2011 08:04        8.33
25/06/2011 08:04        8.32
25/06/2011 08:04        8.31
25/06/2011 08:04        8.30
25/06/2011 08:04        8.29
25/06/2011 08:04        8.34
25/06/2011 08:04        8.43
25/06/2011 08:04        8.33
25/06/2011 08:04        8.32
25/06/2011 08:04        8.30
25/06/2011 08:05        8.31
25/06/2011 08:05        8.32
25/06/2011 08:05        8.30
25/06/2011 08:05        8.29
25/06/2011 08:05        8.30
25/06/2011 08:05        8.31
25/06/2011 08:05        8.33
25/06/2011 08:05        8.32
25/06/2011 08:05        8.32
25/06/2011 08:05        8.35
25/06/2011 08:05        8.34
25/06/2011 08:05        8.36
25/06/2011 08:06        8.37
25/06/2011 08:06        8.34
25/06/2011 08:06        8.35
25/06/2011 08:06        8.35
25/06/2011 08:06        8.34
25/06/2011 08:06        8.38
25/06/2011 08:06        8.35
25/06/2011 08:06        8.36
25/06/2011 08:06        8.32
25/06/2011 08:06        8.31
25/06/2011 08:06        8.30
25/06/2011 08:06        8.32

Is there any way of doing this in AWK?

Cheers!

Check if it is giving you desired results:

awk '{a[int((NR-1)/720)]+=$3};END{for (i=0;i<NR/720;i++){print a/720}}' file

Ah good to hear from you again Bartus!

I should have specified sorry, there are two columns I want to average, a 'CO2' column and a 'CH4' column, both in columns 12 and 14 respectively.

Also when I put in that code, the Shell just replied with multiple '0's.

Try:

awk 'NR>2' file | awk '{a[int((NR-1)/720)]+=$12;b[int((NR-1)/720)]+=$14};END{print "CO2 CH4";for (i=0;i<NR/720;i++){print a/720,b/720}}'

hmm now the Shell outputs two columns of 0's, and the .csv file is unaltered.

Can you upload the file that you are trying to analyze?

I've attached just a snippet of the file. Hope it helps.

After you export that file to CSV format delimited by ";", try this:

awk 'NR>1' june2tmp.csv | awk -F\; '{a[int((NR-1)/720)]+=$13;b[int((NR-1)/720)]+=$15};END{print "CO2 CH4";for (i=0;i<NR/720;i++){print a/720,b/720}}'

I exported it to a csv and changed the delimiter, but still didn't change the file at all. Did it work for you?

It is not meant to change the file. It should print calculated averages on the screen.

Oh my bad. It doesn't do that either, just prints 'CO2 CH4'. Is there anyway to change the file? As that is the hope in averaging to 30 minutes, that the .csv file gets edited.

Well, first we need to get the proper output. When we have that, we can redirect it to some other file and move it over the original. I'll get back to you soon, cause for now I have to leave :wink:

ok no worries, thanks for the help!

This is what I get:

[root@rhel ~]# head -5 june2temp.csv
"date";"alarm";"species";"solenoid";"mpv";"outlet";"cavp";"cavt";"warmbox";"etalon";"dastemp";"co2sync";"co2";"ch4sync";"ch4";"h2osync"
11-06-25 08:03;0;1,00E+00;0,00E+00;0,00E+00;3,10E+04;1,40E+02;4,50E+01;4,50E+01;4,50E+01;3,18E+01;3,89E+02;3,95E+02;3,86E+00;3,91E+00;9,40E-01
11-06-25 08:03;0;2,00E+00;0,00E+00;0,00E+00;3,10E+04;1,40E+02;4,50E+01;4,50E+01;4,50E+01;3,18E+01;3,89E+02;3,95E+02;3,85E+00;3,90E+00;9,42E-01
11-06-25 08:03;0;3,00E+00;0,00E+00;0,00E+00;3,10E+04;1,40E+02;4,50E+01;4,50E+01;4,50E+01;3,18E+01;3,89E+02;3,95E+02;3,86E+00;3,91E+00;9,34E-01
11-06-25 08:03;0;2,00E+00;0,00E+00;0,00E+00;3,10E+04;1,40E+02;4,50E+01;4,50E+01;4,50E+01;3,18E+01;3,89E+02;3,95E+02;3,88E+00;3,95E+00;9,34E-01
[root@rhel ~]# awk 'NR>1' june2temp.csv | awk -F\; '{a[int((NR-1)/720)]+=$13;b[int((NR-1)/720)]+=$15};END{print "CO2 CH4";for (i=0;i<NR/720;i++){print a/720,b/720}}'
CO2 CH4
3 3
3 3
3.00278 3
2.45833 2.45833

Hmmm, it seems that numbers in this format are not properly processed by AWK. This code gives better results (it replaces commas with dots in fields 13 and 15):

[root@rhel ~]# awk 'NR>1' june2temp.csv | awk -F\; '{sub(",",".",$13);sub(",",".",$15);a[int((NR-1)/720)]+=$13;b[int((NR-1)/720)]+=$15};END{print "CO2 CH4";for (i=0;i<NR/720;i++){print a/720,b/720}}'
CO2 CH4
395 3.89621
395.335 3.87851
398.969 3.77253
322.117 3.14147

For some odd reason I'm still getting

CO2 CH4
0      0
0      0
0      0
0      0
0      0
0      0
0      0
0      0
0      0

Did you say there was a way to edit the file itself, as opposed to printing out the result on the screen?

Cheers

Can you post output of:

cat -Te file.csv | head
date,alarm,species,solenoid,mpv,outlet,cavp,cavt,warmbox,etalon,dastemp,co2sync,co2,ch4sync,ch4,h2osync$
25/06/2011 08:03:20 ,0,1.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1033031768E+004,1.3998939700E+002,4.4999687195E+001,4.4999909923E+001,4.4982554694E+001,3.1751035912E+001,3.8940279934E+002,3.9515856123E+002,3.8618224512E+000,3.9114643916E+000,9.4040946797E-001$
25/06/2011 08:03:25 ,0,2.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1036267092E+004,1.3999910014E+002,4.4999724978E+001,4.5000055343E+001,4.4982432169E+001,3.1750000000E+001,3.8935193355E+002,3.9511128829E+002,3.8489583767E+000,3.8990622477E+000,9.4172965077E-001$
25/06/2011 08:03:30 ,0,3.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1035811523E+004,1.4000220783E+002,4.4999819829E+001,4.5000161391E+001,4.4982414246E+001,3.1795673077E+001,3.8938861262E+002,3.9514732622E+002,3.8595829527E+000,3.9062814635E+000,9.3441447742E-001$
25/06/2011 08:03:35 ,0,2.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1036211602E+004,1.4000074680E+002,4.4999892989E+001,4.5000205994E+001,4.4982414246E+001,3.1750000000E+001,3.8945211683E+002,3.9517328822E+002,3.8841252351E+000,3.9451982461E+000,9.3417578630E-001$
25/06/2011 08:03:40 ,0,1.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1036852113E+004,1.3999663742E+002,4.4999757503E+001,4.5000099772E+001,4.4982477978E+001,3.1793508287E+001,3.8936261353E+002,3.9511123990E+002,3.8395527261E+000,3.8931575825E+000,9.4752583244E-001$
25/06/2011 08:03:45 ,0,2.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1032849098E+004,1.3998770158E+002,4.4999748230E+001,4.5000053406E+001,4.4982505798E+001,3.1750000000E+001,3.8931505434E+002,3.9512509568E+002,3.8351808367E+000,3.8599191928E+000,9.5367870751E-001$
25/06/2011 08:03:50 ,0,1.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1039831891E+004,1.4001021926E+002,4.5000107359E+001,4.4999785829E+001,4.4982363833E+001,3.1750000000E+001,3.8929289686E+002,3.9508579576E+002,3.8474802185E+000,3.8609535036E+000,9.4079656257E-001$
25/06/2011 08:03:55 ,0,1.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1034918787E+004,1.3999495360E+002,4.5000032095E+001,4.4999813527E+001,4.4982444763E+001,3.1750000000E+001,3.8929209181E+002,3.9514661440E+002,3.8572828460E+000,3.9132254247E+000,9.5624024001E-001$
25/06/2011 08:04:00 ,0,1.0000000000E+000,0.0000000000E+000,0.0000000000E+000,3.1035897944E+004,1.3999774824E+002,4.4999900818E+001,4.4999778748E+001,4.4982444763E+001,3.1750000000E+001,3.8899351438E+002,3.9493622327E+002,3.8126918495E+000,3.7581951076E+000,9.7192541150E-001$


Try this:

awk 'NR>1' file.csv | awk -F"," '{a[int((NR-1)/720)]+=$13;b[int((NR-1)/720)]+=$15};END{print "CO2 CH4";for (i=0;i<NR/720;i++){print a/720,b/720}}'

hey,

The averages were printed out to the shell so that's great, is there anyway of replacing the existing data in the csv file so the output looks like

25/06/2011 08:03:20   386.034    3.73601
25/06/2011 09:03:20   396.132    3.12456
25/06/2011 10:03:20   439.749    4.89633
25/06/2011 11:03:20   401.759    3.91748

cheers!