How to average values per timestamp range using awk or anything else?

Saidul · September 30, 2013, 1:34pm

I have a file of the form.
16:00:26,83.33 16:05:26,83.33 16:10:26,83.33 16:15:26,83.33 16:20:26,90.26 16:25:26,83.33 16:30:26,83.33 17:00:26,83.33 17:05:26,83.33 17:10:26,83.33 17:15:26,83.33 17:20:26,90.26 17:25:26,83.33 17:30:26,83.33
For the timestamps 16:00:00 to 16:55:00, I need to average the values (hourly) and print the result to another CSV file, and similarly for 17:00:00 to 17:55:00 and so on. I need 24 values for 24 hours.

CarloM · September 30, 2013, 1:52pm

16:00:00 to 16:00:30, or to 16:30:00? You only have 1 value every 5 minutes in your example, so 30 second averages would be a bit odd.

Assuming it's half-hourly, you could do something like:

$ awk '/^..:([012].|30)/{split($1,ti,":"); sum[ti[1]]+=$2;cnt[ti[1]]++} END {for (i in sum) {printf ("%d:xx:xx - %f\n", i, sum[i]/cnt[i])}}' FS="," RS="[ \n]" file
16:xx:xx - 84.320000
17:xx:xx - 98.605714

(Note - I changed the last value to 183.33, just to make them different)

EDIT: To process every timestamp just remove the regex (assuming you have no other lines in the file - otherwise you'll need something like /^..:..:../ ).
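The two figures above can be checked by hand. A small sketch of that arithmetic, assuming the sample data from the first post with the last value changed to 183.33 as noted:

```shell
# 16:00 to 16:30 window: six samples of 83.33 and one of 90.26
awk 'BEGIN { printf "%f\n", (6 * 83.33 + 90.26) / 7 }'

# 17:00 to 17:30 window: five samples of 83.33, one of 90.26,
# and the last value changed to 183.33
awk 'BEGIN { printf "%f\n", (5 * 83.33 + 90.26 + 183.33) / 7 }'
```

These print 84.320000 and 98.605714, matching the script output.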

Saidul · September 30, 2013, 2:13pm

Hi CarloM.
Thanks for your reply.
That was a typo. The script is supposed to take hourly averages, from 16:00:00 to 16:55:00 and so on for the remaining hours.

I ran the script you posted. It runs fine, but the averages I computed by hand do not match the ones the script produces.

Can you please explain the above?
Many thanks in advance.

CarloM · September 30, 2013, 2:38pm

What awk are you using? I get the same results manually & by script from your test data using GNU awk on cygwin.
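One possible source of a mismatch, if the awk in use is not GNU awk or mawk, is the regular-expression record separator: POSIX leaves the result unspecified when RS has more than one character, and some older awks may use only its first character. A minimal sketch that sidesteps this by turning spaces into newlines with tr first, so awk can keep its default RS. The function name hourly_avg is made up for illustration, and the sketch assumes the hour is always the first two characters of the timestamp:

```shell
# hourly_avg FILE: hypothetical helper, prints "HH,average" for each hour.
# tr puts one record per line, so no regex RS is needed.
hourly_avg() {
    tr -s ' ' '\n' < "$1" |
    awk -F, '
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],/ {   # only "HH:MM:SS,value" records
            h = substr($1, 1, 2)                  # hour = first two characters
            sum[h] += $2                          # running total per hour
            cnt[h]++                              # number of samples per hour
        }
        END {
            for (h in sum)                        # "for in" order is unspecified
                printf "%s,%f\n", h, sum[h] / cnt[h]
        }' |
    sort                                          # so sort by hour afterwards
}
```

Called as hourly_avg file, this averages every sample in each hour (not just the first half), which is what the clarified request asks for.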

RudiC · September 30, 2013, 2:52pm

Not sure what you need explained, so here's all of it:

awk     '/^..:[0-5][05]/        {split($1,ti,":")               # if the record starts with any two chars:00 - 55 in 05 increments
                                                                #     split rec into ti
                                 SUM[ti[1]]+=$2                 # sum second field into the SUM array indexed by the first ti element (= hour)
                                 CNT[ti[1]]++                   # keep number of summands
                                }
         END                    {for (i in SUM) {               # after file has been processed
                                    printf ("%d:00:00;%f\n",    # print the index i which is the hour
                                            i, SUM[i]/CNT[i])   # and the computed avg value
                                   }
                                }
        ' FS="," RS="[ \n]" file                                # field separator: comma; record sep: space or <newline>

Please be aware that CarloM's original proposal computed the average only for minutes 00 - 30, not 00 - 55 as your edited spec requires. That's why the regex has to be modified, if you don't drop it entirely as suggested.
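Two details are left open by the scripts in this thread: a "for (i in SUM)" loop visits the hours in no guaranteed order, and the original request was for 24 values in a CSV file. A possible variant, sketched under the assumption that hours without any samples should still get a line (with an empty value, which is an arbitrary choice and not something stated in the thread), loops over hours 00 to 23 in END instead. The function name hourly_csv and the output name hourly.csv are placeholders:

```shell
# hourly_csv FILE: hypothetical helper, prints 24 lines "HH:00:00,average".
# Keeps the regex RS used in this thread, so it assumes gawk or mawk.
hourly_csv() {
    awk '
        /^[0-2][0-9]:[0-5][0-9]:[0-5][0-9],/ {    # only "HH:MM:SS,value" records
            h = substr($1, 1, 2) + 0              # "16" -> 16, "07" -> 7
            SUM[h] += $2
            CNT[h]++
        }
        END {
            for (h = 0; h < 24; h++)              # fixed loop: always in hour order
                if (h in CNT) printf "%02d:00:00,%f\n", h, SUM[h] / CNT[h]
                else          printf "%02d:00:00,\n", h   # no samples for this hour
        }
    ' FS="," RS="[ \n]" "$1"
}
```

Used as hourly_csv file > hourly.csv, this writes one line per hour whether or not the input covers the whole day.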

Saidul · October 1, 2013, 2:55am

Thank you so much CarloM and RudiC.
The above was really helpful.
