Hi folks,
I am trying to use awk to compute the mean and standard deviation of a variable that spans multiple files. Every file has the same layout: three columns, with a comma as the delimiter.
File1 layout:
col1,col2,col3
0,0-1,0.2345
1,1-2,0.3456
1,1-2,0.4567
2,2-3,0.5678
What I need to do is first scan every file (I have at least 200) and compute the global mean of the third column for each index value in the first column, over all files. Then I need a second pass to calculate the global standard deviation, again for each index value in the first column and over all files, using the global mean from the first pass.
I thought of using awk because my files are big, and other scripting languages such as Perl or plain bash are turning out to be too slow. In a quick test, awk read these huge files line by line very quickly, but I am stuck on how to implement the actual calculation in awk.
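To make the question concrete, here is a rough sketch of the two-pass computation I have in mind, which I have not tried on the real data. It rests on several assumptions of mine: the function name global_stats is made up, the first line of every file is a header like the one shown above, and the standard deviation is the population form (divide by n, not n-1). The file list is passed to awk twice, with a pass variable set before each copy, so pass 1 accumulates the sums and counts and pass 2 accumulates the squared deviations from the global mean.

```shell
# Hypothetical helper: prints index,count,mean,stddev for each value of column 1.
# Assumes comma-delimited files whose first line is a header.
global_stats() {
    awk -F',' '
        FNR == 1 { next }     # skip the header line of each file (assumption)
        NF < 3   { next }     # ignore blank or short lines

        # Pass 1: per-index count and sum of column 3
        pass == 1 { n[$1]++; sum[$1] += $3; next }

        # Pass 2: per-index sum of squared deviations from the global mean
        pass == 2 {
            d = $3 - sum[$1] / n[$1]
            ss[$1] += d * d
        }

        END {
            for (k in n)
                # population standard deviation; use (n[k] - 1) for the sample form
                printf "%s,%d,%.6f,%.6f\n", k, n[k], sum[k] / n[k], sqrt(ss[k] / n[k])
        }
    ' pass=1 "$@" pass=2 "$@" | sort -t, -k1,1n
}
```

I would expect to call it as global_stats data_*.csv, where data_*.csv is only a placeholder for my real file names. Every file is read twice, but only the per-index totals are held in memory.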
Any help will be very useful.
Thanks