Column means for multiple files

larrymuli · January 8, 2009, 9:42am

Hi all,

I have multiple dat files, one for each day of the year, each with two lengthy columns. For each file, I want to perform a mathematical operation on the data in column 2, then get the mean of these values and create a new two-column file where each row shows the mean value and the filename of the original file.

I came up with this but it won't work properly:

cat *dat | awk '{print sum+(2 * exp(4 * $2- 3))/NR ,FILENAME}' > newfilename

Any assistance would be much appreciated - I'm new to scripting

Many thanks

Larry

jim_mcnamara · January 8, 2009, 10:30am

NR is the number of records read so far across all input. FNR is the record number within the current file, so it resets to 1 at the start of each file.

Do you want a mean per file or for all data? Show us a couple of lines of input and the desired output.
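
To make the NR/FNR difference concrete, here is a minimal sketch. The file names a.dat and b.dat and the function name nr_fnr_demo are made up for illustration. Note that the files are passed to awk as arguments: when the data is piped in with cat, awk sees a single stream on standard input and cannot tell where one file ends and the next begins.

```shell
# Hypothetical demo files, created in a temporary directory.
tmp=$(mktemp -d)
printf '2.34, 0.24\n3.45, 0.45\n' > "$tmp/a.dat"
printf '4.81, 0.63\n' > "$tmp/b.dat"

# Print the file name, NR and FNR for every record of the given files.
nr_fnr_demo() {
  awk '{ print FILENAME, NR, FNR }' "$@"
}

(cd "$tmp" && nr_fnr_demo a.dat b.dat)
# a.dat 1 1
# a.dat 2 2
# b.dat 3 1
```

NR keeps counting across both files, while FNR starts again at 1 for b.dat.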

larrymuli · January 8, 2009, 10:41am

Sorry, I didn't make that very clear:

The files (e.g. 090108.dat) are like so:

2.34, 0.24
3.45, 0.45
4.81, 0.63
...etc

I want to do (0.24+0.45+0.63)/3, which is 0.44, then get x = 2*exp(4*0.44-3) and place this in a new file:

x1, 090108
x2, 090107
x3, 090106
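
As a quick check of the arithmetic for the three sample rows above, a small sketch (the function name worked_example is only for illustration):

```shell
# Mean of the three sample values from column 2,
# then x = 2*exp(4*mean - 3).
worked_example() {
  awk 'BEGIN {
    mean = (0.24 + 0.45 + 0.63) / 3
    printf "mean=%.2f x=%.4f\n", mean, 2 * exp(4 * mean - 3)
  }'
}

worked_example
# mean=0.44 x=0.5788
```

So for these three rows alone, the x value would be about 0.5788.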

So NR represents the number of rows in each file.

I think my problem lies in the formatting of the sum+ function and the brackets

thanks

Larry

vgersh99 · January 8, 2009, 10:57am

something along these lines.
nawk -f larry.awk file1 file2 fileN

larry.awk:

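# A new file starts (and it is not the first one): store the result
# for the file that just ended, then reset the accumulators.
FNR==1 && NR!=1{
  arr[file] = 2*exp(4*(sum/fnr)-3)
  sum=fnr=0
}

# Remember the name of the file currently being read.
FNR==1 { file=FILENAME}
{
   sum+=$2    # running total of column 2
   fnr=FNR    # number of rows read so far in this file
}
END {
  # Store the result for the last file.
  arr[file] = 2*exp(4*(sum/fnr)-3)

  # Note: "for (i in arr)" visits the files in no particular order.
  for (i in arr)
    print arr[i], i
}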
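
If the output should look like the "x1, 090108" layout asked for above, one possible variation is sketched below. It is not the script above but a rewrite under a few assumptions: the function name file_means is made up, the field separator is set to a comma (so it also works if there is no space after the comma), the ".dat" suffix is stripped from the name, and each result is printed as soon as its file ends, so the rows come out in the order the files are given.

```shell
# Print "x, name" for each input file, where
# x = 2*exp(4*mean(column 2) - 3) and name is the file name without ".dat".
file_means() {
  awk -F',' '
    # Print the result for the file that just ended (if it had any rows).
    function flush_file(   name) {
      if (n == 0) return
      name = file
      sub(/\.dat$/, "", name)
      printf "%.4f, %s\n", 2 * exp(4 * (sum / n) - 3), name
    }
    FNR == 1 { flush_file(); file = FILENAME; sum = n = 0 }
    { sum += $2; n++ }
    END { flush_file() }
  ' "$@"
}

# Hypothetical sample data in a temporary directory.
demo=$(mktemp -d)
printf '2.34, 0.24\n3.45, 0.45\n4.81, 0.63\n' > "$demo/090108.dat"
printf '1.00, 0.75\n' > "$demo/090107.dat"

(cd "$demo" && file_means 090108.dat 090107.dat)
# 0.5788, 090108
# 2.0000, 090107
```

For the real data, something like "file_means *.dat > newfilename" run from the data directory should give one row per file, in the shell's sorted glob order.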