Average of a column in multiple files

I have several sequential files named stat.1000, stat.1001, ... up to stat.1020, with a format like this:

0.01 1 3822 4.97379915032e-14 4.96982253992e-09 0
0.01 3822 1 4.97379915032e-14 4.96982253992e-09 0
0.01 2 502 0.00993165137406 993.165137406 0
0.01 502 2 0.00993165137406 993.165137406 0
0.01 4 33 0.00189645523539 189.645523539 0
0.01 33 4 0.00189645523539 189.645523539 0
0.01 4 548 0.00357382134942 357.382134942 0
0.01 548 4 0.00357382134942 357.382134942 0
0.01 4 1225 0.00088154447822 88.154447822 0
0.01 1225 4 0.00088154447822 88.154447822 0
0.01 4 868 0.00295939726649 295.939726649 0
0.01 868 4 0.00295939726649 295.939726649 0

I know I can find the average of the 9th column by using

awk '{sum+=$9} END {print sum/NR}'
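For example, run against one of the files it prints a single number:

awk '{sum+=$9} END {print sum/NR}' stat.1000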

But I would like to find the average of the 9th column in each file and put the results in a new file (in that order). Can someone please help with this problem?

Try something like this:

while read -r file                 # one file name per line of file_list
do
  awk '{sum+=$9} END {print sum/NR}' "$file" >> file_new
done < file_list
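Here file_list is assumed to hold one file name per line; one way to build it for your stat files (just a sketch) is:

printf '%s\n' stat.* > file_list
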
awk 'END { print sum / fnr }       # average for the last file
NR > 1 && FNR == 1 {
  print sum / fnr                  # average for the file just finished
  sum = x                          # reset the sum (x is unset, i.e. empty)
  }
{
  sum += $9
  fnr = FNR                        # record count within the current file
  }' file1 file2 ... filen > new_file

You may need to add a zero-divisor handler for the case of empty input.
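For example, a guarded version might look like this (just a sketch; it skips the output line rather than dividing when no records were read):

awk 'END { if (fnr) print sum / fnr }
NR > 1 && FNR == 1 {
  if (fnr) print sum / fnr
  sum = x
  }
{
  sum += $9
  fnr = FNR
  }' file1 file2 ... filen > new_file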

With GNU awk >= 4 you could use the ENDFILE pattern:

{ sum += $9 }
ENDFILE {                  # runs after the last record of each input file
  print sum/FNR            # FNR still holds that file's record count here
  sum = x
  }
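A complete call writing everything into one file might look like this (gawk 4 or later; new_file is only an example name, and the FNR test skips empty files instead of dividing by zero):

gawk '{ sum += $9 }
ENDFILE {
  if (FNR) print sum / FNR
  sum = 0
  }' stat.* > new_file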

Thanks radoulov,

I tried your solution but it doesn't seem to do what I need. What I actually want is to write the average of the 9th column of each of my stat files to a new file as a single column. I have 250 files in all (in the same folder), numbered stat.1000, stat.1001, stat.1002 ... stat.1250.

Could you please help with this?

awk 'END { print sum / fnr > fn }  # average for the last file
NR > 1 && FNR == 1 {
  print sum / fnr > fn             # average for the file just finished
  close(fn)                        # close it so awk does not run out of open files
  sum = x                          # reset the sum (x is unset, i.e. empty)
  }
{
  sum += $9
  fnr = FNR                        # record count within the current file
  fn = "avg_" FILENAME             # per-file output name, e.g. avg_stat.1000
  }' stat.*
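Since the numeric suffixes are all four digits, stat.* expands in the right order. If you then want all the averages in a single file as one column, you can simply concatenate the per-file results afterwards (the output name is only an example):

cat avg_stat.* > all_averages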

With GNU awk in most cases you don't need to bother with closing the output files.

That's system-dependent, not awk-dependent. If you install gawk on a foreign system, that won't increase the system's maximum-open-files limit.

Sorry for not being clear. Some awk implementations have their own (rather low) open-file limits:

bash-2.03$ uname -s
SunOS
bash-2.03$ ulimit -n
1024
bash-2.03$ awk '{ for (i = 0; ++i <= 1000;) print > i }' /etc/issue
awk: too many output files 10
 record number 1
bash-2.03$ nawk '{ for (i = 0; ++i <= 1000;) print > i }' /etc/issue
nawk: 21 makes too many open files
 input record number 1, file /etc/issue
 source line number 1
bash-2.03$ /usr/xpg4/bin/awk '{ for (i = 0; ++i <= 1000;) print > i }' /etc/issue
/usr/xpg4/bin/awk: line 0 (NR=1): output file "253": Too many open files