Computing average values from multiple text files

rbredereck · July 29, 2011, 1:18pm

Hi,
first, I searched the forum for this but could not find the right answer. (There were some similar threads, but I was not sure how to adapt the ideas.)

Anyway, I have a fairly natural problem: I am given several text files. All files contain the same number of lines and the same number of columns. I want to compute a file that contains, in each cell (interpreting the files as tables), the average of the corresponding cells from my files.

There is one complication: the cells may contain numbers or the string "n/a", which means something like "not computed". When some files have "n/a" in a cell, the result file should contain the average of the non-"n/a" values and, separated by some unique symbol like "~", the number of "n/a" values.

For example:
File 1

1 2   3
1 n/a 3

File 2

3 2   1
3 n/a 1

File 3

2 2 n/a
5 2 5

Now, I want to compute the following:

Resultfile

2 2   2~1
3 2~2 3

Of course, I could implement this in some high-level programming language, but having this as a script would be much more convenient in my application.

I think this should be easy for experts of awk or similar tools. Unfortunately I don't see an easy solution.

Thanks in advance.

bartus11 · July 29, 2011, 1:52pm

You have to specify the number of files in this code (the 3 in the divisor 3-b[...]):

awk '{for (i=1;i<=NF;i++){if ($i!~"n/a"){a[FNR,i]+=$i}else{b[FNR,i]++}}}END{for (i=1;i<=FNR;i++){for (j=1;j<=NF;j++){printf "%s%s",a[i,j]/(3-b[i,j]),((b[i,j]>0)?"~"b[i,j]" ":" ")};printf "\n"}}' file1 file2 file3

radoulov · July 29, 2011, 2:08pm

And another one:

awk 'END {
  for (i = 0; ++i <= fnr;) 
    for (j = 0; ++j <= nf;)    
      printf "%s", (((i, j) in na ? \
        (v[i, j] / ((ARGC - 1) - na[i, j]) "~" na[i, j]) : \
        v[i, j] / (ARGC - 1)) (j < nf ? FS : RS))
  }
{
  for (i = 0; ++i <= NF;)
    $i ~ /n\/a/ ? na[FNR, i]++ : v[FNR, i] += $i
  # FNR/NF may not be available
  # in the END block with some
  # awk implementations,
  # so they are saved here
  nf = NF; fnr = FNR
  }' file*

You may also need to handle the "division by zero attempted" error, which occurs when a cell is "n/a" in every file.
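
One way to guard against that case is to count the numeric values per cell and divide by that count, so the divisor is never zero and the number of files does not need to be known. This is only a minimal sketch, not a tested drop-in replacement: the wrapper name avg_cells is hypothetical, and printing "n/a" for a cell that has no numeric value in any file is an assumption, since the thread does not say what should appear there.

```shell
#!/bin/sh
# avg_cells is a hypothetical helper name, not taken from the thread.
# Usage: avg_cells file1 file2 ...
avg_cells() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "n/a") {
          na[FNR, i]++            # count missing values per cell
        } else {
          sum[FNR, i] += $i       # running total of the numeric values
          cnt[FNR, i]++           # how many numeric values were seen
        }
      }
      # save NF and FNR: not every awk keeps them in the END block
      nf = NF; fnr = FNR
    }
    END {
      for (i = 1; i <= fnr; i++) {
        for (j = 1; j <= nf; j++) {
          # assumption: a cell without any numeric value prints "n/a"
          cell = ((i, j) in cnt) ? sum[i, j] / cnt[i, j] : "n/a"
          if ((i, j) in na)
            cell = cell "~" na[i, j]
          printf "%s%s", cell, (j < nf ? " " : "\n")
        }
      }
    }
  ' "$@"
}
```

Called as avg_cells file1 file2 file3 on the three example files from the first post, this should print the same values as the Resultfile, with columns separated by a single space. A cell that is "n/a" in all three files would come out as n/a~3 instead of triggering an error.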

rbredereck · August 1, 2011, 6:25am

It works perfectly!