Averaging 3 files

Hi,

I am trying to average the values from 3 files with the same format. They are very large files, so I will describe the format and show some of it. Each file has 83 columns (with nearly 7000 rows). The first three columns are the same in every file, while the remaining 80 hold values that differ (the ones I want averaged).

The file is tab separated.

The file looks like this:

File 1 (only showing 4 of the 80 columns and 1 row):

#geneid	strand	feature_type	up.1	        up.2	        up.3	        up.4
AAC1	        -	      CDS	                -0.383672	-0.470544	-0.024423	-0.179893

File 2 (format is the same as above but values differ):

#geneid	strand	feature_type	up.1	        up.2	        up.3	        up.4
AAC1	        -	        CDS	                -0.45433	-0.560544	-0.114423	-0.174582

File 3 (format is the same but values differ):

#geneid	strand	feature_type	up.1	        up.2	        up.3	        up.4
AAC1	        -	        CDS	                -0.283672	-0.570544	-0.624423	-0.669893

Output file (average values):

#geneid	strand	feature_type	up.1	        up.2	        up.3	        up.4
AAC1	        -	        CDS	                -0.373891	-0.533877	-0.254423	-0.341456

Thanks

Kyle

nawk -f kyl.awk OFS='\t' file1 file2 file3 fileN
kyl.awk:

FNR==1 { h=$0;next}
NR==2 {f3=$1 OFS $2 OFS $3}
{for(i=4;i<=NF;i++) s+=$i; nf=NF;nr=NR}
END {

  printf h ORS f3 OFS
  for(i=4;i<=nf;i++) printf("%.5f%c", s/nr, (i==nr)?ORS:OFS)
}

Sorry, it does work, but only for 1 row. My file contains 6000 rows. Is there something to change in the code?

Thanks

Sorry, I misunderstood what you were after...
kyl.awk:

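# header line of each file: remember it and count the files
FNR==1 { h=$0;f++;next}
# first file only: remember the three leading columns of every row
f==1 {k[FNR]=$1 OFS $2 OFS $3}
# add each value column into a per-row, per-column sum
{for(i=4;i<=NF;i++) s[FNR,i]+=$i; nf=NF;fnr=FNR}
END {
  print h
  for (i=2;i<=fnr;i++) {
     printf("%s%s", k[i], OFS)
     for(j=4;j<=nf;j++) printf("%.5f%c", s[i,j]/f, (j==nf)?ORS:OFS)
  }
}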
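
For anyone who wants to try the idea on a small input first, here is a minimal sketch, not the script above. The script above sums by line number, so it assumes every file lists its rows in the same order. This variant sums by the first three columns instead, so the row order may differ between files. The names f1, f2, f3, avg.awk and out are hypothetical, the GENE2 rows are invented for illustration (only the AAC1 values come from the post), and it assumes a POSIX awk on the PATH and strictly tab-separated input.

```shell
# Work in a scratch directory so nothing existing is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Tiny tab-separated samples. AAC1 values are from the post (up.1 and up.2
# only); the GENE2 rows are made up. f2 lists its rows in a different order.
printf '#geneid\tstrand\tfeature_type\tup.1\tup.2\n'  > f1
printf 'AAC1\t-\tCDS\t-0.383672\t-0.470544\n'        >> f1
printf 'GENE2\t+\tCDS\t1.0\t2.0\n'                   >> f1

printf '#geneid\tstrand\tfeature_type\tup.1\tup.2\n'  > f2
printf 'GENE2\t+\tCDS\t2.0\t4.0\n'                   >> f2
printf 'AAC1\t-\tCDS\t-0.45433\t-0.560544\n'         >> f2

printf '#geneid\tstrand\tfeature_type\tup.1\tup.2\n'  > f3
printf 'AAC1\t-\tCDS\t-0.283672\t-0.570544\n'        >> f3
printf 'GENE2\t+\tCDS\t3.0\t6.0\n'                   >> f3

cat > avg.awk <<'EOF'
# Header line of each file: print it once, count the files, skip it.
FNR == 1 { if (NR == 1) print; nfiles++; next }
{
    key = $1 OFS $2 OFS $3                      # the three shared columns
    if (!(key in seen)) { seen[key] = 1; order[++n] = key }  # first-seen order
    for (i = 4; i <= NF; i++) sum[key, i] += $i # per-row, per-column sum
    nf = NF
}
END {
    for (r = 1; r <= n; r++) {
        printf "%s", order[r]
        for (i = 4; i <= nf; i++) printf "%s%.6f", OFS, sum[order[r], i] / nfiles
        printf "%s", ORS
    }
}
EOF

awk -F'\t' -v OFS='\t' -f avg.awk f1 f2 f3 > out
cat out
```

The AAC1 line of out should match the first two averages in the desired output from the original post. Note that this divides by the number of files, so a row missing from one file would be averaged as if it were present with zeros.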