average of rows with same value in the first column

paolo.kunder · August 29, 2012, 4:26am

Dear All,
I have this file tab delimited

I would like to make the average every column where the first column is the same, for example,

A	6,5	46,5	46,5
B	46,0	19,0	18,0
C	55,0	9,0	32,0
D	11,0	1,0	66,0
E	455,0	4,0	2,0

Any help?
thanks,
paolo

vbe · August 29, 2012, 4:39am

Yes, nice, and what have you done so far?

paolo.kunder · August 29, 2012, 4:42am

I have done it in excel but my file is huge, almost a million of rows, I can't go on by hand!
paolo

elixir_sinari · August 29, 2012, 4:56am

awk '{
c[$1]=1
for(i=2;i<=NF;i++)
{
 total[$1,i]+=$i
 count[$1,i]++
}
}
END{
for(i in c)
{
 printf "%s",i
 for(j=2;count[i,j];j++)
  printf "\t%.1f", (total[i,j]/count[i,j])
 printf "\n"
}
}' file

paolo.kunder · August 29, 2012, 5:17am

thanks but unfortunately is not working the script,

I found a way to sum up values with same ID, now I need to calculate the average:

awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5} END{for (i in a) print i,a,b,c,d}' file

cheers,
paolo

Franklin52 · August 29, 2012, 5:18am

Another approach:

awk '{a[$1]+=$2; b[$1]+=$3; c[$1]+=$4; n[$1]++}
END{for(i in a) printf("%s\t%.1f\t%.1f\t%.1f\n", i, a/n, b/n, c/n)}
' file

delugeag · August 29, 2012, 5:20am

with your code you only need to add a count line :

awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5;count[$1]+=1} END{for (i in a) print i,a/count,b/count,c/count,d/count}' file

elixir_sinari · August 29, 2012, 5:22am

cat testfile
A       1       12      22
B       3       34      33
C       55      9       32
A       12      81      71
D       11      1       66
E       455     4       2
B       89      4       3

awk '{
> c[$1]=1
> for(i=2;i<=NF;i++)
> {
>  total[$1,i]+=$i
>  count[$1,i]++
> }
> }
> END{
> for(i in c)
> {
>  printf "%s",i
>  for(j=2;count[i,j];j++)
>   printf "\t%.1f", (total[i,j]/count[i,j])
>  printf "\n"
> }
> }' testfile
A       6.5     46.5    46.5
B       46.0    19.0    18.0
C       55.0    9.0     32.0
D       11.0    1.0     66.0
E       455.0   4.0     2.0

What do you mean by "not working"?

paolo.kunder · August 29, 2012, 5:28am

thanks a lot!
paolo