Dear All,
I have this file tab delimited
A 1 12 22
B 3 34 33
C 55 9 32
A 12 81 71
D 11 1 66
E 455 4 2
B 89 4 3
I would like to make the average every column where the first column is the same, for example,
A 6,5 46,5 46,5
B 46,0 19,0 18,0
C 55,0 9,0 32,0
D 11,0 1,0 66,0
E 455,0 4,0 2,0
Any help?
thanks,
paolo
vbe
2
Yes, nice, and what have you done so far?
I have done it in excel but my file is huge, almost a million of rows, I can't go on by hand!
paolo
awk '{
c[$1]=1
for(i=2;i<=NF;i++)
{
total[$1,i]+=$i
count[$1,i]++
}
}
END{
for(i in c)
{
printf "%s",i
for(j=2;count[i,j];j++)
printf "\t%.1f", (total[i,j]/count[i,j])
printf "\n"
}
}' file
thanks but unfortunately is not working the script,
I found a way to sum up values with same ID, now I need to calculate the average:
awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5} END{for (i in a) print i,a,b,c,d}' file
cheers,
paolo
Another approach:
awk '{a[$1]+=$2; b[$1]+=$3; c[$1]+=$4; n[$1]++}
END{for(i in a) printf("%s\t%.1f\t%.1f\t%.1f\n", i, a/n, b/n, c/n)}
' file
with your code you only need to add a count line :
awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5;count[$1]+=1} END{for (i in a) print i,a/count,b/count,c/count,d/count}' file
cat testfile
A 1 12 22
B 3 34 33
C 55 9 32
A 12 81 71
D 11 1 66
E 455 4 2
B 89 4 3
awk '{
> c[$1]=1
> for(i=2;i<=NF;i++)
> {
> total[$1,i]+=$i
> count[$1,i]++
> }
> }
> END{
> for(i in c)
> {
> printf "%s",i
> for(j=2;count[i,j];j++)
> printf "\t%.1f", (total[i,j]/count[i,j])
> printf "\n"
> }
> }' testfile
A 6.5 46.5 46.5
B 46.0 19.0 18.0
C 55.0 9.0 32.0
D 11.0 1.0 66.0
E 455.0 4.0 2.0
What do you mean by "not working"?