Average score

 awk '{if(len==0){last=$4;total=$6;len=1;getline}if($4!=last){printf("%s\t%f\n", last, total/len);last=$4;total=$6;len=1}else{total+=$6;len+=1}}END{printf("%s\t%f\n", last, total/len)}' exon.txt > output.txt 

In the attached file I am just trying to group all the identical names in column $5, average their scores from column $6, and write the result to a text file. Thanks :).

So if CHD6 appeared only 2 times in the file:

 chr20	40079730	40079774	A_16_P34711167	CHD6	0.5198
 chr20	40079588	40079638	A_16_P34711162	CHD6	0.5806

then the output.txt would be

 CHD6 occurs 2 times with an average of 0.5529 

I do not get the same average you do. I get 0.5502. What function are you using to calculate the average?

Otherwise:

awk '{ N[$5]++ ; T[$5]+=$6 } END { for(X in N) printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X]); }' inputfile
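As a quick check, the one-liner can be run against the two CHD6 rows from the question. This is only a minimal sketch: the file name `sample.txt` is a placeholder, `-F'\t'` assumes the real file is tab-separated like the sample, and `%.4f` is swapped in for `%f` just to match the four decimal places used in the question.

```shell
# Hypothetical two-row sample file (tab-separated), copied from the question.
printf 'chr20\t40079730\t40079774\tA_16_P34711167\tCHD6\t0.5198\n' >  sample.txt
printf 'chr20\t40079588\t40079638\tA_16_P34711162\tCHD6\t0.5806\n' >> sample.txt

# N counts the rows per name ($5); T sums the scores ($6) per name.
# The END block prints one summary line for every name seen.
result=$(awk -F'\t' '{ N[$5]++ ; T[$5]+=$6 } END { for(X in N) printf("%s occurs %d times with an average of %.4f\n", X, N[X], T[X]/N[X]); }' sample.txt)
echo "$result"
```

With more than one name in the file, `for(X in N)` visits the names in no particular order, so pipe the output through `sort` if the order matters.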

I copied the numbers incorrectly. The script works great and I got the same # as you. Thank you :).
