Average of columns

I have files that have the following columns

chr    pos    ref    alt    sample 1    sample 2    sample 3
chr2    179644035    G    A    1,107    0,1    58,67
chr7    151945167    G    T    142,101    100,200    500,700
chr13    31789169    CTT    CT,C    6,37,8    0,0,0    15,46,89
chr22    50962208    T    G    1,107    1,10    0,0
chr23    4373957984    CTT    A,T,G,C    0,1,2,4    0,0,1,3    9,4,6,2

I need to take the average of the values comma separated for each sample rounded to 2 decimal point to have the following output

chr    pos    ref    alt    sample 1    sample 2    sample 3
chr2    179644035    G    A    54    0.5    62.5
chr7    151945167    G    T    121.5    150    600
chr13    31789169    CTT    CT,C    17    0    50
chr22    50962208    T    G    54    5.5    0
chr23    4373957984    CTT    A,T,G,C    1.75    1    5.25
 

Any guide in using awk to achieve this will be extremly helpful

Hello nans,

On forums we encourage users to do add their efforts which they have out in order to solve their own problems.
So kindly do add your efforts in your question and let us know then.

Thanks,
R. Singh

Hello R.Singh,

I haven't come anywhere near but this is what I have so far which is terribly wrong

awk '{a[FNR]=a[FNR]+$5;b[FNR]++;}END{for(i in a){print i,a/b}}' file.txt > output

I was working on trying to get the average on sample 1 only. Here I was intending to take the 5th column, separate them and then calculate its average but obviously this doesn't do that

Try

awk '
NR > 1  {for (i=5; i<=NF; i++)  {n   = split ($i, T, ",")
                                 for (j=1; j<=n; j++) SUM += T[j]
                                 $i  = SUM/n
                                 SUM = 0
                                }
        }
1
' OFS="\t" file
1 Like