Calculate percentage of columns greater than certain value in a matrix using awk

This matrix represents correlation values.
Is it possible to calculate the percentage of columns (a1, a2, a3) that have a value >= |0.5| and report the percentage that has positive correlation >0.5 and negative correlation <-0.5 separately. thanx in advance!

input

name	a1	a2	a3
g1	0.8	0.4	0.2
g2	-0.2	-0.6	-0.7
g3	0.1	0.6	0.8
g4	0.1	0	0
g5	-0.2	-0.2	-0.2
g6	-0.1	-0.9	-0.9
g7	0	0	0.2

What have you tried?

i am doing this manually like the following but the no of columns i have are around 5000.

awk '{ sum+=$2} END {print sum}' rm
0.5
awk '{ sum+=$3} END {print sum}' rm
-0.7
awk '{ sum+=$4} END {print sum}' rm
-0.6
(1/3)*100=33.33% (positive corr)
(2/3)*100=66.66% (negative corr) 

Modify the code to your desired output

$ awk ' NR > 1 { for(i=1;i<=NF;i++) arr+=$i }
> END { for(i=1;i<=NF;i++) { sub("-","",arr); if( arr > 0.5) print "Col"i ":" arr } } ' file
Col2:0.5
Col3:0.7
Col4:0.6

thanx a lot. is it possible to keep the column name a1,a2, a3 along with positive or negative values ?
ex:

a1:0.5
a2:-0.7
a3:-0.6
$ awk ' NR == 1 { for(i=1;i<=NF;i++) a=$i }
> NR > 1 { for(i=1;i<=NF;i++) arr+=$i }
> END { for(i=1;i<=NF;i++) { if( arr > 0 ? arr : -1 * arr > 0.5) print a ":" arr } } ' file
a1:0.5
a2:-0.7
a3:-0.6
1 Like