This matrix represents correlation values.
Is it possible to calculate the percentage of columns (a1, a2, a3) that have a value >= |0.5| and report the percentage that has positive correlation >0.5 and negative correlation <-0.5 separately. thanx in advance!
input
name a1 a2 a3
g1 0.8 0.4 0.2
g2 -0.2 -0.6 -0.7
g3 0.1 0.6 0.8
g4 0.1 0 0
g5 -0.2 -0.2 -0.2
g6 -0.1 -0.9 -0.9
g7 0 0 0.2
i am doing this manually like the following but the no of columns i have are around 5000.
awk '{ sum+=$2} END {print sum}' rm
0.5
awk '{ sum+=$3} END {print sum}' rm
-0.7
awk '{ sum+=$4} END {print sum}' rm
-0.6
(1/3)*100=33.33% (positive corr)
(2/3)*100=66.66% (negative corr)
anbu23
April 9, 2014, 9:05am
4
Modify the code to your desired output
$ awk ' NR > 1 { for(i=1;i<=NF;i++) arr+=$i }
> END { for(i=1;i<=NF;i++) { sub("-","",arr); if( arr > 0.5) print "Col"i ":" arr } } ' file
Col2:0.5
Col3:0.7
Col4:0.6
thanx a lot. is it possible to keep the column name a1,a2, a3 along with positive or negative values ?
ex:
a1:0.5
a2:-0.7
a3:-0.6
anbu23
April 9, 2014, 9:19am
6
$ awk ' NR == 1 { for(i=1;i<=NF;i++) a=$i }
> NR > 1 { for(i=1;i<=NF;i++) arr+=$i }
> END { for(i=1;i<=NF;i++) { if( arr > 0 ? arr : -1 * arr > 0.5) print a ":" arr } } ' file
a1:0.5
a2:-0.7
a3:-0.6
1 Like