I have a file with 20 columns. I'd like to keep only the lines in which at least x of the values in columns 6-20 are at or above a threshold.
For example, in the file below I'd like to keep only the lines that have at least 8 columns (again, looking only at columns 6-20) with a value of at least 0.75. I'd also like the code to be easy to modify, so that I can experiment with both the minimum number of columns (8 here) and the threshold (0.75).
File:
s_20331 822 1 1.000 5.0 0.00000000 0.14395044 0.00000000 0.00000000 0.00000000 0.20102041 0.00000000 0.00000000 0.00000000 0.28091837 0.11224490 0.03571429 0.00000000 0.00000000 0.00000000
s_20416 154 1 1.000 5.0 0.00000000 1.00000000 0.66666667 0.40000000 0.30216165 1.00000000 0.66666667 0.45142857 0.35714286 0.11111111 0.32659933 0.55245256 0.17424242 0.32832080 0.10345717
s_20476 114 1 1.000 5.0 0.00000000 1.00000000 0.42857143 0.85100619 1.00000000 1.00000000 0.42857143 0.86996904 1.00000000 0.25000000 0.13039843 0.00000000 0.19697069 0.25000000 0.10607391
s_20477 162 1 1.000 6.0 0.20987654 0.79423868 0.81481481 0.78395062 0.77777778 1.00000000 1.00000000 1.00000000 1.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
Desired output:
s_20477 162 1 1.000 6.0 0.20987654 0.79423868 0.81481481 0.78395062 0.77777778 1.00000000 1.00000000 1.00000000 1.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
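A minimal sketch of one way to do this in a single awk pass, assuming whitespace-separated columns as in the sample: count how many of columns 6-20 are at or above the threshold (>=, matching "at least 0.75" in the example), and print the line only if that count reaches the minimum. The shell function name filter and the variables min, thr, first and last are illustrative choices, not anything from the question.

```shell
#!/bin/sh
# Sketch: keep a line if at least $min of columns $first..$last are >= $thr.
# The names filter, min, thr, first and last are illustrative, not from the question.
min=8       # minimum number of qualifying columns
thr=0.75    # a column qualifies if its value is >= thr
first=6     # first column to inspect
last=20     # last column to inspect

filter() {
    # Reads the files named in "$@" (or stdin if none) and prints qualifying lines.
    awk -v min="$min" -v thr="$thr" -v first="$first" -v last="$last" '
    {
        n = 0
        # Count qualifying columns; i <= NF guards against short lines.
        for (i = first; i <= last && i <= NF; i++)
            if ($i + 0 >= thr + 0)   # "+ 0" forces a numeric comparison
                n++
        if (n >= min)
            print
    }' "$@"
}

# The four sample lines from the question.
sample='s_20331 822 1 1.000 5.0 0.00000000 0.14395044 0.00000000 0.00000000 0.00000000 0.20102041 0.00000000 0.00000000 0.00000000 0.28091837 0.11224490 0.03571429 0.00000000 0.00000000 0.00000000
s_20416 154 1 1.000 5.0 0.00000000 1.00000000 0.66666667 0.40000000 0.30216165 1.00000000 0.66666667 0.45142857 0.35714286 0.11111111 0.32659933 0.55245256 0.17424242 0.32832080 0.10345717
s_20476 114 1 1.000 5.0 0.00000000 1.00000000 0.42857143 0.85100619 1.00000000 1.00000000 0.42857143 0.86996904 1.00000000 0.25000000 0.13039843 0.00000000 0.19697069 0.25000000 0.10607391
s_20477 162 1 1.000 6.0 0.20987654 0.79423868 0.81481481 0.78395062 0.77777778 1.00000000 1.00000000 1.00000000 1.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000'

result=$(printf '%s\n' "$sample" | filter)
printf '%s\n' "$result"
```

With the values shown, only the s_20477 line is printed. Changing min or thr at the top is all that is needed to experiment; for example, with min=6 the s_20476 line (which has 6 qualifying columns) is kept as well. To run it on a real file, pass the file name as an argument to filter instead of piping in the sample.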
I'm a novice, and all I have so far is an awk command that applies a threshold to one column, piped into another awk command that screens the next column, and so on. This is clearly inelegant, and it also doesn't do what I need: every column has to pass, so it can't allow some columns to fall below the threshold.
awk '{if($6>=0.75)print;}' | awk '{if($7>=0.9)print;}' | awk '{if($8>=0.9)print;}' | awk '{if($9>=0.9)print;}' [...etc]
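To see why chained filters cannot express "at least x columns", it may help to compare, for each line, what the chain does with how many columns actually qualify. The sketch below is a diagnostic, not a solution: it hardcodes only the thresholds visible in the command above (0.75 for column 6, 0.9 for columns 7-9) and leaves out the "[...etc]" stages. Since extra stages can only remove more lines, a "dropped" verdict here would also hold for the full chain.

```shell
#!/bin/sh
# Diagnostic sketch: for each sample line, report whether the piped filters
# would keep it and how many of columns 6-20 are >= 0.75.
# Only the four stages visible in the question ($6-$9) are reproduced.
sample='s_20331 822 1 1.000 5.0 0.00000000 0.14395044 0.00000000 0.00000000 0.00000000 0.20102041 0.00000000 0.00000000 0.00000000 0.28091837 0.11224490 0.03571429 0.00000000 0.00000000 0.00000000
s_20416 154 1 1.000 5.0 0.00000000 1.00000000 0.66666667 0.40000000 0.30216165 1.00000000 0.66666667 0.45142857 0.35714286 0.11111111 0.32659933 0.55245256 0.17424242 0.32832080 0.10345717
s_20476 114 1 1.000 5.0 0.00000000 1.00000000 0.42857143 0.85100619 1.00000000 1.00000000 0.42857143 0.86996904 1.00000000 0.25000000 0.13039843 0.00000000 0.19697069 0.25000000 0.10607391
s_20477 162 1 1.000 6.0 0.20987654 0.79423868 0.81481481 0.78395062 0.77777778 1.00000000 1.00000000 1.00000000 1.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000'

report=$(printf '%s\n' "$sample" | awk '{
    # Chained awk filters act like one && condition: every test must pass.
    all = ($6 >= 0.75 && $7 >= 0.9 && $8 >= 0.9 && $9 >= 0.9)
    # Count the columns among 6-20 whose value is at least 0.75.
    n = 0
    for (i = 6; i <= 20; i++)
        if ($i >= 0.75)
            n++
    print $1, (all ? "kept" : "dropped"), n
}')
printf '%s\n' "$report"
# s_20331 dropped 0
# s_20416 dropped 2
# s_20476 dropped 6
# s_20477 dropped 8
```

Every sample line is dropped, including s_20477: its column 6 (0.20987654) already fails the first stage, even though 8 of its columns 6-20 are at least 0.75. A per-line count like this can also help when deciding what minimum to use.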