The awk below is supposed to group the rows by the $5 string and, for each group, count how many $7 values are less than 20. I don't think I need the portion in bold, as I do not need any decimal point or formatting, but I cannot seem to get the correct counts. Thank you :).
file
chr5 77316500 77316628 chr5:77316500-77316628 AP3B1 62 152
chr5 77316500 77316628 chr5:77316500-77316628 AP3B1 63 153
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 333 19
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 334 19
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 335 19
chr15 31196856 31198110 chr15:31196856-31198110 FAN1 5 62
chr15 31196856 31198110 chr15:31196856-31198110 FAN1 6 62
desired output
AP3B1 0
ERCC4 3
FAN1 0
awk with current output
awk '{sum[$5]+=$7 < 20; count[$5]++}
END{for(k in sum) printf "%s %.1f\n", k, sum[k]/count[k]}' file
AP3B1 0.0
ERCC4 1.0
FAN1 0.0
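A possible fix, as a minimal sketch: in the awk above, sum[k] already holds the number of rows with $7 below 20, because ($7 < 20) evaluates to 1 or 0 and assignment binds more loosely than the comparison. Dividing by count[k] turns that count into a fraction, which is why ERCC4 prints 1.0 (3 of 3 rows). Printing the tally directly should give the desired counts. The array name low is my own choice, the here-document only recreates the sample input, and the sort is there because awk does not guarantee the order of for (k in ...).

```shell
# Recreate the sample input from the question (whitespace-separated, 7 fields).
cat > file <<'EOF'
chr5 77316500 77316628 chr5:77316500-77316628 AP3B1 62 152
chr5 77316500 77316628 chr5:77316500-77316628 AP3B1 63 153
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 333 19
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 334 19
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 335 19
chr15 31196856 31198110 chr15:31196856-31198110 FAN1 5 62
chr15 31196856 31198110 chr15:31196856-31198110 FAN1 6 62
EOF

# For each $5 value, tally the rows whose $7 is below 20.
# ($7 < 20) is 1 when true and 0 when false, so groups with no
# matching rows still get an entry with the value 0.
awk '{ low[$5] += ($7 < 20) }
     END { for (k in low) print k, low[k] }' file | sort
```

Using print instead of printf with %.1f drops the decimal point, and the count array is no longer needed since nothing is being averaged.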