I am trying to use awk to calculate a percentage: the count of each $7 that is greater than or equal to 20, divided by the count of each matching $5 in file. The portion of code before the first | gets the count of each matching $5, the portion before the second | gets the count of each $7 that is greater than or equal to 20, and the last part gets the overall percentage. The awk does execute, but it produces no output. There is probably a better way, and I hope my logic makes sense. Thank you :).
file
chr1 1787320 1787324 chr1:1787320-1787324 GNB1_1 1 394
chr1 1787320 1787324 chr1:1787320-1787324 GNB1_1 2 398
chr1 1787320 1787324 chr1:1787320-1787324 GNB1_1 3 17
chr1 1787320 1787324 chr1:1787320-1787324 GNB1_1 4 19
chr7 99203095 99203098 chr7:99203095-99203098 KPNA7_9 66 12
chr7 99203095 99203098 chr7:99203095-99203098 KPNA7_9 67 2
chr7 99203095 99203098 chr7:99203095-99203098 KPNA7_9 68 0
chrX 154370862 154370864 chrX:154370862-154370864 FLNA_26 375 0
chrX 154370862 154370864 chrX:154370862-154370864 FLNA_26 376 0
desired
GNB1_1 4 2 50.0%
KPNA7_9 3 2 33.3%
FLNA_26 2 0 0.0%
awk
awk -F '\t' '{c[$5]++}
END{
for (i in c) printf("%s\t%s\n",i,c)
}' file | awk 'count[$5]==""{ count[$5]=0 }
$7 <= 20{ count[$5]++}
END{
for(k in count)
printf "%s %d\n", k, count[k]
}' | awk '{A[$1]=$2;next} ($1 in A){X=(A[$1]/$3)*100;printf("%s %.1f\n",$1, 100-X)}' > output
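
A few things in the pipeline above would explain the empty output. The first stage prints only two columns (and passes the array c where c[i] is needed), so $5 and $7 are empty by the time the second stage runs. The second stage tests $7 <= 20 rather than >= 20. In the third stage, {A[$1]=$2;next} runs on every line, so the printf rule is never reached. Below is a minimal single-pass sketch of the same idea, not a definitive fix. It assumes whitespace-separated input (tabs or spaces), that no field contains a space, and tab-separated output. The function name pct_by_name and the arrays total, hit and order are made up for illustration. The sample input is copied from the question.

```shell
# Sample input copied from the question; written to ./file for the demo.
cat > file <<'EOF'
chr1 1787320 1787324 chr1:1787320-1787324 GNB1_1 1 394
chr1 1787320 1787324 chr1:1787320-1787324 GNB1_1 2 398
chr1 1787320 1787324 chr1:1787320-1787324 GNB1_1 3 17
chr1 1787320 1787324 chr1:1787320-1787324 GNB1_1 4 19
chr7 99203095 99203098 chr7:99203095-99203098 KPNA7_9 66 12
chr7 99203095 99203098 chr7:99203095-99203098 KPNA7_9 67 2
chr7 99203095 99203098 chr7:99203095-99203098 KPNA7_9 68 0
chrX 154370862 154370864 chrX:154370862-154370864 FLNA_26 375 0
chrX 154370862 154370864 chrX:154370862-154370864 FLNA_26 376 0
EOF

# pct_by_name is a hypothetical helper name, not something from the question.
# It reads the named files (or stdin) and prints: name, total, hits, percent.
pct_by_name() {
    awk '
    NF < 7 { next }                         # skip blank or short lines
    !($5 in total) { order[++n] = $5 }      # remember first-seen order of names
    { total[$5]++ }                         # lines per name in $5
    $7 + 0 >= 20 { hit[$5]++ }              # lines per name with $7 >= 20
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            # an unset hit[k] evaluates to 0, so names with no hits still print
            printf "%s\t%d\t%d\t%.1f%%\n", k, total[k], hit[k], 100 * hit[k] / total[k]
        }
    }' "$@"
}

pct_by_name file > output
```

With the sample above, output holds GNB1_1 4 2 50.0%, KPNA7_9 3 0 0.0% and FLNA_26 2 0 0.0%. The KPNA7_9 line differs from the desired output because none of its $7 values (12, 2, 0) is 20 or more, so the threshold or the desired line may need a second look.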