Running the awk below on my original tab-delimited file produces blank output. However, when I copy the same lines from that file into a new document and run the same awk on it, I get the desired result.

The awk counts the unique strings before the : in $7, grouped by the id in $1.

The awk seems to work, but I cannot figure out why it doesn't work on the original file. There don't appear to be Windows line endings. Thank you :).
file
PTPN11 5781 13324 28363 genomic na LRG_614:g.36663G>T g.36663G>T - - Yes No No
PTPN11 5781 13324 28363 coding na LRG_614t1:c.214G>T c.214G>T LRG_614p1:p.Ala72Ser p.Ala72Ser Yes No No
PTPN11 5781 13324 28363 coding na NM_002834.4:c.214G>T c.214G>T NP_002825.3:p.Ala72Ser p.Ala72Ser Yes No Yes
PTPN11 5781 13324 28363 coding na NM_080601.2:c.214G>T c.214G>T NP_542168.1:p.Ala72Ser p.Ala72Ser No No No
awk
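# Note: cnt[$1][$7] is an array of arrays, which requires GNU awk 4.0 or later.
BEGIN { FS="[\t:]" }    # split fields on a tab or a colon
{
    # count each value of $7 (the text before the colon) per id in $1
    cnt[$1][$7]++
    # track the highest count seen so far for this id
    max[$1] = (max[$1] > cnt[$1][$7] ? max[$1] : cnt[$1][$7])
}
END {
    # print every value whose count equals the maximum for its id
    for (word in cnt) {
        for (val in cnt[word]) {
            if (cnt[word][val] == max[word]) {
                print word, val
            }
        }
    }
}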
desired result
PTPN11 LRG_614t1
PTPN11 LRG_614
PTPN11 NM_080601.2
PTPN11 NM_002834.4
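
For reference, here is a minimal, self-contained sketch of the working case. It assumes the fields really are separated by literal tab characters, so two of the sample rows are rebuilt with printf. The helper names sample and count_ids are hypothetical, and the script uses a portable cnt[$1 SUBSEP $7] key in place of the gawk-only arrays of arrays, so it is an approximation of the awk above rather than the exact same code.

```shell
#!/bin/sh
# Rebuild two of the sample rows with explicit tab separators
# (assumption: the original file uses real tabs between columns).
sample() {
    printf 'PTPN11\t5781\t13324\t28363\tgenomic\tna\tLRG_614:g.36663G>T\tg.36663G>T\t-\t-\tYes\tNo\tNo\n'
    printf 'PTPN11\t5781\t13324\t28363\tcoding\tna\tNM_002834.4:c.214G>T\tc.214G>T\tNP_002825.3:p.Ala72Ser\tp.Ala72Ser\tYes\tNo\tYes\n'
}

# Count each ($1, text before ":" in $7) pair and print the pairs whose
# count equals the maximum count seen for that id.
count_ids() {
    awk '
    BEGIN { FS = "[\t:]" }              # split on a tab or a colon
    {
        key = $1 SUBSEP $7              # portable stand-in for cnt[$1][$7]
        cnt[key]++
        if (cnt[key] > max[$1]) max[$1] = cnt[key]
    }
    END {
        for (key in cnt) {
            split(key, part, SUBSEP)    # part[1] = id, part[2] = value
            if (cnt[key] == max[part[1]]) print part[1], part[2]
        }
    }'
}

# for-in order is unspecified in awk, so sort for stable output.
sample | count_ids | sort
```

With these two rows, each value occurs once, so both are printed: PTPN11 LRG_614 and PTPN11 NM_002834.4.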