Remove lines from output in files using awk

I have two large files (~250GB) from which I am trying to remove the lines where GT is 0/0, 1/1, or 2/2, in both files. I was going to use bash with the awk below, which I think will find each line, but how do I remove the line if that condition is found? Thank you :).

Input

20      60055   .       A       .       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/0:25:25:99:PASS
20      60056   .       G      A.       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS
20      60057   .       T       .       35      PASS    DP=26;PF=20;MF=6;MQ=60;SB=0.769 GT:AD:DP:GQ:FL  0/0:26:26:99:PASS
20      60058   .       C      T       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  1/1:25:25:99:PASS
awk '$9~"^[012]"{$0=$0($9~"^(0/0|1/1|2/2)"?" hom":" het")}1' input

Desired output

20      60056   .       G      A.       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS
awk '$NF ~ /0\/1/'

Your spec is (not for the first time) rather misleading. There is NO field that contains GT: 0/0 or 1/1 or 2/2. It is left to the reader's interpretation that field 9 describes the layout of the next field, and that field 10 holds the respective values. Your code snippet doesn't help either: it doesn't remove any lines, and field 9 will never start with 0, 1, or 2.

Nor is any logical connection between the TWO files perceivable. You seem to be requesting a solution that works on ANY single file and can then be applied to each of your two files.

Please be aware that a correct, detailed, and carefully tailored specification will save everybody's time, including yours!

For your problem, try

awk '$NF !~ /^(0\/0|1\/1|2\/2)/' file
20      60056   .       G      A.       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS
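
Since awk does not edit its input in place, "removing" lines from each of the two files amounts to writing the surviving lines to a new file. A minimal sketch of that, assuming the sample column is always the last field; the helper name filter_hom and the file names in the usage comment are hypothetical, not from the original post:

```shell
# filter_hom IN OUT  (hypothetical helper name, for illustration only)
# Writes to OUT every line of IN whose last field does not start with
# a homozygous genotype (0/0, 1/1 or 2/2).
filter_hom() {
    awk '$NF !~ /^(0\/0|1\/1|2\/2)/' "$1" > "$2"
}

# Usage with hypothetical file names:
#   filter_hom file1.vcf file1.filtered.vcf
#   filter_hom file2.vcf file2.filtered.vcf
```

With inputs of ~250GB each, it is probably safest to check the filtered copy before replacing the original, which requires enough free disk space for both.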

To check for values in field 10 of any subfield identified in field 9, try

awk '
        {for (n=split ($9, TMP, ":"); n>0; n--) TYPE[TMP[n]] = n
         split ($10, VAL, ":")
         if (VAL[TYPE[SUB]] ~ PAT) next
        }
1
' SUB="GT" PAT="0/0|1/1|2/2" file
20      60056   .       G      A.       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/1:12,13:25:99:PASS,PASS

or, for the last subfield "FL", it yields

 SUB="FL" PAT="PASS," file
20      60055   .       A       .       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  0/0:25:25:99:PASS
20      60057   .       T       .       35      PASS    DP=26;PF=20;MF=6;MQ=60;SB=0.769 GT:AD:DP:GQ:FL  0/0:26:26:99:PASS
20      60058   .       C      T       35      PASS    DP=25;PF=20;MF=5;MQ=60;SB=0.800 GT:AD:DP:GQ:FL  1/1:25:25:99:PASS
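
The subfield lookup can also be wrapped in a shell function. The following is only a sketch under a few assumptions: the helper name drop_by_subfield is hypothetical, the name-to-position map is cleared on every line in case field 9 differs between lines, and a line is dropped only when the requested subfield is actually present in field 9.

```shell
# drop_by_subfield SUB PAT FILE  (hypothetical helper name)
# Prints every line of FILE except those where subfield SUB (named in
# field 9) has a value in field 10 that matches the regex PAT.
drop_by_subfield() {
    awk -v SUB="$1" -v PAT="$2" '
        {
            split("", TYPE)                    # clear the name -> position map
            n = split($9, TMP, ":")
            for (i = 1; i <= n; i++) TYPE[TMP[i]] = i
            split($10, VAL, ":")
            # Skip the line only if SUB exists and its value matches PAT.
            if ((SUB in TYPE) && VAL[TYPE[SUB]] ~ PAT) next
        }
        1
    ' "$3"
}

# Usage:
#   drop_by_subfield GT "0/0|1/1|2/2" file
#   drop_by_subfield FL "PASS," file
```

PAT is used as an unanchored regex here, so it matches anywhere inside the subfield value; anchor it with ^ and $ if an exact match is wanted.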

Thank you very much :).