awk to print line(s) meeting conditions

The awk below executes, but the output is empty. I am trying to print each line that meets all three conditions below:

1. $7 = PASS
2. AF= > .03 (3%)
3. function = nonsense or frameshift

In the file below, the second line has no 'function' entry and its AF= is below 3%, so it should not be printed.

I think I am close, but I cannot get the script to work as desired. Thank you :).

file tab-delimited

chr10 89624278 . G T 62.8836 PASS AF=0.0785393;AO=297;DP=4155;FAO=157;FDP=1999;FR=.;FRO=1842;FSAF=77;FSAR=80;FSRF=908;FSRR=934;FWDB=0.0113997;FXX=4.99998E-4;HRUN=1;LEN=1;MLLD=117.237;OALT=T;OID=.;OMAPALT=T;OPOS=89624278;OREF=G;PB=.;PBP=.;QD=0.12583;RBI=0.040843;REFB=5.39678E-4;REVB=-0.0392199;RO=3844;SAF=150;SAR=147;SRF=1936;SRR=1908;SSEN=0;SSEP=0;SSSB=0.00159791;STB=0.502301;STBP=0.96;TYPE=snp;VARB=-0.00676678;FUNC=[{'origPos':'89624278','origRef':'G','normalizedRef':'G','gene':'PTEN','normalizedPos':'89624278','normalizedAlt':'T','gt':'pos','codon':'TAG','coding':'c.52G>T','transcript':'NM_000314.4','function':'nonsense','protein':'p.Glu18Ter','location':'exonic','origAlt':'T','exon':'1'}] GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT 0/1:62:4155:1999:3844:1842:297:157:0.0785393:147:150:1936:1908:80:77:908:934:1
chr10 89624293 COSM86051 T G 341.74 PASS AF=0;AO=1;DP=4145;FAO=0;FDP=1995;FR=.;FRO=1995;FSAF=0;FSAR=0;FSRF=1008;FSRR=987;FWDB=0.0136548;FXX=0.00249999;HRUN=2;LEN=1;MLLD=151.799;OALT=G,G;OID=COSM86051,COSM86051;OMAPALT=G,G;OPOS=89624293,89624293;OREF=T,T;PB=.;PBP=.;QD=0.685192;RBI=0.0172428;REFB=4.38663E-6;REVB=0.010529;RO=4136;SAF=0;SAR=1;SRF=2077;SRR=2059;SSEN=0;SSEP=0;SSSB=-0.00528832;STB=0.5;STBP=1;TYPE=snp;VARB=0;HS;FUNC=[{'transcript':'NM_000314.4','gene':'PTEN','location':'exonic','exon':'1'}] GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT 0/0:341:4145:1995:4136:1995:1:0:0:1:0:2077:2059:0:0:1008:987:0

awk

awk -F'\t' '{$7=="PASS" &&
                        if(/AF=[^;]*/+0 > .03 && "'function'" == "nonsense" || "'function'" == "frameshift"){
                                       print
      }
     }
' file > out

desired out tab-delimited --- the only line that meets all three conditions

chr10 89624278 . G T 62.8836 PASS AF=0.0785393;AO=297;DP=4155;FAO=157;FDP=1999;FR=.;FRO=1842;FSAF=77;FSAR=80;FSRF=908;FSRR=934;FWDB=0.0113997;FXX=4.99998E-4;HRUN=1;LEN=1;MLLD=117.237;OALT=T;OID=.;OMAPALT=T;OPOS=89624278;OREF=G;PB=.;PBP=.;QD=0.12583;RBI=0.040843;REFB=5.39678E-4;REVB=-0.0392199;RO=3844;SAF=150;SAR=147;SRF=1936;SRR=1908;SSEN=0;SSEP=0;SSSB=0.00159791;STB=0.502301;STBP=0.96;TYPE=snp;VARB=-0.00676678;FUNC=[{'origPos':'89624278','origRef':'G','normalizedRef':'G','gene':'PTEN','normalizedPos':'89624278','normalizedAlt':'T','gt':'pos','codon':'TAG','coding':'c.52G>T','transcript':'NM_000314.4','function':'nonsense','protein':'p.Glu18Ter','location':'exonic','origAlt':'T','exon':'1'}] GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT 0/1:62:4155:1999:3844:1842:297:157:0.0785393:147:150:1936:1908:80:77:908:934:1

Hello cmccabe,

If your Input_file is the same as the sample shown, then the following may help you.

awk -F'[\t ;=,:]' '$7=="PASS" && $9>.03 && $107 ~ /function/ && $108 ~ /nonsense|frameshift/' Input_file

If your Input_file is NOT the same as the sample shown, meaning its fields are not in fixed positions, then you need to loop through the fields, look for values such as PASS and function, store their field numbers, and then test all of your conditions using those field numbers. I hope this helps.
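A minimal sketch of that lookup-by-name idea, under some assumptions: the file really is tab-delimited, $8 is the INFO column, and only the first 'function' key in FUNC needs checking. The name filter_variants, the temporary sample file, and its shortened INFO strings are made up for illustration; the third record (AF=0.001 with SAF=150) is hypothetical and is there only to show that SAF= is not mistaken for AF=.

```shell
# Sketch: find AF and 'function' by name rather than by fixed field number.
filter_variants() {
  awk -F'\t' -v q="'" '
    $7 == "PASS" {
      af = ""; fn = ""
      # INFO is column 8; pick the entry that starts with AF=, not SAF= or FSAF=.
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^AF=/) af = substr(info[i], 4)
      # Take the quoted value that follows the first function key, if any.
      key = q "function" q ":" q
      p = index($8, key)
      if (p > 0) {
        rest = substr($8, p + length(key))
        fn = substr(rest, 1, index(rest, q) - 1)
      }
      # af + 0 forces a numeric comparison, so a value like 4.99998E-4 is read as a number.
      if (af + 0 > 0.03 && (fn == "nonsense" || fn == "frameshift")) print
    }
  ' "$@"
}

# Shortened, partly hypothetical sample records (\047 is a single quote).
sample=$(mktemp)
{
  printf 'chr10\t89624278\t.\tG\tT\t62.8836\tPASS\tAF=0.0785393;AO=297;SAF=150;FUNC=[{\047gene\047:\047PTEN\047,\047function\047:\047nonsense\047}]\n'
  printf 'chr10\t89624293\tCOSM86051\tT\tG\t341.74\tPASS\tAF=0;AO=1;SAF=0;FUNC=[{\047transcript\047:\047NM_000314.4\047,\047gene\047:\047PTEN\047}]\n'
  printf 'chr10\t89624300\t.\tA\tC\t50\tPASS\tAF=0.001;AO=5;SAF=150;FUNC=[{\047function\047:\047nonsense\047}]\n'
} > "$sample"

filter_variants "$sample"
```

With this sample only the 89624278 record should come out. If a line can carry several FUNC entries, the sketch would need a loop over every 'function' key rather than just the first one.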

Thanks,
R. Singh

Hi,
You can try this:

awk -F'\t' '$7 == "PASS" && /AF=([1-9]|[0-9]\.[1-9]|0\.0[3-9])/ && /'"'"'function'"'"':'"'"'(nonsense|frameshift)'"'"'/' file
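A pattern-only AF test is compact, but it may be worth checking what it accepts before relying on it. The sketch below is only an illustration: check_af_patterns and the three INFO strings are made up, and `(^|;)AF=` is one possible way to anchor the key so that SAF= or FSAF= cannot satisfy it.

```shell
# Compare the unanchored AF pattern with an anchored variant on made-up INFO strings.
check_af_patterns() {
  awk '{
    loose  = ($0 ~ /AF=([1-9]|[0-9]\.[1-9]|0\.0[3-9])/)      ? "match" : "no"
    strict = ($0 ~ /(^|;)AF=([1-9]|[0-9]\.[1-9]|0\.0[3-9])/) ? "match" : "no"
    print $0 "\tloose=" loose "\tstrict=" strict
  }'
}

printf '%s\n' \
  'AF=0.0785393;AO=297;SAF=150' \
  'AF=0;AO=1;SAF=150' \
  'AF=4.99998E-4;SAF=0' | check_af_patterns
```

The second string has AF=0 but is still accepted by the unanchored pattern because of SAF=150, while the anchored one rejects it. The third string is accepted by both, since a regex reads 4.99998E-4 as starting with 4; if AF can appear in that notation, a numeric comparison of the extracted value is the safer test.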

Regards.

Thank you both very much :).