awk to print out with two possibilities

The awk below adds a header, Variants Detected:, followed by (in this case) the line in
file underneath the ## lines, as in the desired output. The script executes and works as expected
(it looks for the keywords in file and prints any matching lines underneath the header) if I remove the last awk, which
is meant to print a message if the output is blank. With the last awk in place, the blank-output message always prints.
I cannot seem to write the awk so that it accounts for both possibilities. There are typically multiple lines
in the file if it is not blank/empty. Thank you :).

file

##fileformat=VCFv4.1
##fileDate=20170422
chr17    7577108    COSM10749;COSM43737    C    A,T    149.594    PASS    AF=0.0830415,0.0;AO=372,2;DP=4420;FAO=166,0;FDP=1999;FR=.,.,REALIGNEDx0.0865;FRO=1833;FSAF=82,0;FSAR=84,0;FSRF=952;FSRR=881;FWDB=0.0072184,-0.0207142;FXX=4.99998E-4;HRUN=1,1;LEN=1,1;MLLD=293.795,80.5366;OALT=A,T;OID=COSM10749,COSM43737;OMAPALT=A,T;OPOS=7577108,7577108;OREF=C,C;PB=.,.;PBP=.,.;QD=0.299338;RBI=0.00721997,0.02565;REFB=1.40155E-4,-7.81395E-4;REVB=1.50579E-4,0.0151276;RO=4043;SAF=187,1;SAR=185,1;SRF=2118;SRR=1925;SSEN=0,0;SSEP=0,0;SSSB=-0.0251826,-5.12306E-4;STB=0.52327,0.5;STBP=0.541,1.0;TYPE=snp,snp;VARB=-0.00153404,0.0;HS;FUNC=[{'origPos':'7577108','origRef':'C','normalizedRef':'C','gene':'TP53','normalizedPos':'7577108','normalizedAlt':'A','polyphen':'1.0','gt':'pos','codon':'TTT','coding':'c.830G>T','sift':'0.0','grantham':'205.0','transcript':'NM_000546.5','function':'missense','protein':'p.Cys277Phe','location':'exonic','origAlt':'A','exon':'8','oncomineGeneClass':'Loss-of-Function','oncomineVariantClass':'Hotspot'}]    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT    0/1:149:4420:1999:4043:1833:372,2:166,0:0.0830415,0.0:185,1:187,1:2118:1925:84,0:82,0:952:881:1

awk

printf "Variants Detected: \n" >> out | awk NR>2 -v p1="PASS" -v p2="'oncomineGeneClass'" -v p3="'oncomineVariantClass':" '$0 ~ p1 && $0 ~ p2 && $0 ~ p3' file | awk 'BEGIN{if(p1=="" && p2=="" && p3==""){print "nothing detected"}}'>> out

desired output

Variants Detected:
chr17    7577108    COSM10749;COSM43737    C    A,T    149.594    PASS    AF=0.0830415,0.0;AO=372,2;DP=4420;FAO=166,0;FDP=1999;FR=.,.,REALIGNEDx0.0865;FRO=1833;FSAF=82,0;FSAR=84,0;FSRF=952;FSRR=881;FWDB=0.0072184,-0.0207142;FXX=4.99998E-4;HRUN=1,1;LEN=1,1;MLLD=293.795,80.5366;OALT=A,T;OID=COSM10749,COSM43737;OMAPALT=A,T;OPOS=7577108,7577108;OREF=C,C;PB=.,.;PBP=.,.;QD=0.299338;RBI=0.00721997,0.02565;REFB=1.40155E-4,-7.81395E-4;REVB=1.50579E-4,0.0151276;RO=4043;SAF=187,1;SAR=185,1;SRF=2118;SRR=1925;SSEN=0,0;SSEP=0,0;SSSB=-0.0251826,-5.12306E-4;STB=0.52327,0.5;STBP=0.541,1.0;TYPE=snp,snp;VARB=-0.00153404,0.0;HS;FUNC=[{'origPos':'7577108','origRef':'C','normalizedRef':'C','gene':'TP53','normalizedPos':'7577108','normalizedAlt':'A','polyphen':'1.0','gt':'pos','codon':'TTT','coding':'c.830G>T','sift':'0.0','grantham':'205.0','transcript':'NM_000546.5','function':'missense','protein':'p.Cys277Phe','location':'exonic','origAlt':'A','exon':'8','oncomineGeneClass':'Loss-of-Function','oncomineVariantClass':'Hotspot'}]    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT    0/1:149:4420:1999:4043:1833:372,2:166,0:0.0830415,0.0:185,1:187,1:2118:1925:84,0:82,0:952:881:1

desired output if blank

Variants Detected:
nothing detected

Try:

awk -v p1="PASS" -v p2="'oncomineGeneClass'" -v p3="'oncomineVariantClass':" '
  !/^#/ { 
    printf "Variants Detected: \n%s\n",($0 ~ p1 && $0 ~ p2 && $0 ~ p3) ? $0 : "nothing detected"
  }
' file > file.out

This one prints "Variants Detected:" only once, before the first match, and in the END block it prints "nothing detected" if there was not a single match. It uses a control variable, wasfound.

awk -v p1="PASS" -v p2="'oncomineGeneClass'" -v p3="'oncomineVariantClass':" '
  !/^#/ && $0 ~ p1 && $0 ~ p2 && $0 ~ p3 {
    if (!wasfound) {
      print "Variants Detected:"
      wasfound=1
    }
    print
  }
  END {
    if (!wasfound) { print "nothing detected" }
  }
' file 
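
If the header should appear in both cases, as in the two desired outputs in the question, one possible variation is to print it unconditionally in a BEGIN block and keep the wasfound flag only for the fallback message. This is a minimal sketch, not tested against real VCF files; the wrapper name report_variants is hypothetical and only there to make the input file a parameter.

```shell
# Hypothetical wrapper (name is an assumption, not from the thread).
# usage: report_variants file > out
report_variants() {
  awk -v p1="PASS" -v p2="'oncomineGeneClass'" -v p3="'oncomineVariantClass':" '
    # header is printed unconditionally, before any input is read
    BEGIN { print "Variants Detected:" }
    # skip ## header lines and require all three keywords on the line
    !/^#/ && $0 ~ p1 && $0 ~ p2 && $0 ~ p3 {
      print
      wasfound = 1
    }
    # fallback when no line matched, including an empty input file
    END { if (!wasfound) print "nothing detected" }
  ' "$1"
}
```

Because the header comes from BEGIN and the message from END, an empty file still yields both lines, and the separate printf and second awk from the original pipeline are no longer needed.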

Thank you both :)