...
...
##bcftools_normVersion=1.3.1+htslib-1.3.1
##bcftools_normCommand=norm -m-both -o genome_split.vcf genome.vcf.gz
##bcftools_normCommand=norm -f /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta -o genome_annovar.vcf genome_split.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 948797 . C . 0 PASS DP=159;END=948845;MAX_DP=224;MIN_DP=95 GT:DP:MIN_DP:MAX_DP 0/0:159:95:224
desired output
##bcftools_normVersion=1.3.1+htslib-1.3.1
##bcftools_normCommand=norm -m-both -o genome_split.vcf genome.vcf.gz
##bcftools_normCommand=norm -f /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta -o genome_annovar.vcf genome_split.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 948797 . C . 0 PASS DP=159;END=948845;MAX_DP=224;MIN_DP=95 GT:DP:MIN_DP:MAX_DP 0/0:159:95:224 . . . . . 159 homref .
other format of lines to skip
chr1 948846 . T TA 1185.38 PASS AF=0.986111;AO=200;DP=227;FAO=213;FDP=216;FDVR=0;FR=.;FRO=3;FSAF=132;FSAR=81;FSRF=3;FSRR=0;FWDB=-0.038073;FXX=0.0226234;HRUN=1;LEN=1;MLLD=24.748;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=21.9515;RBI=0.0905281;REFB=0.0706997;REVB=0.0821327;RO=17;SAF=122;SAR=78;SRF=15;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0430195;STB=0.505617;STBP=0.201;TYPE=ins;VARB=0.000148797 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 1/1:219:227:216:17:3:200:213:0.986111:78:122:15:2:81:132:3:0 0.986 132 81 1 GOOD 216 homalt 36
If I split on $8
using DP=
and print the value up to the ;
, since that tag is iin both lines, wont it print for both? However since $16
has a value in it that line can be skipped (the much larger of the two lines).
awk
awk -F'\t' -v OFS="\t" '{if (!$16);{split($8,a,[=;]); {print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, ".", ".", ".", ".", ".", a[1], "homref", "."}' file
Not sure is the above is any closer and I did not really know what would print, but understand better know? Thank you :).
I also tried the perl below which executes but does not print any new output the desired output, but seems close.
perl
perl -plae '
BEGIN{ %h = qw(0/0 homref) } /^[^#].*DP=(\d+);.*([0-0]\/[0-0])/ and $_ .= join "\t", ("", "."),".",".",".", ".", $1, $h{$7},"."' file > out
output of line to change --- seems to extract 224 and doesn't print homref though qw
=0/0)
chr1 948797 . C . 0 PASS DP=159;END=948845;MAX_DP=224;MIN_DP=95 GT:DP:MIN_DP:MAX_DP 0/0:159:95:224 . . . . . 224 .
chr1 948846 . T TA 1185.38 PASS AF=0.986111;AO=200;DP=227;FAO=213;FDP=216;FDVR=0;FR=.;FRO=3;FSAF=132;FSAR=81;FSRF=3;FSRR=0;FWDB=-0.038073;FXX=0.0226234;HRUN=1;LEN=1;MLLD=24.748;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=21.9515;RBI=0.0905281;REFB=0.0706997;REVB=0.0821327;RO=17;SAF=122;SAR=78;SRF=15;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0430195;STB=0.505617;STBP=0.201;TYPE=ins;VARB=0.000148797 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 1/1:219:227:216:17:3:200:213:0.986111:78:122:15:2:81:132:3:0 0.986 132 81 1 GOOD 216 homalt 36 --- this line is correct ----