I am trying to modify and understand an awk script written by @Scrutinizer.
The awk below filters a tab-delimited file of 30,000 lines. What I am having trouble with is adding a condition for SVTYPE=CNV that only prints the line if the value following ,0.95: in the CI= field is < 1.9.
The other condition works perfectly and I added comments as to what I think is happening in each step. Thank you :).
file
chr16 68771250 CDH1 G <CNV> 100.0 PASS HS;FR=.;PRECISE=FALSE;SVTYPE=CNV;END=68867430;LEN=96180;NUMTILES=39;SD=0.47;CDF_MAPD=0.01:0.962265,0.025:0.985543,0.05:1.006017,0.1:1.030158,0.2:1.060175,0.25:1.071807,0.5:1.12,0.75:1.17036,0.8:1.183201,0.9:1.217678,0.95:1.246897,0.975:1.272801,0.99:1.303591;REF_CN=2;CI=0.05:0.895574,0.95:1.16322;RAW_CN=1.12;FUNC=[{'gene':'CDH1'}] GT:GQ:CN ./.:0:1.02
chr15 90631824 IDH2 G <CNV> 100.0 PASS FR=.;PRECISE=FALSE;SVTYPE=CNV;END=90631954;LEN=130;NUMTILES=1;SD;CDF_MAPD=0.01:0.647181,0.025:0.751369,0.05:0.85432,0.1:0.99068,0.2:1.185313,0.25:1.268903,0.5:1.67,0.75:2.197882,0.8:2.352881,0.9:2.815138,0.95:3.264469,0.975:3.711758,0.99:4.309304;REF_CN=2;CI=0.05:0.727022,0.95:3.40497;RAW_CN=1.67;FUNC=[{'gene':'IDH2'}] GT:GQ:CN ./.:0:1.63
awk
awk -F'[\t;]' ' # define FS as tab and ;
{
    split(x,V)
    for(i=1; i<=NF; i++) { # create loop i (which is each line) and iterate through
        split($i,F,/=/) # each portion (in green) of the line with the pattern = read into array F splitting using FS
        V[F[1]]=F[2] # set each split in array F equal to array V (defined below)
    }
}
(V["SVTYPE"]=="CNV" && V["CI"]+0 < 1.9) || # define V for CNV - not sure if the entire CI is being used or maybe splitting on the , would work better
(V["SVTYPE"]=="Fusion" && V["READ_COUNT"]+0 > 10) # define V for Fusion
' file > out
desired output - only this line has a CI 0.95 value < 1.9
chr16 68771250 CDH1 G <CNV> 100.0 PASS HS;FR=.;PRECISE=FALSE;SVTYPE=CNV;END=68867430;LEN=96180;NUMTILES=39;SD=0.47;CDF_MAPD=0.01:0.962265,0.025:0.985543,0.05:1.006017,0.1:1.030158,0.2:1.060175,0.25:1.071807,0.5:1.12,0.75:1.17036,0.8:1.183201,0.9:1.217678,0.95:1.246897,0.975:1.272801,0.99:1.303591;REF_CN=2;CI=0.05:0.895574,0.95:1.16322;RAW_CN=1.12;FUNC=[{'gene':'CDH1'}] GT:GQ:CN ./.:0:1.02
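A possible direction, offered only as a sketch: in the awk above, V["CI"] holds the whole string 0.05:0.895574,0.95:1.16322, and adding 0 converts it to its leading number, 0.05, so the CNV test is true for every CNV line. One option is to split V["CI"] on the comma and then on the colon, and compare only the value paired with 0.95. The file name sample.txt, the variable ci95, and the arrays C and P are hypothetical names, and the two sample lines are shortened from the ones in the post.

```shell
# Two tab-delimited sample lines, shortened from the post (hypothetical file name).
printf 'chr16\t68771250\tCDH1\tG\t<CNV>\t100.0\tPASS\tSVTYPE=CNV;REF_CN=2;CI=0.05:0.895574,0.95:1.16322;RAW_CN=1.12\n' > sample.txt
printf 'chr15\t90631824\tIDH2\tG\t<CNV>\t100.0\tPASS\tSVTYPE=CNV;REF_CN=2;CI=0.05:0.727022,0.95:3.40497;RAW_CN=1.67\n' >> sample.txt

awk -F'[\t;]' '
{
    split("", V)                         # clear V before each line
    for (i = 1; i <= NF; i++) {          # i walks the fields of the current line
        n = split($i, F, /=/)            # "KEY=VALUE" becomes F[1]=KEY, F[2]=VALUE
        if (n >= 2) V[F[1]] = F[2]
    }
    ci95 = ""                            # assumption: empty means no 0.95 entry was found
    m = split(V["CI"], C, /,/)           # "0.05:x,0.95:y" becomes C[1]="0.05:x", C[2]="0.95:y"
    for (j = 1; j <= m; j++) {
        split(C[j], P, /:/)              # P[1] is the level, P[2] is its value
        if (P[1] == "0.95") ci95 = P[2]
    }
}
(V["SVTYPE"] == "CNV" && ci95 != "" && ci95 + 0 < 1.9) ||
(V["SVTYPE"] == "Fusion" && V["READ_COUNT"] + 0 > 10)
' sample.txt > out

cat out
```

With these two sample lines only the chr16 line should be written to out, since 1.16322 is below 1.9 and 3.40497 is not. The Fusion condition is left exactly as in the original.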