The awk below filters a tab-delimited file of 30,000 lines. What I am having trouble with is adding a condition for SVTYPE=CNV lines so that a line is printed only if its CI= value is > .05. The other condition to add is for SVTYPE=Fusion lines: a line should be printed only if READ_COUNT is > 10. Thank you :).
file
chr1 11184539 MTOR A <CNV> 100.0 PASS FR=.;PRECISE=FALSE;SVTYPE=CNV;END=11217311;LEN=32772;NUMTILES=4;SD=0.18;CDF_MAPD=0.01:1.373797,0.025:1.472018,0.05:1.562112,0.1:1.67288,0.2:1.817619,0.25:1.875834,0.5:2.13,0.75:2.418604,0.8:2.496068,0.9:2.71203,0.95:2.904337,0.975:3.082096,0.99:3.302454;REF_CN=2;CI=0.05:1.56211,0.95:2.90434;RAW_CN=2.13;FUNC=[{'gene':'MTOR'}] GT:GQ:CN ./.:0:2.13
chr1 11810242 AGTRAP-BRAF.A5B8.COSF828.1_1 G G]chr7:140494267] . FAIL SVTYPE=Fusion;READ_COUNT=0;GENE_NAME=AGTRAP;EXON_NUM=5;RPM=0.0000;NORM_COUNT=0.0;ANNOTATION=COSF828;FAIL_REASON=READ_COUNT<=40|NORM_COUNT<=0.0;FUNC=[{'gene':'AGTRAP','exon':'5'}] GT:GQ ./.:.
chr7:140494267] . PASS SVTYPE=Fusion;READ_COUNT=16;GENE_NAME=AGTRAP;EXON_NUM=5;RPM=0.0000;NORM_COUNT=0.0;ANNOTATION=COSF828;FAIL_REASON=|NORM_COUNT<=0.0;FUNC=[{'gene':'AGTRAP','exon':'5'}] GT:GQ ./.:.
desired output
chr7:140494267] . PASS SVTYPE=Fusion;READ_COUNT=16;GENE_NAME=AGTRAP;EXON_NUM=5;RPM=0.0000;NORM_COUNT=0.0;ANNOTATION=COSF828;FAIL_REASON=|NORM_COUNT<=0.0;FUNC=[{'gene':'AGTRAP','exon':'5'}] GT:GQ ./.:.
awk
awk -F'\t' -v OFS='\t' '/SVTYPE=/{print}' file
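
One possible way to add both conditions is sketched below. It is not a definitive answer and rests on a few assumptions: the value tested for CNV lines is the first number immediately after `CI=` (0.05 in the sample, so that line is dropped), only CNV and Fusion lines are candidates for printing, and the key/value pairs are separated by `;`. The names `filter_sv` and `num_after` are hypothetical helpers introduced for illustration.

```shell
# Hypothetical wrapper: filter_sv FILE prints the lines that pass both rules.
filter_sv() {
  awk '
    # Return the number right after key= in the current line, or -1 when the
    # key is absent. The key must start the line or follow ";", tab or space,
    # so READ_COUNT<=40 inside FAIL_REASON is not mistaken for READ_COUNT=.
    function num_after(key,    s) {
      if (match($0, "(^|[;\t ])" key "=[0-9.]+")) {
        s = substr($0, RSTART, RLENGTH)
        sub(/^.*=/, "", s)          # keep only the digits after "="
        return s + 0                # force numeric comparison
      }
      return -1
    }
    # Assumption: compare the first number after CI= against 0.05.
    /SVTYPE=CNV/    && num_after("CI") > 0.05       { print }
    # Fusion lines need READ_COUNT strictly greater than 10.
    /SVTYPE=Fusion/ && num_after("READ_COUNT") > 10 { print }
  ' "$1"
}
```

It would be run as `filter_sv file`. The sketch searches the whole line instead of a fixed column, so it does not depend on which tab-separated field holds the SVTYPE= string.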