awk to print lines that meet conditions and have value in another file

I am trying to use awk to print lines that satisfy either of the two conditions below:

condition 1: $2 equals CNV and, after splitting $3, the 5% value (a[1], or so I think) is greater than or equal to 4.
condition 2: $2 equals CNV, the 5% value from the split of $3 (a[1], or so I think) is less than or equal to 1.0, the 95% value (a[3], or so I think) is less than or equal to 1.9, and $4 matches a line in list. I have added comments to the code describing what I think is happening. The code executes, but currently all the CNV lines are printed. Thank you :).

file

chr1:11184539	CNV	5%:5.5,95%:2.68	Name
chr1:11184539	REF
chr1:11184539	SNV		A
chr1:11184555	CNV	5%:0.9,95%:1.9	BRCA1
chr1:11184539	FUSION
chr1:11184539	INDEL	G
chr1:11184555	CNV	5%:2.5,95%:2.68	Name2
chr1:11184555	CNV	5%:1.1,95%:1.8	BRCA2

list

BRCA1
BRCA2

awk

awk -F'\t' '{split($3,a,":,")} $2=="CNV" && a[1]>=4.0' file    # capture condition 1: split $3 on : and , then check that $2 is CNV and a[1] >= 4.0
awk -F'\t' '{split($3,a,":,")} $2=="CNV" && a[1]<=1.0 && a[3]<=1.9 && NR==FNR{c[$1]++;next};c[$1] > 0' file list  # capture condition 2: split $3 on : and , then check that $2 is CNV, a[1] <= 1.0, a[3] <= 1.9, and $4 matches $1 in list

desired output

chr1:11184539	CNV	5%:5.5,95%:2.68	Name
chr1:11184555	CNV	5%:0.9,95%:1.9	BRCA1

awk '
NR==FNR {a[$1]=$1; next;}
$2=="CNV" {
   c=split($3, b, "[,:]");
   if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && length(a[$NF]))) print $0;
}
' list FS="\t" file
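
For anyone wondering why the original commands printed every CNV line: when the third argument of split() is longer than one character it is treated as a regular expression, so ":," only matches a colon immediately followed by a comma. That sequence never occurs in $3, so a[1] ends up holding the whole field, and comparing it with 4.0 becomes a string comparison that is true for anything starting with "5". A small sketch of the difference, using the first CNV value from the sample (the variable names field, bad and good are only for illustration):

```shell
# Field 3 of the first CNV line in the sample data.
field='5%:5.5,95%:2.68'

# ":," is the literal two-character sequence colon-comma. It never occurs
# in the field, so split() returns 1 and a[1] is the whole field.
bad=$(printf '%s\n' "$field" | awk '{ n = split($0, a, ":,"); print n, a[1] }')

# "[,:]" is a bracket expression: split on either a comma or a colon.
# This gives four pieces: 5%, 5.5, 95%, 2.68.
good=$(printf '%s\n' "$field" | awk '{ n = split($0, b, "[,:]"); print n, b[2], b[n] }')

printf '%s\n%s\n' "$bad" "$good"
```

With "[,:]" the 5% value is b[2] and the 95% value is the last element, which is why the answer tests b[2] and b[c] rather than a[1] and a[3].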

Thank you very much for your help :).

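You do not need to store the key as the value here. Simply referencing a[$1] creates the element, and the `in` operator tests for it, which saves some memory: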

awk '
NR==FNR { a[$1]; next }
$2=="CNV" {
  c=split($3, b, "[,:]")
  if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && ($NF in a))) print
}
' list FS="\t" file
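
A quick way to check this against the sample data from the question. Note the argument order: list is read first with the default whitespace splitting, and the FS="\t" assignment only takes effect when awk reaches it on the command line, so it applies to file alone. The temporary directory and the tmp and out variable names are my own assumptions for this demo, not part of the posts above.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)

# %b expands the \t escapes in each argument; one argument per line.
printf '%b\n' \
  'chr1:11184539\tCNV\t5%:5.5,95%:2.68\tName' \
  'chr1:11184539\tREF' \
  'chr1:11184539\tSNV\t\tA' \
  'chr1:11184555\tCNV\t5%:0.9,95%:1.9\tBRCA1' \
  'chr1:11184539\tFUSION' \
  'chr1:11184539\tINDEL\tG' \
  'chr1:11184555\tCNV\t5%:2.5,95%:2.68\tName2' \
  'chr1:11184555\tCNV\t5%:1.1,95%:1.8\tBRCA2' > "$tmp/file"
printf '%s\n' BRCA1 BRCA2 > "$tmp/list"

out=$(awk '
NR==FNR { a[$1]; next }            # first file: remember each gene name
$2=="CNV" {
  c=split($3, b, "[,:]")           # b[2] = 5% value, b[c] = 95% value
  if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && ($NF in a))) print
}
' "$tmp/list" FS="\t" "$tmp/file")

printf '%s\n' "$out"
rm -rf "$tmp"
```

On the sample data this prints the Name line (condition 1) and the BRCA1 line (condition 2). The BRCA2 line is skipped because its 5% value of 1.1 is above 1.0.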

Thank you very much :).