The below awk
uses $3
and $4
in search
as the min
and max
, then takes each $2
value in lookup
and compares it. If the value in lookup
falls within the range in search
then it prints the entire line in lookup/ICODE]. What I can't seem to figure out is how to print the matching $5
from search
on that line. Thank you :).
awk
awk '
NR == FNR {min[$1]=$3; max[$1]=$4; next}
{
for (id in min)
if (min[id] < $2 && $2 < max[id]) {
print $0, id
break
}
}
' search lookup > out
lookup
##INFO=<ID=MA,Number=1,Type=String,Description="Minor Allele">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Minor Allele Frequency">
##INFO=<ID=MAC,Number=1,Type=Integer,Description="Minor Alelele Count">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 1014143 rs786201005 C T . . dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;CLIN_pathogenic;AA=C
7 48311388 rs672601345 G GG . . dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic
1 1232600 rs786200943 GCCCTGGAGCGGGAGCAGGCGCG G . . dbSNP_147;TSA=deletion;E_Phenotype_or_Disease;CLIN_pathogenic
1 1232630 rs750088530 GA G . . dbSNP_147;TSA=deletion;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=A
1 1232692 rs786200942 CATGCTGGCC C . . dbSNP_147;TSA=deletion;E_Phenotype_or_Disease;CLIN_pathogenic
22 94487198 rs672601312 G T . . dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G
search
R_Index Chr Start End Gene
20 chr7 48684202 48684205 ABCA13
21 chr7 48311373 48311704 ABCA13;AX746840
22 chr22 94487185 94487248 ABCA4
desired output
7 48311388 rs672601345 G GG . . dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic 21 ABCA13;AX746840
22 94487198 rs672601312 G T . . dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G 22 ABCA4
current output
7 48311388 rs672601345 G GG . . dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic 21
22 94487198 rs672601312 G T . . dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G 22