awk to print field from lookup file in output

The below awk uses $3 and $4 in search as the min and max , then takes each $2 value in lookup and compares it. If the value in lookup falls within the range in search then it prints the entire line in lookup/ICODE]. What I can't seem to figure out is how to print the matching $5 from search on that line. Thank you :).

awk

awk '                   
    NR == FNR {min[$1]=$3; max[$1]=$4; next}
    {                
        for (id in min) 
            if (min[id] < $2 && $2 < max[id]) {
                print $0, id
                break              
            }
    }                                     
' search lookup > out

lookup

##INFO=<ID=MA,Number=1,Type=String,Description="Minor Allele">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Minor Allele Frequency">
##INFO=<ID=MAC,Number=1,Type=Integer,Description="Minor Alelele Count">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO
1    1014143    rs786201005    C    T    .    .    dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;CLIN_pathogenic;AA=C
7    48311388    rs672601345    G    GG    .    .    dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic
1    1232600    rs786200943    GCCCTGGAGCGGGAGCAGGCGCG    G    .    .    dbSNP_147;TSA=deletion;E_Phenotype_or_Disease;CLIN_pathogenic
1    1232630    rs750088530    GA    G    .    .    dbSNP_147;TSA=deletion;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=A
1    1232692    rs786200942    CATGCTGGCC    C    .    .    dbSNP_147;TSA=deletion;E_Phenotype_or_Disease;CLIN_pathogenic
22    94487198    rs672601312    G    T    .    .    dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G

search

R_Index    Chr    Start    End    Gene
20    chr7    48684202    48684205    ABCA13
21    chr7    48311373    48311704    ABCA13;AX746840
22    chr22    94487185    94487248    ABCA4

desired output

7    48311388    rs672601345    G    GG    .    .    dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic 21     ABCA13;AX746840
22    94487198    rs672601312    G    T    .    .    dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G 22     ABCA4

current output

7    48311388    rs672601345    G    GG    .    .    dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic 21
22    94487198    rs672601312    G    T    .    .    dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G 22
[akshay@localhost tmp]$ awk '                   
    NR == FNR {min[$1]=$3; max[$1]=$4; o[$1]=$5; next}
    {                
        for (id in min) 
            if (min[id] < $2 && $2 < max[id]) {
                print $0, id, o[id]
                break              
            }
    }                                     
'  search lookup
7    48311388    rs672601345    G    GG    .    .    dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic 21 ABCA13;AX746840
22    94487198    rs672601312    G    T    .    .    dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G 22 ABCA4

OR

[akshay@localhost tmp]$ awk '                   
    NR == FNR {i=$1 OFS $5; min[i]=$3; max[i]=$4; next}
    {                
        for (id in min) 
            if (min[id] < $2 && $2 < max[id]) {
                print $0, id       
                break              
            }
    }                                     
'  search lookup
7    48311388    rs672601345    G    GG    .    .    dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic 21 ABCA13;AX746840
22    94487198    rs672601312    G    T    .    .    dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G 22 ABCA4
1 Like

You are almost done with it.

awk '                   
    NR == FNR {min[$1]=$3; max[$1]=$4; Gene[$1]=$NF; next}
    {                
        for (id in min) 
            if (min[id] < $2 && $2 < max[id]) {
                print $0, id, Gene[id]
                break              
            }
    }                                     
' search lookup > out

Cheers,
Ranga

1 Like

Hello cmccabe,

Following may help you in same.

awk 'FNR==NR && $1 ~ /[[:digit:]]/{;A[$2]=$0;B[++i]=$2;next} {for(j=1;j<=i;j++){if(B[j]>$3 && B[j]<$4){print A[B[j]] FS $NF;next}}}' lookup search

Output will be as follows.

7    48311388    rs672601345    G    GG    .    .    dbSNP_147;TSA=insertion;E_Phenotype_or_Disease;CLIN_pathogenic ABCA13;AX746840
22    94487198    rs672601312    G    T    .    .    dbSNP_147;TSA=SNV;E_Phenotype_or_Disease;E_ExAC;CLIN_pathogenic;AA=G ABCA4

Thanks,
R. Singh

1 Like

Thank you all :slight_smile: