Conditional Vlookup

Hi everyone,

I need to replace values of column 2 array1 with values of column 2 array2 based on a lookup of column 4 value, but only return a value IF the values in column 1 of BOTH array1 and array2 match, otherwise keep original value in column 2 of array1. Both files are tab delimited text files and I'm using Ubuntu 16.04.

For example, here is array1:

1    ab123    0.02013    1111    T    C
1    ab124    0.026621    1112    T    C
1    ab125    0.027256    5423    A    G
1    ab126    0.028999    1245    G    T
1    ab127    0.029335    1246    C    A
1    ab128    0.029367    1247    A    G
1    ab129    0.029785    1248    G    T
2    ab130    0.044493    1111    C    A
2    ab131    0.044524    1247    C    T
2    ab132    0.044571    7542    C    T
2    ab133    0.044629    1451    C    A
2    ab134    0.044636    1452    C    T
2    ab135    0.044993    1453    C    T

Here is array2:

1    rs3094315    0.02013    1111    G    A
1    rs6687776    0.026621    1112    T    C
1    rs9442380    0.027256    1113    T    C
1    rs11260549    0.028999    1245    A    G
1    rs2887286    0.029335    1246    C    T
1    rs3813199    0.029367    1247    A    G
1    rs6685064    0.029785    1248    T    C
2    rs4648808    0.044493    1111    T    C
2    rs3128291    0.044524    1247    A    G
2    rs3128296    0.044571    1450    G    T
2    rs424079    0.044629    1451    C    A
2    rs2257182    0.044636    1452    C    T
2    rs263526    0.044993    1453    C    T

Here is the desired output:

1    rs3094315    0.02013    1111    T    C
1    rs6687776    0.026621    1112    T    C
1    ab125    0.027256    5423    A    G
1    rs11260549    0.028999    1245    G    T
1    rs2887286    0.029335    1246    C    A
1    rs3813199    0.029367    1247    A    G
1    rs6685064    0.029785    1248    G    T
2    rs4648808    0.044493    1111    C    A
2    rs3128291    0.044524    1247    C    T
2    ab132    0.044571    7542    C    T
2    rs424079    0.044629    1451    C    A
2    rs2257182    0.044636    1452    C    T
2    rs263526    0.044993    1453    C    T

Notice that since there is no equivalent column 4 values for rows 3 and 8, I want to keep the original column 2 values in array1

I used the following code and it works fine, except for it does not condition on values of column 1 of both array1 and array2 matching. I need to condition the lookup on values of column 1 of both array1 and array2 matching:

awk 'NR == FNR {REP[$4] = $2; next} $4 in REP && FNR > 0 {$2 = REP[$4]; CNT++} 1; END {print CNT}' OFS="\t" array2 array1  > output

Thanks in advance

Would

awk 'NR == FNR {REP[$1,$4] = $2; next} ($1,$4) in REP  {$2 = REP[$1,$4]; CNT++} 1; END {print CNT}' OFS="\t" file2 file1

do what you need?

1 Like

Awesome Rudy, you're a genius :slight_smile: