Hi everyone,
I need to replace values of column 2 array1 with values of column 2 array2 based on a lookup of column 4 value, but only return a value IF the values in column 1 of BOTH array1 and array2 match, otherwise keep original value in column 2 of array1. Both files are tab delimited text files and I'm using Ubuntu 16.04.
For example, here is array1:
1 ab123 0.02013 1111 T C
1 ab124 0.026621 1112 T C
1 ab125 0.027256 5423 A G
1 ab126 0.028999 1245 G T
1 ab127 0.029335 1246 C A
1 ab128 0.029367 1247 A G
1 ab129 0.029785 1248 G T
2 ab130 0.044493 1111 C A
2 ab131 0.044524 1247 C T
2 ab132 0.044571 7542 C T
2 ab133 0.044629 1451 C A
2 ab134 0.044636 1452 C T
2 ab135 0.044993 1453 C T
Here is array2:
1 rs3094315 0.02013 1111 G A
1 rs6687776 0.026621 1112 T C
1 rs9442380 0.027256 1113 T C
1 rs11260549 0.028999 1245 A G
1 rs2887286 0.029335 1246 C T
1 rs3813199 0.029367 1247 A G
1 rs6685064 0.029785 1248 T C
2 rs4648808 0.044493 1111 T C
2 rs3128291 0.044524 1247 A G
2 rs3128296 0.044571 1450 G T
2 rs424079 0.044629 1451 C A
2 rs2257182 0.044636 1452 C T
2 rs263526 0.044993 1453 C T
Here is the desired output:
1 rs3094315 0.02013 1111 T C
1 rs6687776 0.026621 1112 T C
1 ab125 0.027256 5423 A G
1 rs11260549 0.028999 1245 G T
1 rs2887286 0.029335 1246 C A
1 rs3813199 0.029367 1247 A G
1 rs6685064 0.029785 1248 G T
2 rs4648808 0.044493 1111 C A
2 rs3128291 0.044524 1247 C T
2 ab132 0.044571 7542 C T
2 rs424079 0.044629 1451 C A
2 rs2257182 0.044636 1452 C T
2 rs263526 0.044993 1453 C T
Notice that since there is no equivalent column 4 values for rows 3 and 8, I want to keep the original column 2 values in array1
I used the following code and it works fine, except for it does not condition on values of column 1 of both array1 and array2 matching. I need to condition the lookup on values of column 1 of both array1 and array2 matching:
awk 'NR == FNR {REP[$4] = $2; next} $4 in REP && FNR > 0 {$2 = REP[$4]; CNT++} 1; END {print CNT}' OFS="\t" array2 array1 > output
Thanks in advance