Excel vlookup function like value mapping with awk

I have two files

File1

175552 st_497858.1 rs86052.1 rs92185.1 st_001022416.1 174841 175552_174841
179912 st_001122967.2 rs90435.1 rs89122.1 st_001022583.1 175545 179912_175545
179912 st_001122967.2 rs90435.1 rs89122.1 st_001022584.1 175545 179912_175545
179967 st_001256606.1 rs93516.2 rs86923.1 st_501841.1 177879 179967_177879

File2

179967_177879 0.75
179912_175545 0.5

Desired output File3

175552 st_497858.1 rs86052.1 rs92185.1 st_001022416.1 174841 175552_174841 NA
179912 st_001122967.2 rs90435.1 rs89122.1 st_001022583.1 175545 179912_175545 0.5
179912 st_001122967.2 rs90435.1 rs89122.1 st_001022584.1 175545 179912_175545 0.5
179967 st_001256606.1 rs93516.2 rs86923.1 st_501841.1 177879 179967_177879 0.75

I want to match the 7th column of File1 with 1st column of File2 and map the corresponding values at 8th cloumn. If any id of 7th column is not avalible (such as 175552_174841 here) in File 2 it should print NA. Here is what I did

awk 'NR == FNR{a[$7] = $1;next}; {print $0, $7 in a?a[$1]: "NA"}' File2 File1

The current output

175552 st_497858.1 rs86052.1 rs92185.1 st_001022416.1 174841 175552_174841 NA
179912 st_001122967.2 rs90435.1 rs89122.1 st_001022583.1 175545 179912_175545 NA
179912 st_001122967.2 rs90435.1 rs89122.1 st_001022584.1 175545 179912_175545 NA
179967 st_001256606.1 rs93516.2 rs86923.1 st_501841.1 177879 179967_177879 NA

Could someone please help to correct the code and explain it a bit. Thanks

You're almost there.
Just keep in mind that awk splits each line of file1 into 7 tokens ($1 through $7) and file2 into 2 ($1 and $2).

$
$ awk 'NR == FNR{a[$1] = $2;next}; {print $0, $7 in a?a[$7]: "NA"}' file2 file1
175552 st_497858.1 rs86052.1 rs92185.1 st_001022416.1 174841 175552_174841 NA
179912 st_001122967.2 rs90435.1 rs89122.1 st_001022583.1 175545 179912_175545 0.5
179912 st_001122967.2 rs90435.1 rs89122.1 st_001022584.1 175545 179912_175545 0.5
179967 st_001256606.1 rs93516.2 rs86923.1 st_501841.1 177879 179967_177879 0.75
$
$
1 Like