Mapping the values of ids of two columns of file1 from file2

I have of two space separated files:

==> File1 <==

PT|np_496075.1 st|K92748.1 st|K89648.1 PT|np_001300561.1
PT|np_497284.1 st|K90752.1 st|K90279.1 PT|np_740775.1
PT|np_497749.1 st|K90752.1 st|K92038.1 PT|np_490856.1
PT|np_497284.1 st|K90752.1 st|K88095.1 PT|np_494764.1

==> File 2 <==

PT|np_490856.1 171608
PT|np_740775.1 171647
PT|np_497749.1 171647
PT|np_494764.1 171646
PT|np_497284.1 171648
PT|np_496075.1 171200
PT|np_001300561.1 171202

I want to map the values for first and fourth column of File1 (begin with PT|np_xxxx) from column 2 of File2. The File2 contains the values of PT|np_xxxx.

The desired output is (mapped values of column1 from file2, column1 file1, column2 file1, column3 file1, column4 file1, mapped values of column4 from file2)

==> File 3 <==

171200  PT|np_496075.1    st|K92748.1   st|K89648.1   PT|np_001300561.1 171202
171648  PT|np_497284.1    st|K90752.1   st|K90279.1   PT|np_740775.1    171647
171647  PT|np_497749.1    st|K90752.1   st|K92038.1   PT|np_490856.1    171608
171648  PT|np_497284.1    st|K90752.1   st|K88095.1   PT|np_494764.1    171646
171648  PT|np_497284.1    st|K90752.1   st|K88689.1   PT|np_497749.1    171647

Hello sammy777888,

Not sure how above highlighted line has come because I can't see any relation of line PT|np_001300561.1 171202 from Input_file2 to Input_file1. If this is a typo then following may help you in same.

awk -F'[| ]' 'FNR==NR{a[$2]=$0;b[$2]=$NF;next} ($2 in a){print b[$2] "\t" $0 "\t" b[$2];}' OFS="|"   Input_file2  Input_file1

Output will be as follows.

171200  PT|np_496075.1 st|K92748.1 st|K89648.1 PT|np_001300561.1        171200
171648  PT|np_497284.1 st|K90752.1 st|K90279.1 PT|np_740775.1   171648
171647  PT|np_497749.1 st|K90752.1 st|K92038.1 PT|np_490856.1   171647
171648  PT|np_497284.1 st|K90752.1 st|K88095.1 PT|np_494764.1   171648
 

If I missed something I apologies for it. If above is as per your expectations then well and good else please show us more clear Input_files with more clear conditions in code tags, so that we could try to help you in same. Have a great weekend ahead.

Thanks,
R. Singh

1 Like

Try:

awk 'NR==FNR{A[$1]=$2; next} {print A[$1], $0, A[$NF]}' file2 file1

Output:

171200 PT|np_496075.1 st|K92748.1 st|K89648.1 PT|np_001300561.1 171202
171648 PT|np_497284.1 st|K90752.1 st|K90279.1 PT|np_740775.1 171647
171647 PT|np_497749.1 st|K90752.1 st|K92038.1 PT|np_490856.1 171608
171648 PT|np_497284.1 st|K90752.1 st|K88095.1 PT|np_494764.1 171646
1 Like