Match two columns from two files and print output

Hello,
I have two files which are of the following format

File 1 which has two columns

Protein_ID         Substitution
NP_997239         T53R 
NP_060668         V267M                   
NP_058515         P856A                   
NP_001206         T55M
NP_006601         D371Y                   
NP_065831         A851V           
...

File 2 which has 7 columns

Protein_ID Substitution dbSNP Prediction Score Median_Info Number_of_Seqs
NP_997239 T53R rs4274008   TOLERATED 1    2.84 15                  
NP_060668 V267M rs12137794 TOLERATED 0.23 1.69 75                  
NP_058515 P856A rs228697   TOLERATED 0.36 2.17 37
NP_001206 T55M  rs2274327  TOLERATED 0.1  1.76 97                  
NP_005017 S312C rs61755420 TOLERATED 0.05 1.87 68                  
NP_005017 A697V rs1281018  TOLERATED 0.21 1.99 67                  
NP_060668 V533D rs1281016  TOLERATED 0.63 2.04 72

I need the output by matching the first two columns from both files (Protein_ID Substitution)

NP_997239 T53R  rs4274008  TOLERATED 1    2.84 15                  
NP_060668 V267M rs12137794 TOLERATED 0.23 1.69 75                  
NP_058515 P856A rs228697   TOLERATED 0.36 2.17 37
NP_001206 T55M  rs2274327  TOLERATED 0.1  1.76 97                  
NP_006601 D371Y NA         NA        NA   NA   NA
NP_065831 A851V NA         NA        NA   NA   NA

So far this is what I got:

awk 'FNR == NR { h[$1] = 1; next } h[$1]' file1 file2 > output

but this matches only the first column and does not includes the non matching values.....
Thank you

Please use code tags for code and data sample.

 awk 'FNR == NR { h[$1,$2] = 1; next } h[$1,$2]' file1 file2 > output
1 Like

Starting off with and extending pamu's proposal:

awk 'FNR == NR { h[$1,$2] = $0; next } h[$1,$2] {print h[$1,$2];next} {print $1,$2,"N/A","N/A"}' file2 file1 
Protein_ID Substitution dbSNP Prediction Score Median_Info Number_of_Seqs
NP_997239 T53R rs4274008   TOLERATED 1    2.84 15                  
NP_060668 V267M rs12137794 TOLERATED 0.23 1.69 75                  
NP_058515 P856A rs228697   TOLERATED 0.36 2.17 37
NP_001206 T55M  rs2274327  TOLERATED 0.1  1.76 97                  
NP_006601 D371Y N/A N/A
NP_065831 A851V N/A N/A

You may have to add some "N/A"s...