So it does. But `120423` and `34532` do not appear in `file2` at all, so the `14` does not make any difference. You need to be less vague than "based on the two columns"; say instead something like "and exactly matching both the corresponding columns". Also provide an example where `178487` matches and `14` does not.
The code you got (or adapted) from another forum is wrong for your case. It stores `$1 $2` from `file2`, but for your line layouts it needs to check `$2 $4` from `file1`.
It is wrong in another way too. If `file2` had `234 56` and `file1` had `23 456`, it would still match, because the code just runs the values together both times as `23456`. The indexes should be `$1,$2` and `$2,$4`: the comma makes awk insert a separator (the `SUBSEP` character) between the two values, so different pairs stay different when combined.
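A minimal sketch of the collision, assuming a POSIX awk called from `sh`; `key_demo` is only a hypothetical wrapper name. It stores the pair `234`/`56` both ways and then asks whether `23`/`456` (and the original pair) is present:

```shell
#!/bin/sh
# key_demo is a hypothetical wrapper name used only for this illustration.
key_demo() {
    awk 'BEGIN {
        # Concatenated key: "234" "56" becomes the single string "23456".
        joined["234" "56"] = 1
        # Comma subscript: awk stores "234" SUBSEP "56" (SUBSEP defaults to "\034").
        paired["234", "56"] = 1

        collide = (("23" "456") in joined)   # 1: "23456" is already a key
        differ  = (("23", "456") in paired)  # 0: "23" SUBSEP "456" was never stored
        same    = (("234", "56") in paired)  # 1: the real pair is still found
        print collide, differ, same
    }'
}

key_demo   # prints: 1 0 1
```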
You might look at the feedback on posts: the one you reference is a rare sight -- an accepted answer with no upvotes. Mind you, in this case the "already answered" thread is also buggy -- que sera, sera.
Usually, you would write `{ remember[$1,$2] = 1; next; }`, which saves you testing for the opposite condition in the line after.
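To see what `next` buys you, here is a sketch using a hypothetical key file `keys` and data file `data`, simplified to a single key column; the function names are illustrative only. Ending the first block with `next` and repeating the opposite test give the same result, while dropping both also prints the key file's own lines:

```shell
#!/bin/sh
# Hypothetical sample files, created in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'a' 'b' > "$dir/keys"
printf '%s\n' 'a 10' 'c 30' > "$dir/data"

# Idiomatic: next stops processing of key-file lines once they are stored.
with_next() {
    awk 'FNR == NR { seen[$1] = 1; next } $1 in seen' "$dir/keys" "$dir/data"
}

# Equivalent, but the second pattern has to repeat the opposite test.
with_opposite_test() {
    awk 'FNR == NR { seen[$1] = 1 } FNR != NR && ($1 in seen)' "$dir/keys" "$dir/data"
}

# Neither: key-file lines also reach the second pattern and get printed.
with_neither() {
    awk 'FNR == NR { seen[$1] = 1 } $1 in seen' "$dir/keys" "$dir/data"
}

with_next      # prints: a 10
with_neither   # prints a, b and a 10, one per line
```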
It is clumsy to put the filenames into the awk code and also have to specify them on the command line: maintainers will often change one and not the other. The usual idiom is to recognise lines from the first file with the test `FNR == NR`.
This works because `FNR` is the line number within the current file, and `NR` is the total number of lines read so far. If the two are equal, you must still be reading the first file (except in the corner case where the first file is empty: then `FNR == NR` also holds for every line of the second file).
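A quick way to watch the two counters, as a sketch with hypothetical files `first`, `second` and an empty file `empty` (the `show_counters` name is illustrative). Each output line is the input line followed by `NR`, `FNR` and the value of `FNR == NR`; with the empty file first, the test stays true for the whole of `second`:

```shell
#!/bin/sh
# Hypothetical sample files, created in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'p' 'q' > "$dir/first"
printf '%s\n' 'r' 's' 't' > "$dir/second"
: > "$dir/empty"

# Print each input line with NR, FNR and whether FNR == NR (1 or 0).
show_counters() {
    awk '{ print $0, NR, FNR, (FNR == NR) }' "$@"
}

show_counters "$dir/first" "$dir/second"
# p 1 1 1
# q 2 2 1
# r 3 1 0
# s 4 2 0
# t 5 3 0

show_counters "$dir/empty" "$dir/second"
# r 1 1 1
# s 2 2 1
# t 3 3 1
```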
The default action for a pattern with no action is to print the line, so my take on this is:
```shell
awk 'FNR == NR { ++Hit[$1,$2]; next; } ($2,$4) in Hit' file2 file1
```
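As a usage sketch, here is that command run on hypothetical files that reuse the `234 56` / `23 456` pair from earlier, assuming `file2` holds the pair in columns 1 and 2 and `file1` holds it in columns 2 and 4 (`match_pairs` is an illustrative name). Only the line whose two columns match exactly is printed:

```shell
#!/bin/sh
# Hypothetical sample files, created in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '234 56' > "$dir/file2"
printf '%s\n' 'r1 234 x 56' 'r2 23 x 456' 'r3 234 x 57' > "$dir/file1"

match_pairs() {
    # file2 lines: count each ($1,$2) pair, then skip to the next input line.
    # file1 lines: the bare pattern prints the line if its ($2,$4) pair was seen.
    awk 'FNR == NR { ++Hit[$1,$2]; next; } ($2,$4) in Hit' "$dir/file2" "$dir/file1"
}

match_pairs   # prints: r1 234 x 56
```

Replacing the commas with plain concatenation in both subscripts would let the `r2` line through as well, since both keys would become `23456`.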