awk for matching fields between files with repeated records

jvoot · November 17, 2019, 12:48am

Hello all, I am having trouble with what should be an easy task, but I seem to be missing something fundamental. I have two files: File 1 consists of a single field and many thousands of records, and File 2 has two fields and many thousands of records.

My goal is that when $1 of File 1 matches $1 of File 2, awk prints $1 and $2 of File 2 (or, equivalently, $1 of File 1 together with $2 of File 2). The problem is that File 1 contains repeated records. When I run awk 'FNR==NR{a[$1]; next} $1 in a' File1 File2, I get every line of File 2 whose $1 appears in File 1, printed as $1 and $2 of File 2, but in File 2's order and without the repeats. I need the output to follow the order of File 1 and to include every repeated record.
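To make the problem concrete, here is a small sketch that recreates the sample data and runs the command above. The file names File1 and File2 (without spaces) are an assumption for illustration. Because File1 is read first, it only fills the lookup array a; the lines that actually get printed come from File2, so the output follows File2's order and contains each key only once.

```shell
# Recreate the sample data (hypothetical file names File1/File2).
printf '%s\n' ABC DEF XYZ ABC DEF ABC XYZ > File1
printf '%s\n' 'ABC 123' 'DEF 345' 'XYZ 678' > File2

# File1 only populates a[]; the printed records are File2 lines,
# once each and in File2 order, so repeats and File1 order are lost.
awk 'FNR==NR{a[$1]; next} $1 in a' File1 File2
```

On the sample this prints three lines (ABC 123, DEF 345, XYZ 678) instead of the seven wanted.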

File 1

ABC
DEF
XYZ
ABC
DEF
ABC
XYZ

File 2

ABC 123
DEF 345
XYZ 678

Desired Output:

ABC 123
DEF 345
XYZ 678
ABC 123
DEF 345
ABC 123
XYZ 678

NB: In the actual files the records are much more varied, and the repeats are spread much further apart than in these simplified examples.

I had a somewhat similar, albeit more involved, issue in the past that RudiC helped me with (see here), but I am having trouble applying his code to this simpler example.

I got it close with this:

awk 'NR==FNR {q=$1; $1=""; T[q "," ++C[q]] = $0; next} {q=$1; X=q "," ++D[q]; printf "%s\t",  $0; if(X in T); print T[X]}' File2 File1

This attempt printed every record of File 1, repeats included, but it attached $2 from File 2 only to the first occurrence of each key; later repeats were printed without it, as follows:

ABC 123
DEF 345
XYZ 678
ABC
DEF
ABC
XYZ
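Why the attempt above stops after the first occurrence: the key built for each File 1 line includes an occurrence counter (ABC,1, ABC,2, ...), but File 2 lists each key only once, so only the ",1" keys exist in T. In addition, the semicolon in if(X in T); ends the if with an empty statement, so print T[X] runs unconditionally and prints an empty string for missing keys. A minimal sketch of one possible fix, assuming each key appears at most once in File 2 and that only $2 is needed, is to drop the counter and key on $1 alone (file names File1/File2 are again hypothetical):

```shell
# Recreate the sample data (hypothetical file names File1/File2).
printf '%s\n' ABC DEF XYZ ABC DEF ABC XYZ > File1
printf '%s\n' 'ABC 123' 'DEF 345' 'XYZ 678' > File2

awk '
  NR==FNR { T[$1] = $2; next }   # File2 pass: remember the value for each key
  {                              # File1 pass: one output line per input line
    printf "%s", $1
    if ($1 in T)                 # no ";" right after the condition
      printf " %s", T[$1]
    print ""
  }
' File2 File1
```

Keys that have no entry in File 2 are still printed, just without a value, which mirrors what the original attempt was aiming for.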

Thanks so much in advance.


RavinderSingh13 · November 17, 2019, 1:04am

Hello jvoot,

Could you please try the following.

awk 'FNR==NR{a[$1]=$2;next} ($0 in a){print $1,a[$1]}'  Input_file2   Input_file1

Output will be as follows.

ABC 123
DEF 345
XYZ 678
ABC 123
DEF 345
ABC 123
XYZ 678
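The command above reads Input_file2 first to build the lookup a[key] = value, then walks Input_file1 line by line, so the output follows Input_file1's order and keeps every repeat. A small variation (an assumption, not the command as posted) tests $1 in a instead of $0 in a, which still matches when a line of Input_file1 carries stray trailing spaces or tabs. The trailing space on the first sample line below is contrived to show the difference.

```shell
# Recreate the sample data; the first "ABC" deliberately has a trailing space.
printf '%s\n' 'ABC ' DEF XYZ ABC DEF ABC XYZ > Input_file1
printf '%s\n' 'ABC 123' 'DEF 345' 'XYZ 678' > Input_file2

# Pass 1 (Input_file2): map each key to its value.
# Pass 2 (Input_file1): print key and value for every line, in file order,
# repeats included; keying on $1 ignores trailing blanks on the line.
awk 'FNR==NR { a[$1] = $2; next } ($1 in a) { print $1, a[$1] }' Input_file2 Input_file1
```

With $0 in a, the "ABC " line would not match and would be dropped. With either key, lines of Input_file1 whose key is missing from Input_file2 produce no output.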

EDIT: After rereading your question, one question came up: do you also want to compare $2 of Input_file2 against Input_file1?

Thanks,
R. Singh

jvoot · November 17, 2019, 4:22pm

Thanks so much RavinderSingh13. An early quick test seems to reveal that that did the trick!