Hello all, I am having trouble with what should be an easy task, but I seem to be missing something fundamental. I have two files: File 1 has a single field and many thousands of records, and File 2 has two fields and many thousands of records.
My goal is that whenever $1 of File 1 matches $1 of File 2, $1 and $2 of File 2 are printed (or, equivalently, $1 from File 1 together with $2 of File 2 whenever $1 matches between the files). The problem is that File 1 has repeated records in it. When I apply
awk 'FNR==NR{a[$1]; next} $1 in a' File 1 File 2
I get every record of File 2 whose $1 appears in File 1, with both $1 and $2, but each one only once and in File 2's order. However, I need the output to follow the order of the records in File 1 and to include all of the repeated records.
File 1
ABC
DEF
XYZ
ABC
DEF
ABC
XYZ
File 2
ABC 123
DEF 345
XYZ 678
Desired Output:
ABC 123
DEF 345
XYZ 678
ABC 123
DEF 345
ABC 123
XYZ 678
NB: The records are much more varied and repeats much further spread out in the actual file than the simplified examples here.
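For reference, here is a minimal sketch of one way the desired output could be produced, assuming each $1 occurs only once in File 2. It reads File 2 first to build a lookup table, then walks File 1 in its original order. The file names file1 and file2 and the array name val are placeholders for illustration, not names from my real data.

```shell
# Recreate the sample data from this post (file1/file2 are placeholder names).
printf '%s\n' ABC DEF XYZ ABC DEF ABC XYZ > file1
printf '%s\n' 'ABC 123' 'DEF 345' 'XYZ 678' > file2

# First file on the command line (file2): store $2 under the key $1.
# Second file (file1): for every record, in order, print the key and its
# stored value, but only if the key was seen in file2.
awk 'NR==FNR { val[$1] = $2; next } $1 in val { print $1, val[$1] }' file2 file1
```

Because the lookup file is given first and File 1 is the file being streamed, repeats and ordering in File 1 carry over to the output unchanged.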
I had a somewhat similar, albeit more involved, issue in the past that RudiC helped me with (see here), but I am having trouble applying his code to this simpler example.
I got it close with this:
awk 'NR==FNR {q=$1; $1=""; T[q "," ++C[q]] = $0; next} {q=$1; X=q "," ++D[q]; printf "%s\t", $0; if(X in T); print T[X]}' File 2 File 1
While this attempt printed all of the repeated records of File 1, it only supplied $2 from File 2 the first time each $1 of File 1 appears, not every time, as in the following:
ABC 123
DEF 345
XYZ 678
ABC
DEF
ABC
XYZ
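My best guess at what is happening in the attempt above: the array key includes an occurrence counter (q "," ++C[q]), so File 2 only ever stores keys ending in ",1", while the second ABC in File 1 looks up a key ending in ",2". Also, the semicolon after if(X in T) ends the if statement, so the print runs unconditionally. The sketch below only prints the keys to show the mismatch; f1 and f2 are shortened, made-up sample files.

```shell
# Shortened sample data (f1/f2 are hypothetical file names).
printf '%s\n' ABC DEF ABC > f1
printf '%s\n' 'ABC 123' 'DEF 345' > f2

# Build keys the same way as the attempt above (value "," occurrence number),
# then report whether each key built from f1 exists among the keys from f2.
awk 'NR==FNR { T[$1 "," ++C[$1]] = $2; next }
     { X = $1 "," ++D[$1]; print X, ((X in T) ? "found" : "missing") }' f2 f1
```

With this data the third line of output reports the key ABC,2 as missing, which would match the blank second field I am seeing for repeated records.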
Thanks so much in advance.