Find common Strings in two large files

Hi,
I have a text file in the following format:

DB2: [NodeID=1]
DB2: [NodeID=2]
WB: [NodeID=3]
WB: [NodeID=3]
WB: [NodeID=3]
WB: [NodeID=3]

and a second text file in the following format:

Time=00:00:00.473 [NodeID=3]
Time=00:00:00.436 [NodeID=3]
Time=00:00:00.016 [NodeID=2]
Time=00:00:00.027 [NodeID=1]
Time=00:00:00.471 [NodeID=3]
Time=00:00:00.436 [NodeID=3]

The last field on each line of both files has the form [NodeID=*].

I want to combine the lines from the two files whose last field matches, producing something like

DB2: [NodeID=1] Time=00:00:00.027 [NodeID=1]

Could you please suggest an approach?

NOTE: the actual size of the text files runs into GBs.

Thanks in advance.

awk 'NR==FNR{a[$2]=$0;next;}{print $0,a[$2]}' secondFile firstFile

Thanks for the quick response. As I am a beginner in shell scripting, I need help understanding the following:

awk 'NR==FNR{a[$2]=$0;next;}{print $0,a[$2]}' secondFile firstFile

In the above code, what do {a[$2]=$0;next;}, $2 and $0 stand for?

I would like to understand it so that I can modify your script to make it work for my files.


Here is the explanation of the above command. (Going through any awk tutorial will give you a basic idea of how awk works.)
awk processes its input line by line. In each line, $0 represents the whole line, $1 the 1st field, $2 the 2nd field, and so on. By default, fields are separated by runs of spaces and/or tabs.

echo "abc def ghi" | awk '{print $0}'

prints the whole line:

abc def ghi

echo "abc def ghi" | awk '{print $1}'

prints the 1st field:

abc

echo "abc def ghi" | awk '{print $2}'

prints the 2nd field:

def
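To tie this back to the files in the question, here is a small sketch that applies the same field splitting to one line in the shape of the first file. The variable names (line, first, second, count) are only for illustration.

```shell
# A line in the same shape as the first file from the question.
line='DB2: [NodeID=1]'

# With the default separator, the line splits into two fields.
first=$(printf '%s\n' "$line" | awk '{print $1}')    # DB2:
second=$(printf '%s\n' "$line" | awk '{print $2}')   # [NodeID=1]
count=$(printf '%s\n' "$line" | awk '{print NF}')    # NF = number of fields

printf '%s\n' "$first" "$second" "$count"
```

A line from the second file, such as Time=00:00:00.473 [NodeID=3], splits the same way, so $2 holds the [NodeID=...] part in both files. That is why $2 can serve as the matching key.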


When more than one file is given to awk,

NR==FNR

will be true only for the 1st file. FNR is the record number within the current file, while NR is the total number of records awk has read so far (a cumulative count across all files).
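A quick way to see the two counters side by side is to print them for every line of two small files. This is only a sketch: one.txt and two.txt are made-up file names, created in a temporary directory.

```shell
# Create two tiny throwaway files (hypothetical names).
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\nd\ne\n' > "$tmp/two.txt"

# Print the current file name, the overall record number (NR)
# and the per-file record number (FNR) for every input line.
counters=$(cd "$tmp" && awk '{print FILENAME, NR, FNR}' one.txt two.txt)
printf '%s\n' "$counters"

rm -rf "$tmp"
```

This prints

one.txt 1 1
one.txt 2 2
two.txt 3 1
two.txt 4 2
two.txt 5 3

NR and FNR agree only on the lines of one.txt. Once awk moves to two.txt, FNR restarts at 1 while NR keeps counting.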


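NR==FNR{a[$2]=$0;next;}

This block executes only for the 1st file on the command line (secondFile in the command above), because NR==FNR is true only for the 1st file. Here the whole record is stored in the array a, indexed by the 2nd field (i.e. [NodeID=1], [NodeID=2] or [NodeID=3]). The next command then skips the remaining code and moves on to the next input line.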
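To see what ends up in the array after the 1st file has been read, one can dump it in an END block. The sketch below uses a few lines from the question as sample data in a temporary file; the END block and the sort are additions for illustration and are not part of the original command.

```shell
# Sample data in the shape of the second file from the question.
tmp=$(mktemp -d)
printf '%s\n' \
  'Time=00:00:00.473 [NodeID=3]' \
  'Time=00:00:00.436 [NodeID=3]' \
  'Time=00:00:00.016 [NodeID=2]' \
  'Time=00:00:00.027 [NodeID=1]' > "$tmp/secondFile"

# Store each line under its 2nd field, then print every key and value.
# "for (k in a)" visits keys in no particular order, so sort the result.
table=$(awk '{a[$2]=$0} END{for (k in a) print k " -> " a[k]}' "$tmp/secondFile" | sort)
printf '%s\n' "$table"

rm -rf "$tmp"
```

This prints

[NodeID=1] -> Time=00:00:00.027 [NodeID=1]
[NodeID=2] -> Time=00:00:00.016 [NodeID=2]
[NodeID=3] -> Time=00:00:00.436 [NodeID=3]

Note that [NodeID=3] occurs twice in the input, and the later line (00:00:00.436) has overwritten the earlier one. The array therefore holds one entry per distinct NodeID, so its memory use grows with the number of distinct keys rather than with the size of the file.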

{print $0,a[$2]}

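This block executes for the 2nd file ONLY (firstFile in the command above), because next keeps the lines of the 1st file from ever reaching it.
Here $0 is the whole current record of the 2nd file and $2 is the 2nd field of the current line (i.e. the NodeID in the 2nd file). a[$2] is the line that was stored under this index while processing the 1st file. If nothing was stored for this index, a[$2] is empty and only $0 (followed by a space) is printed.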
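Putting both blocks together, here is a sketch that runs the command on a few sample lines. The data is taken from the question, except for the [NodeID=9] line, which is made up to show what happens when a key has no partner. The second command is a possible variant, not part of the original answer: it adds a ($2 in a) condition so that unmatched lines are skipped.

```shell
# Sample files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' \
  'DB2: [NodeID=1]' \
  'DB2: [NodeID=2]' \
  'WB: [NodeID=3]' \
  'WB: [NodeID=9]' > "$tmp/firstFile"
printf '%s\n' \
  'Time=00:00:00.016 [NodeID=2]' \
  'Time=00:00:00.027 [NodeID=1]' \
  'Time=00:00:00.436 [NodeID=3]' > "$tmp/secondFile"

# The command from the thread: every line of firstFile is printed,
# followed by the stored line from secondFile (if any).
joined=$(awk 'NR==FNR{a[$2]=$0;next;}{print $0,a[$2]}' "$tmp/secondFile" "$tmp/firstFile")

# Variant (an assumption about what may be wanted): print a line of
# firstFile only if its key was seen in secondFile.
matched=$(awk 'NR==FNR{a[$2]=$0;next} ($2 in a){print $0,a[$2]}' "$tmp/secondFile" "$tmp/firstFile")

printf '%s\n' "$matched"
rm -rf "$tmp"
```

The variant prints

DB2: [NodeID=1] Time=00:00:00.027 [NodeID=1]
DB2: [NodeID=2] Time=00:00:00.016 [NodeID=2]
WB: [NodeID=3] Time=00:00:00.436 [NodeID=3]

The original command prints the same three lines plus a fourth one for WB: [NodeID=9] with nothing after it.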


Thanks for all the support. The scripts are now able to deliver. Thanks a TON!