Hi Experts
Please help me out with the following:
I have 2 files and want to produce the output file below. A FOR loop is not an option because fileA has 22 million lines; when I tried one, it processed only about 8000 records per hour.
I need a faster way!
FileA:
9051
9052
9053
9054
9055
9056
9057
9058
9059
FileB:
9051;123
9052;456
9054;567
9057;789
9059;123
Output File:
9051;9051;123
9052;9052;456
9053;
9054;9054;567
9055;
9056;
9057;9057;789
9058;
9059;
Regards
Try join. I can't tell for sure whether it is efficient enough (I don't have huge data similar to the samples provided):
[user@host ~]$ join -t';' -a1 -o 0,2.1,2.2 fileA fileB
9051;9051;123
9052;9052;456
9053;;
9054;9054;567
9055;;
9056;;
9057;9057;789
9058;;
9059;9059;123
[user@host ~]$
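One frequent gotcha with join is that both inputs must already be sorted on the join field, in the same collation that join uses for comparison. A self-contained sketch of the whole pipeline (the sample files from post #1 are recreated here with printf so the demo runs stand-alone; in practice fileA and fileB already exist):

```shell
# Recreate the thread's sample inputs (only for this self-contained demo).
printf '%s\n' 9051 9052 9053 9054 9055 9056 9057 9058 9059 > fileA
printf '%s\n' '9051;123' '9052;456' '9054;567' '9057;789' '9059;123' > fileB

# join needs both inputs sorted on the join field; forcing LC_ALL=C for
# both sort and join avoids locale-dependent ordering mismatches.
LC_ALL=C sort -o fileA fileA
LC_ALL=C sort -t';' -k1,1 -o fileB fileB

# -a1 keeps unmatched lines from fileA; unmatched rows print as "9053;;".
LC_ALL=C join -t';' -a1 -o 0,2.1,2.2 fileA fileB
```

Running sort and join under different locales is a common cause of joins that silently produce empty matches, so pinning LC_ALL=C on both is the safe default.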
Thanks Sir, but I am getting the following output:
9051;9051;123
9052;9052;456
9053;;
9054;;
9055;;
9056;;
9057;;
9058;;
9059;;
I tried the following command, but it only prints the matching lines:
awk -F";" 'NR==FNR{A[$1]=$0;next}$1 in A{$0=$0","A[$1];print}' fileA fileB
Regards
Navkanwal
Loading 22 million lines into an array will definitely lead to performance issues, so load the smaller fileB into the array instead. Here's your awk one-liner:
awk -F";" 'NR==FNR{A[$1]=$2;next} {if ($1 in A) $0=$0";"$1";"A[$1]; print}' fileB fileA
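A self-contained demo of this approach on the thread's sample data (note this sketch tests key membership with `$1 in A` rather than testing the stored value for truthiness, so a value of 0 or an empty string still counts as a match):

```shell
# Recreate the sample inputs so the demo runs stand-alone.
printf '%s\n' 9051 9052 9053 9054 9055 9056 9057 9058 9059 > fileA
printf '%s\n' '9051;123' '9052;456' '9054;567' '9057;789' '9059;123' > fileB

# First pass (NR==FNR) caches fileB's value per key; second pass walks
# fileA and appends "key;value" when the key exists in the cache.
awk -F';' 'NR==FNR{A[$1]=$2;next} {if ($1 in A) $0=$0";"$1";"A[$1]; print}' fileB fileA
```

Unmatched keys are printed unchanged (e.g. `9053` rather than `9053;` as in the requested output in post #1); if the trailing separator matters, append it explicitly in the else branch.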
And with the sample fileA and fileB you provided in post #1, using join I got the output which I've pasted in post #2. Not sure why you're getting a different output; could you please re-check?
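One way to re-check is to verify that fileB is actually sorted in the collation join expects; a quick sketch, assuming GNU coreutils (the sample fileB is recreated here so the check is self-contained):

```shell
# Sample fileB from post #1 (recreated only for this stand-alone check).
printf '%s\n' '9051;123' '9052;456' '9054;567' '9057;789' '9059;123' > fileB

# sort -c prints nothing and exits 0 when the input is already in order;
# running it under LC_ALL=C matches the byte-wise collation join assumes.
LC_ALL=C sort -c -t';' -k1,1 fileB && echo "fileB is sorted"
```

If sort -c complains about disorder, sort both files under LC_ALL=C before running join.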
Thanks Sir.
The command is working fine, and the output was generated in less than 5 minutes.
Thanks for your support.
Regards
Please mention which command worked fine: awk or join? It'll be helpful to those who visit this thread later.