Input_file
data1 USA 100 ASE
data3 UK 20 GWQR
data4 Brazil 40 QWE
data2 Scotland 60 THWE
data5 USA 40 QWERR
Reference_file
USA 12312 34532
1324 Brazil 23321
231 3421 Scotland
342 34235 UK
231 141 England
Desired_output_file:
data1 USA 100 ASE USA 12312 34532
data5 USA 40 QWERR USA 12312 34532
data4 Brazil 40 QWE 1324 Brazil 23321
data3 UK 20 GWQR 342 34235 UK
data2 Scotland 60 THWE 231 3421 Scotland
I would like to print out those content that shared between column 2 of Input_file with column 1, 2, 3 of reference_file.
Below is the way I deal with it:
Step 1: merge the share info between column 2 of input_file with column 1 of reference_file:
perl -e ' $col1=1; $col2=0; ($f1,$f2)=@ARGV; open(F2,$f2); while (<F2>) { s/\r?\n//; @F=split /\t/, $_; $line2{$F[$col2]} .= "$_\n" }; $count2 = $.; open(F1,$f1); while (<F1>) { s/\r?\n//; @F=split /\t/, $_; $x = $line2{$F[$col1]}; if ($x) { $num_changes = ($x =~ s/^/$_\t/gm); print $x; $merged += $num_changes } } warn "\nJoining $f1 column $col1 with $f2 column $col2\n$f1: $. lines\n$f2: $count2 lines\nMerged file: $merged lines\n"; ' Input_file Reference_file > tmp1.txt
data1 USA 100 ASE USA 12312 34532
data5 USA 40 QWERR USA 12312 34532
Step 2: merge the share info between column 2 of input_file with column 2 of reference_file:
perl -e ' $col1=1; $col2=1; ($f1,$f2)=@ARGV; open(F2,$f2); while (<F2>) { s/\r?\n//; @F=split /\t/, $_; $line2{$F[$col2]} .= "$_\n" }; $count2 = $.; open(F1,$f1); while (<F1>) { s/\r?\n//; @F=split /\t/, $_; $x = $line2{$F[$col1]}; if ($x) { $num_changes = ($x =~ s/^/$_\t/gm); print $x; $merged += $num_changes } } warn "\nJoining $f1 column $col1 with $f2 column $col2\n$f1: $. lines\n$f2: $count2 lines\nMerged file: $merged lines\n"; ' Input_file Reference_file > tmp2.txt
data4 Brazil 40 QWE 1324 Brazil 23321
Step 3: merge the share info between column 2 of input_file with column 3 of reference_file:
perl -e ' $col1=1; $col2=2; ($f1,$f2)=@ARGV; open(F2,$f2); while (<F2>) { s/\r?\n//; @F=split /\t/, $_; $line2{$F[$col2]} .= "$_\n" }; $count2 = $.; open(F1,$f1); while (<F1>) { s/\r?\n//; @F=split /\t/, $_; $x = $line2{$F[$col1]}; if ($x) { $num_changes = ($x =~ s/^/$_\t/gm); print $x; $merged += $num_changes } } warn "\nJoining $f1 column $col1 with $f2 column $col2\n$f1: $. lines\n$f2: $count2 lines\nMerged file: $merged lines\n"; ' Input_file Reference_file > tmp3.txt
data3 UK 20 GWQR 342 34235 UK
data2 Scotland 60 THWE 231 3421 Scotland
Concetate all tmp*.txt together:
cat tmp1.txt tmp2.txt tmp3.txt > Desired_output_file.txt
It seems like awk "if..else...else if" condition able to save the progress time?
Thanks for any advice.