Joining Two Files Matching Two Columns

Hi All,

I am looking to join two files where column 1 of file A matches with column 1 of file B and column 5 of files A matches with column 2 of file B. After joining the files based on above condition, out should contain entire line of file A and column 3, 4 and 5 of file B.

Here is sample data:

File A

7507345|NIP1|0.0.0.2|127369256|17561|Closed
3323077|NIP1|0.0.0.1|142830388|34727|Closed
3323077|NIP4|0.0.0.1|773455302|386196|Closed
8171620|NIP8|0.0.0.1|176123103157225888|Closed

File B

7507345|17561|ED|BILL|0
3323077|34727|EF|LAST_BILL_T|0


I have gone through the thread "Join two files combining multiple columns and produce mix and match output* but still not clear on this.

I will appreciate your help on this.

Thanks
Angsuman

How did you apply/modify that cited thread?

Hello RudiC,

I have used following comparing my requirement with example given in the thread. It did not work for me.

Here is the command:

awk -F, 'FNR==NR{A[$2 OFS $3]=FILENAME"."$1 OFS FILENAME"."$3;B[$2 OFS $3]=FILENAME"."$2;next} ($1 OFS $4 in A){print A[$1 OFS $4] OFS FILENAME"." $6 OFS FILENAME"." $1 OFS B[$1 OFS $4] OFS FILENAME"." $1}' OFS=, fileA fileB

I need help to understand the how does that command works.

Thanks
Angshuman

Did you apply that command totally unmodified? After specifying in post#1 that you want to match fields 1 and 5 of fileA with fields 1 and 2 of fileB, you are using fields 2 and 3 of fileA and fields 1 and 4 of fileB. Do you see a chance this should work? On top, the field separator of your files seems to be | , but the script uses , .

A few hints on that script: It first reads fileA and tries to assign a string composition of FILENAME and fields 1 and 3 to array A indexed by fields 2 and 3 (which fails as the fields are NOT separated due to wrong FS) and similar to array B. Then, reading fileB, it tries to match fields 1 and 4 with the indices of A (which would fail even if the FS were correct), and, on positive match, prints an even stranger composition of array elements, FILENAMEs, and fields. Like

file1.7507345|file1.0.0.0.2|file2.|file2.7507345||file2.7507345
file1.3323077|file1.0.0.0.1|file2.|file2.3323077||file2.3323077

(after correcting for the worst errors)

Hello RudiC,

I modified the script as per my requirement but it failed to give output. I did not understand the part "tries to assign a string composition of FILENAME and fields 1 and 3 to array A indexed by by fields 2 and 3". Here is my date and requirement:

File A

7507345|NIP1|0.0.0.2|127369256|17561|Closed
3323077|NIP1|0.0.0.1|142830388|34727|Closed
3323077|NIP4|0.0.0.1|773455302|386196|Closed
8171620|NIP8|0.0.0.1|176123103157225888|Closed

File B

7507345|17561|ED|BILL|0
3323077|34727|EF|LAST_BILL_T|0

Expected Output:

7507345|NIP1|0.0.0.2|127369256|17561|ED|BILL|0
3323077|NIP1|0.0.0.1|142830388|34727|EF|LAST_BILL_T|0

I will really appreciate if you can suggest.

Thanks
Angshuman

Please show your modified script.

why not use the join command to join files? The problem is it only can join on 1 field in each file (default the first field in each file). But after the first join it gets much easier.

$ join -t \| a.tmp b.tmp

7507345|NIP1|0.0.0.2|127369256|17561|Closed|17561|ED|BILL|0
3323077|NIP1|0.0.0.1|142830388|34727|Closed|34727|EF|LAST_BILL_T|0
3323077|NIP4|0.0.0.1|773455302|386196|Closed|34727|EF|LAST_BILL_T|0

After this you only have to keep the lines where field 5 equals field 7, and this is easy with awk. So:

$ join -t \| a.tmp b.tmp |awk -F\| 'BEGIN {OFS="|"}; $5 == $7 {print $1,$2, $3, $4, $5, $6, $8, $9, $10}'

7507345|NIP1|0.0.0.2|127369256|17561|Closed|ED|BILL|0
3323077|NIP1|0.0.0.1|142830388|34727|Closed|EF|LAST_BILL_T|0

Note: join requires the two files to be sorted on the key fields!

1 Like

Thank you Ivo. I have used your idea and it helped.

Hi angshuman,
Obviously what you want to do could be done with just awk instead of using both join and awk . But if you're unwilling to show us the modified awk code that you were using, we can't show you where you went wrong.

Please help us help you. Show us your code (in CODE tags).