Joining Three Files

Hi guys,

I have three files which need to be joined into a single file.

File 1:
Col a, Col b, Col c

File 2:
Col 1a, Col 1b

File 3:
Col 2a, Col 2b

Output:
Col 1a, Col 2a, Col a, Col b, Col c.

All the files are comma delimited. I need to join Col b with Col 1b and bring Col 1a into the output. If there is no matching record, then " " has to appear in the output (left join).
Similarly, Col c needs to be joined with Col 2b, and Col 2a needs to be in the output.
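The two lookups described above can be sketched in awk roughly as follows. The file names (file1, file2, file3) and the tiny samples are assumptions made up to match the layouts; a blank " " stands in for missing matches, per the description:

```shell
# Hypothetical samples in the layouts described above
cat > file1 <<'EOF'
a1,b1,c1
a2,b2,c2
EOF
cat > file2 <<'EOF'
x1,b1
EOF
cat > file3 <<'EOF'
y2,c2
EOF

awk -F, '
FILENAME == "file2" { colA[$2] = $1; next }   # Col 1b -> Col 1a
FILENAME == "file3" { col2A[$2] = $1; next }  # Col 2b -> Col 2a
{
    f2 = ($2 in colA)  ? colA[$2]  : " "      # left join on Col b
    f3 = ($3 in col2A) ? col2A[$3] : " "      # left join on Col c
    print f2 "," f3 "," $0                    # Col 1a, Col 2a, Col a, Col b, Col c
}' file2 file3 file1
```

Both lookup files are loaded into arrays first, then each line of file1 is printed once, with blanks where a key is missing.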

Can somebody help me out with this?

Cheers!!!!

Please post more lines from your input files, and the expected output (with and without matches).

The files are joined by TransactionCode (second column after StoreCENTER) and StoreID (the next column). The last column of the output comes from the second file.

 
awk 'NR==FNR {a[$1]=$3;next} $2 in a { print $0" "a[$2];next}1' file2 file1

Thanks, Panyam, for your help.
But I am getting this error:

$ awk 'NR==FNR {a[$1]=$3;next} $2 in a { print $0" "a[$2];next}1' mac_trans.txt mac_discount.txt
awk: syntax error near line 1
awk: bailing out near line 1

Even though the data is the same, the file is a CSV.

$ cat mac_trans.txt
StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603
StoreCENTER,557_634951_829 AM,41600465-2939-4967-a2e8-f8b32f7284a1,634952,3239,1,557,BGUZZI,1,000000011837
StoreCENTER,557_634953_832 AM,679b8707-993d-4fb0-b8e8-779b1fdebcaa,634953,3240,1,557,BGUZZI,1,000000011837
$ cat mac_discount.txt
1030064,1238,1,12380005080811105527,Customer
620500,1238,1,12380005080811102840,Customer
620528,1238,1,12380005080811142338,Customer
620535,1238,1,12380005080811142752,Customer
620543,1238,1,12380005080811144409,Customer
620545,1238,1,12380005080811144809,Customer
634952,1238,1,12380005080811102548,Customer
 
 
Use nawk or /usr/xpg4/bin/awk on Solaris.
In the case of a CSV, awk -F"," should do the work for you.
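For instance, with a throwaway one-line input, -F"," makes awk split on commas:

```shell
# Without -F, awk would see the whole line as one whitespace-separated field
printf 'a,b,c\n' | awk -F"," '{ print $2 }'
```

This prints the second comma-separated field, `b`.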
 

Hi Panyam,

I was able to use awk. I wrote a small awk program to check it.

[pkdprd1@test-dwbi-app2:/import/volsap/dev]
$ awk -F"," '{print $1}' mac_discount.txt
1030064
620500
620528
620535
620543
620545
634952

But somehow, when I use your awk program, it gives me an error. Can you please help me?

I believe your system has an "old" awk. It can't understand the "key in array" construction. Why don't you just try changing awk to nawk ("new" awk)?

Frankly, yazu, I would love to do it, but I don't have the authority; I am working on someone else's Unix server... :frowning:

Oh... change it on the command line, not in the system. Just use

$ nawk ...

Ohh, thanks for that; now the command is running. But I am getting all the records from the second file, whereas I need only the matched records. Let me explain one more time.
First file:

$ cat mac_trans.txt
StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603

Second file:

$ cat mac_discount.txt
1030064,1238,1,12380005080811105527,Customer
620500,1238,1,12380005080811102840,Customer
620528,1238,1,12380005080811142338,Customer
620535,1238,1,12380005080811142752,Customer
620543,1238,1,12380005080811144409,Customer
620545,1238,1,12380005080811144809,Customer
634952,1238,1,12380005080811102548,Customer

The 4th column in the first file has to be looked up against the 1st column in the second file, and the output should include the additional columns from the second file.

Where is an output example? I can guess from your input files that you want to get this one line:

StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603,1238,1,12380005080811105527,Customer

But maybe I'm wrong?

Sorry, my bad; I should have given you the output. It should be like this.
Output:

StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603,1238,1,12380005080811105527,Customer
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603,,,

Just like a left outer join. Hope I make sense.
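For what it's worth, a single-key left outer join like this can also be done with join(1) instead of awk, at the cost of sorting both files on the join field first. A sketch on two trimmed sample lines (the -o field list spells out all ten left-side columns plus the four extra right-side ones, and -e '' fills the blanks on unmatched rows):

```shell
# Two-line samples trimmed from the thread's data
cat > mac_trans.txt <<'EOF'
StoreCENTER,625_1030064_839 AM,uuid-4,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,uuid-5,1030065,8960,1,625,MBROWNE,3,000000011603
EOF
cat > mac_discount.txt <<'EOF'
1030064,1238,1,12380005080811105527,Customer
EOF

# join requires both inputs sorted on their join fields
sort -t, -k4,4 mac_trans.txt    > trans.sorted
sort -t, -k1,1 mac_discount.txt > disc.sorted

# -a 1 keeps unmatched left rows (left outer join); -e '' emits empty fields for them
join -t, -1 4 -2 1 -a 1 -e '' \
     -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2,2.3,2.4,2.5 \
     trans.sorted disc.sorted
```

The awk approach has the advantage of preserving the original line order and handling a second key column, which is where this thread ends up.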

Why only 3 blank fields on non-matching lines (and four additional on matching ones)?
And do you need to keep the order of lines in mac_trans.txt, or does it not matter?

I need to pull the three columns from the second file if there is a matching record; otherwise, a blank has to be provided for each of the columns from the second file that has no matching record.
Yes, you are right, there should be 4 commas (,).
The order in mac_trans.txt does not matter, as long as all the records come out in the output.
Meanwhile, I am also trying. This is what I tried, but it did not give the required output:

nawk 'NR==FNR { a[$1]=$2$3$4$5 } NR>FNR { k=$4; if (k in a) print k,a[k]; else print $0",,,"}' mac_discount.txt mac_trans.txt
nawk -F, '
NR == FNR { a[$1]=$2 "," $3 "," $4 "," $5 } 
NR != FNR && $4 in a { print  $0 "," a[$4]}
NR != FNR && !($4 in a) { print $0 ",,,," }
' mac_discount.txt mac_trans.txt

You're my man... now I see where I went wrong.
But can we add one more key column to it?
The 4th and 5th columns in file 1 have to match the 1st and 2nd columns in file 2.
Will this work?

nawk -F, '
NR == FNR { a[$1","$2]="," $3 "," $4 "," $5 } 
NR != FNR && $4","$5 in a { print  $0 "," a[$4]}
NR != FNR && !($4","$5 in a) { print $0 ",,,," }
' mac_discount.txt mac_trans.txt

---------- Post updated at 07:13 PM ---------- Previous update was at 07:09 PM ----------

This is the set of files I am working on, and the output. The only issue I am seeing is that I am getting a blank instead of the 2nd column from the 2nd file.

$ cat mac_discount.txt
1030064,8959,1,12380005080811105527,Customer
620500,1238,1,12380005080811102840,Customer
620528,1238,1,12380005080811142338,Customer
620535,1238,1,12380005080811142752,Customer
620543,1238,1,12380005080811144409,Customer
620545,1238,1,12380005080811144809,Customer
634952,1238,1,12380005080811102548,Customer
TransDiscountGroupTransactionID,TransDiscountGroupStoreID,SequenceID,AlternateCustomerID,DiscountGroupID
[pkdprd1@test-dwbi-app2:/import/volsap/dev]
$ cat mac_trans.txt
StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603
[pkdprd1@test-dwbi-app2:/import/volsap/dev]
$ nawk -F, '
NR == FNR { a[$1","$2]="," $3 "," $4 "," $5 }
NR != FNR && $4","$5 in a { print  $0 "," a[$4","$5]}
NR != FNR && !($4","$5 in a) { print $0 ",,,," }
' mac_discount.txt mac_trans.txt
StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603,,1,12380005080811105527,Customer
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603,,,,

---------- Post updated at 07:21 PM ---------- Previous update was at 07:13 PM ----------

Thanks, Yazu, for your help. Finally this command is doing what I need. Thought of posting it here:

nawk -F, '
NR == FNR { a[$1","$2]=$3 "," $4 "," $5 }
NR != FNR && $4","$5 in a { print  $0 "," a[$4","$5]}
NR != FNR && !($4","$5 in a) { print $0 ",,,," }
' mac_discount.txt mac_trans.txt

mac_discount.txt - lookup file
mac_trans.txt - parent file (left table)

Just in case you are not aware, your output's unmatched records still have more fields than their matched counterparts.

Regards,
Alister
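A small variant that evens out the field counts Alister mentions, so matched and unmatched rows end up with the same number of columns: matched rows append one comma plus three fields, so unmatched rows should append three empty fields (",,"), not four. A sketch on two trimmed lines from the thread's data (plain awk here; substitute nawk on Solaris, as above):

```shell
# Two-line samples trimmed from the thread's data
cat > mac_discount.txt <<'EOF'
1030064,8959,1,12380005080811105527,Customer
EOF
cat > mac_trans.txt <<'EOF'
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603
EOF

awk -F, '
NR == FNR { a[$1 "," $2] = $3 "," $4 "," $5; next }
{
    k = $4 "," $5
    # both branches append exactly three fields after the comma
    print $0 "," ((k in a) ? a[k] : ",,")
}' mac_discount.txt mac_trans.txt
```

The matched line gains the three lookup columns; the unmatched line gains three empty ones, so every output row has thirteen fields.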