Joining Three Files

Hi guys,

I have three files which need to be joined into a single file.

File 1:
Col a, Col b, Col c

File 2:
Col 1a, Col 1b

File 3:
Col 2a, Col 2b

Output:
Col 1a, Col 2a, Col a, Col b, Col c.

All the files are comma delimited. I need to join Col b with Col 1b and bring Col 1a into the output. If there is no matching record, then " " has to appear in the output (left join).
Similarly, Col c needs to be joined with Col 2b, and Col 2a needs to be in the output.
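The two lookups described above can be sketched in awk roughly as follows. The file names (file1, file2, file3) and the tiny samples are assumptions made up to match the layouts; a blank " " stands in for missing matches, per the description:

```shell
# Hypothetical samples in the layouts described above
cat > file1 <<'EOF'
a1,b1,c1
a2,b2,c2
EOF
cat > file2 <<'EOF'
x1,b1
EOF
cat > file3 <<'EOF'
y2,c2
EOF

awk -F, '
FILENAME == "file2" { colA[$2] = $1; next }   # Col 1b -> Col 1a
FILENAME == "file3" { col2A[$2] = $1; next }  # Col 2b -> Col 2a
{
    f2 = ($2 in colA)  ? colA[$2]  : " "      # left join on Col b
    f3 = ($3 in col2A) ? col2A[$3] : " "      # left join on Col c
    print f2 "," f3 "," $0                    # Col 1a, Col 2a, Col a, Col b, Col c
}' file2 file3 file1
```

Both lookup files are loaded into arrays first, then each line of file1 is printed once, with blanks where a key is missing.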

Can somebody help me out with this?

Cheers!!!!

Please post more lines from your input files, and the expected output (with and without matches).

The files are joined by TransactionCode (second column after StoreCENTER) and StoreID (the next column). The last column of the output comes from the second file.

 
awk 'NR==FNR {a[$1]=$3;next} $2 in a { print $0" "a[$2];next}1' file2 file1

Thanks, Panyam, for your help.
But I am getting this error:

$ awk 'NR==FNR {a[$1]=$3;next} $2 in a { print $0" "a[$2];next}1' mac_trans.txt mac_discount.txt
awk: syntax error near line 1
awk: bailing out near line 1

Even though the data is the same, the file is a CSV.

$ cat mac_trans.txt
StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603
StoreCENTER,557_634951_829 AM,41600465-2939-4967-a2e8-f8b32f7284a1,634952,3239,1,557,BGUZZI,1,000000011837
StoreCENTER,557_634953_832 AM,679b8707-993d-4fb0-b8e8-779b1fdebcaa,634953,3240,1,557,BGUZZI,1,000000011837
$ cat mac_discount.txt
1030064,1238,1,12380005080811105527,Customer
620500,1238,1,12380005080811102840,Customer
620528,1238,1,12380005080811142338,Customer
620535,1238,1,12380005080811142752,Customer
620543,1238,1,12380005080811144409,Customer
620545,1238,1,12380005080811144809,Customer
634952,1238,1,12380005080811102548,Customer
 
 
Use nawk or /usr/xpg4/bin/awk on Solaris.
In the case of a CSV, awk -F"," should do the work for you.
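For instance, with a throwaway one-line input, -F"," makes awk split on commas:

```shell
# Without -F, awk would see the whole line as one whitespace-separated field
printf 'a,b,c\n' | awk -F"," '{ print $2 }'
```

This prints the second comma-separated field, `b`.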
 

Hi Panyam,

I was able to use awk. I wrote a small awk program to check it.

[pkdprd1@test-dwbi-app2:/import/volsap/dev]
$ awk -F"," '{print $1}' mac_discount.txt
1030064
620500
620528
620535
620543
620545
634952

But somehow, when I use your awk program, it gives me an error. Can you please help me?

I believe your system has an "old" awk. It can't understand the "key in array" construction. Why don't you just try changing awk to nawk ("new" awk)?

Frankly, yazu, I would love to do it, but I don't have the authority; I am working on someone else's Unix server... :frowning:

Oh... change it on the command line, not in the system. Just use

$ nawk ...

Ohh, thanks for that; now the command is running. But I am getting all the records from the second file, whereas I need only the matched records. Let me explain one more time.
First file:

$ cat mac_trans.txt
StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603

Second file:

$ cat mac_discount.txt
1030064,1238,1,12380005080811105527,Customer
620500,1238,1,12380005080811102840,Customer
620528,1238,1,12380005080811142338,Customer
620535,1238,1,12380005080811142752,Customer
620543,1238,1,12380005080811144409,Customer
620545,1238,1,12380005080811144809,Customer
634952,1238,1,12380005080811102548,Customer

The 4th column in the first file has to be looked up against the 1st column in the second file, and the output should include the additional columns from the second file.

Where is an output example? I can guess from your input files that you want to get this one line:

StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603,1238,1,12380005080811105527,Customer

But maybe I'm wrong?

Sorry, my bad; I should have given you the output. It should be like this.
Output:

StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603,1238,1,12380005080811105527,Customer
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603,,,
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603,,,

Just like a left outer join. Hope I make sense.
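For what it's worth, a single-key left outer join like this can also be done with join(1) instead of awk, at the cost of sorting both files on the join field first. A sketch on two trimmed sample lines (the -o field list spells out all ten left-side columns plus the four extra right-side ones, and -e '' fills the blanks on unmatched rows):

```shell
# Two-line samples trimmed from the thread's data
cat > mac_trans.txt <<'EOF'
StoreCENTER,625_1030064_839 AM,uuid-4,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,uuid-5,1030065,8960,1,625,MBROWNE,3,000000011603
EOF
cat > mac_discount.txt <<'EOF'
1030064,1238,1,12380005080811105527,Customer
EOF

# join requires both inputs sorted on their join fields
sort -t, -k4,4 mac_trans.txt    > trans.sorted
sort -t, -k1,1 mac_discount.txt > disc.sorted

# -a 1 keeps unmatched left rows (left outer join); -e '' emits empty fields for them
join -t, -1 4 -2 1 -a 1 -e '' \
     -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2,2.3,2.4,2.5 \
     trans.sorted disc.sorted
```

The awk approach has the advantage of preserving the original line order and handling a second key column, which is where this thread ends up.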

Why only 3 blank fields on non-matching lines (and four additional on matching ones)?
And do you need to keep the order of lines in mac_trans.txt, or does it not matter?

I need to pull the three columns from the second file if there is a matching record; otherwise, a blank has to be provided for each of the columns from the second file that has no matching record.
Yes, you are right, there should be 4 commas (,).
The order in mac_trans.txt does not matter, as long as all the records come out in the output.
Meanwhile, I am also trying. This is what I tried, but it did not give the required output:

nawk 'NR==FNR { a[$1]=$2$3$4$5 } NR>FNR { k=$4; if (k in a) print k,a[k]; else print $0",,,"}' mac_discount.txt mac_trans.txt
nawk -F, '
NR == FNR { a[$1]=$2 "," $3 "," $4 "," $5 } 
NR != FNR && $4 in a { print  $0 "," a[$4]}
NR != FNR && !($4 in a) { print $0 ",,,," }
' mac_discount.txt mac_trans.txt

You're my man... now I see where I went wrong.
But can we add one more key column to it?
The 4th and 5th columns in file 1 have to match the 1st and 2nd columns in file 2.
Will this work?

nawk -F, '
NR == FNR { a[$1","$2]="," $3 "," $4 "," $5 } 
NR != FNR && $4","$5 in a { print  $0 "," a[$4]}
NR != FNR && !($4","$5 in a) { print $0 ",,,," }
' mac_discount.txt mac_trans.txt

---------- Post updated at 07:13 PM ---------- Previous update was at 07:09 PM ----------

This is the set of files I am working on, and the output. The only issue I am seeing is that I am getting a blank instead of the 2nd column from the 2nd file.

$ cat mac_discount.txt
1030064,8959,1,12380005080811105527,Customer
620500,1238,1,12380005080811102840,Customer
620528,1238,1,12380005080811142338,Customer
620535,1238,1,12380005080811142752,Customer
620543,1238,1,12380005080811144409,Customer
620545,1238,1,12380005080811144809,Customer
634952,1238,1,12380005080811102548,Customer
TransDiscountGroupTransactionID,TransDiscountGroupStoreID,SequenceID,AlternateCustomerID,DiscountGroupID
[pkdprd1@test-dwbi-app2:/import/volsap/dev]
$ cat mac_trans.txt
StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603
[pkdprd1@test-dwbi-app2:/import/volsap/dev]
$ nawk -F, '
NR == FNR { a[$1","$2]="," $3 "," $4 "," $5 }
NR != FNR && $4","$5 in a { print  $0 "," a[$4","$5]}
NR != FNR && !($4","$5 in a) { print $0 ",,,," }
' mac_discount.txt mac_trans.txt
StoreCENTER,625_1030061_830 AM,3485e17a-0baa-46a9-9edf-07a0858c4188,1030061,8956,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030062_833 AM,992b6947-a404-4e10-8e35-44dd85ccd5ee,1030062,8957,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030063_838 AM,aa742016-0b68-4155-bc60-ca44f3dd504b,1030063,8958,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603,,1,12380005080811105527,Customer
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030066_843 AM,6ddef2a7-8e11-4076-91fa-8cf3a1d8e3e0,1030066,8961,1,625,MBROWNE,3,000000011603,,,,
StoreCENTER,625_1030067_844 AM,7d7ae219-70b7-4b8e-a282-a5a3a4cb2074,1030067,8962,1,625,MBROWNE,3,000000011603,,,,

---------- Post updated at 07:21 PM ---------- Previous update was at 07:13 PM ----------

Thanks, Yazu, for your help. Finally this command is doing what I need. Thought of posting it here:

nawk -F, '
NR == FNR { a[$1","$2]=$3 "," $4 "," $5 }
NR != FNR && $4","$5 in a { print  $0 "," a[$4","$5]}
NR != FNR && !($4","$5 in a) { print $0 ",,,," }
' mac_discount.txt mac_trans.txt

mac_discount.txt - lookup file
mac_trans.txt - parent file (left table)

Just in case you are not aware, your output's unmatched records still have more fields than their matched counterparts.

Regards,
Alister
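A small variant that evens out the field counts Alister mentions, so matched and unmatched rows end up with the same number of columns: matched rows append one comma plus three fields, so unmatched rows should append three empty fields (",,"), not four. A sketch on two trimmed lines from the thread's data (plain awk here; substitute nawk on Solaris, as above):

```shell
# Two-line samples trimmed from the thread's data
cat > mac_discount.txt <<'EOF'
1030064,8959,1,12380005080811105527,Customer
EOF
cat > mac_trans.txt <<'EOF'
StoreCENTER,625_1030064_839 AM,898189a4-7e5e-4341-9492-d6b72549c9bc,1030064,8959,1,625,MBROWNE,3,000000011603
StoreCENTER,625_1030065_841 AM,01fc6fa4-6f40-47ce-ab76-fad4ddd88d6b,1030065,8960,1,625,MBROWNE,3,000000011603
EOF

awk -F, '
NR == FNR { a[$1 "," $2] = $3 "," $4 "," $5; next }
{
    k = $4 "," $5
    # both branches append exactly three fields after the comma
    print $0 "," ((k in a) ? a[k] : ",,")
}' mac_discount.txt mac_trans.txt
```

The matched line gains the three lookup columns; the unmatched line gains three empty ones, so every output row has thirteen fields.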