Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with a UNIX script that compares two large CSV files (greater than 10 GB in size) and outputs a file with the matching rows combined.
I want to compare file1 and file2 on the 'id' and 'chain' columns, then take the remaining columns of each exactly matching row in file2 and combine them with file1's columns, dropping rows that have no match. When several rows in file2 match one row in file1, each match should produce its own row in the output.
For example:
$ head file1
id,chain,offer,market,repeattrips,repeater,offerdate
86246,205,1208251,34,5,t,2013-04-24
86252,205,1197502,34,16,t,2013-03-27
12682470,18,1197502,11,0,f,2013-03-28
12996040,15,1197502,9,0,f,2013-03-25
13089312,15,1204821,9,0,f,2013-04-01
$ head file2
id,chain,dept,category,company,brand,date,productsize,productmeasure,purchasequantity,purchaseamount
86246,205,7,707,1078778070,12564,2012-03-02,12,OZ,1,7.59
86246,205,63,6319,107654575,17876,2012-03-02,64,OZ,1,1.59
86246,205,97,9753,1022027929,0,2012-03-02,1,CT,1,5.99
86976,205,25,2509,107996777,31373,2012-03-02,16,OZ,1,1.99
97646,206,55,5555,107684070,32094,2012-03-02,16,OZ,2,10.38
The desired output would be:
id,chain,dept,category,company,brand,date,productsize,productmeasure,purchasequantity,purchaseamount,offer,market,repeattrips,repeater,offerdate
86246,205,7,707,1078778070,12564,2012-03-02,12,OZ,1,7.59,1208251,34,5,t,2013-04-24
86246,205,63,6319,107654575,17876,2012-03-02,64,OZ,1,1.59,1208251,34,5,t,2013-04-24
86246,205,97,9753,1022027929,0,2012-03-02,1,CT,1,5.99,1208251,34,5,t,2013-04-24
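To make the expected behaviour concrete, here is a rough sketch of this join in awk. It is only an illustration under some assumptions: the fields contain no quoted commas, each 'id','chain' pair appears at most once in file1 (if it repeats, the last row wins), and the file1 rows fit in memory. The function name `merge_by_id_chain` is made up for this example. If file1 does not fit in memory, sorting both files on the key and using `join` is the usual alternative.

```shell
# merge_by_id_chain is a hypothetical helper name, not an existing command.
# Usage: merge_by_id_chain file1 file2 > merged.csv
merge_by_id_chain() {
    awk -F, -v OFS=, '
        # First file (file1): store the non-key columns under the key "id,chain".
        # Assumption: every row of file1 has at least three columns.
        NR == FNR {
            key = $1 FS $2
            rest = $3
            for (i = 4; i <= NF; i++) rest = rest OFS $i
            extra[key] = rest
            next
        }
        # Second file (file2): print only rows whose key was seen in file1,
        # followed by the stored file1 columns. The header lines share the
        # key "id,chain", so the combined header is produced the same way.
        {
            key = $1 FS $2
            if (key in extra) print $0, extra[key]
        }
    ' "$1" "$2"
}
```

Running `merge_by_id_chain file1 file2 > merged.csv` on the samples above should give the header plus the three 86246,205 rows. Because file2 is streamed line by line, only file1 has to be held in memory.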
If you post code, please explain it a little.
Thanks