Gawk / Awk Merge Lines based on Key

Jamesfirst · October 27, 2010, 6:13pm

Hi Guys,

After Windows died on my netbook, I installed Lubuntu and discovered Gawk about a month ago. After using Excel for 10+ years, I'm amazed how quickly and easily Gawk can process data, but I'm stuck on a little problem merging data from multiple lines.

I'm an SEO consultant and provide monthly reports to clients, which I'm currently revamping. Essentially, I have a CSV file similar to this:

domain.com, Default title | Domain.com
domain.com, domain.com/data/product1.html
domain2.com, domain2.com/contact.html
domain2.com, domain2.com/index.html
domain2.com, domain2.com/products/shoes.html

I'm trying to create a file like this:

domain.com, page.html, product1.html
domain2.com, contact.html, index.html, shoes.html

Each website will have a different number of pages from 1 to 10.

Anyone have any idea how I could do this with Gawk?

Thanks,

James

vgersh99 · October 27, 2010, 6:31pm

gawk -F'[,/]' '{a[$1]=($1 in a)?a[$1] OFS $NF:$NF}END {for (i in a) print i,a[i]}' OFS=, myfile.csv

danmero · October 27, 2010, 6:45pm

Same results here

awk -F'[,/]' '{a[$1]=((a[$1])?a[$1]",":X)$NF}END{for(i in a) print i,a[i]}' file

Scrutinizer · October 27, 2010, 6:48pm

awk -F '[,/]' 'r!=$1{if(p)print p; r=p=$1}{p=p", "$NF}END{print p}' file

danmero · October 27, 2010, 8:04pm

This solution will work as expected only if the file is sorted first!

Scrutinizer · October 27, 2010, 8:16pm

True. Or rather, the file need not be sorted, but lines that share the same label in $1 need to be consecutive.
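
To illustrate the point about consecutive keys, here is a minimal sketch, assuming a hypothetical input in which the domain2.com lines are separated by a domain.com line. A stable sort on the first comma-separated field groups equal keys together before the streaming awk script reads them. The `-s` (stable) option is assumed to be available, as it is in GNU sort; `LC_ALL=C` is set only to make the key order predictable.

```shell
# Hypothetical sample: the domain2.com lines are not consecutive.
input='domain2.com, domain2.com/contact.html
domain.com, domain.com/data/product1.html
domain2.com, domain2.com/index.html'

# Stable sort on the key field only, so lines with the same key keep
# their original relative order, then run the streaming merge.
out=$(printf '%s\n' "$input" |
  LC_ALL=C sort -s -t, -k1,1 |
  awk -F '[,/]' 'r!=$1{if(p)print p; r=p=$1}{p=p", "$NF}END{print p}')

printf '%s\n' "$out"
```

Without the sort step, the same input would produce two separate output lines for domain2.com, because the script starts a new record every time $1 changes.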

danmero · October 27, 2010, 10:34pm

Hmm, you expect the input to be perfect?
The "user" usually doesn't know that.

rdcwayx · October 27, 2010, 10:51pm

Since the first two replies have already solved the problem (in fact, I wrote a similar solution, just not as quickly as the others), Scrutinizer's code is simply another, shorter solution.

So why not give it a try?

Scrutinizer · October 28, 2010, 6:07am

It is not my experience that "the user" doesn't know. Looking at the kind of input, I got the impression it was produced through some form of automation, which would make ordering more likely. Anyway, I am sure the OP is more than capable of determining whether it fits his bill. If so, this script has minimal memory requirements, which might be advantageous under certain conditions, who knows...
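
For comparison with the memory point: the array-based answers have to hold every merged line until END, and awk does not guarantee the order in which `for (i in a)` visits keys. The following is only a sketch of that trade-off, assuming first-seen order is wanted and the input is not grouped; the `order` array and the counter `n` are hypothetical helper names, not part of the earlier answers.

```shell
# Hypothetical sample with non-consecutive keys.
input='domain2.com, domain2.com/contact.html
domain.com, domain.com/data/product1.html
domain2.com, domain2.com/index.html'

out=$(printf '%s\n' "$input" | awk -F '[,/]' '
  {
    if ($1 in a) {
      a[$1] = a[$1] ", " $NF      # key seen before: append the page name
    } else {
      order[++n] = $1             # remember each key the first time it appears
      a[$1] = $NF
    }
  }
  END {
    # Walk the keys in first-seen order instead of relying on for-in order.
    for (k = 1; k <= n; k++) print order[k] ", " a[order[k]]
  }
')

printf '%s\n' "$out"
```

This handles unordered input at the cost of keeping all keys and their merged values in memory until the end of the file.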

danmero · October 28, 2010, 9:22am

Yep, that's a plus.