Gawk / Awk Merge Lines based on Key

Hi Guys,

After Windows died on my netbook I installed Lubuntu and discovered Gawk about a month ago. After using Excel for 10+ years I'm amazed how quickly and easily Gawk can process data, but I'm stuck on a little problem: merging data from multiple lines.

I'm an SEO consultant and provide monthly reports to clients, which I'm currently revamping. Essentially I have a CSV file similar to this:

domain.com, Default title | Domain.com
domain.com, domain.com/data/product1.html
domain2.com, domain2.com/contact.html
domain2.com, domain2.com/index.html
domain2.com, domain2.com/products/shoes.html

I'm trying to create a file like

domain.com, page.html, product1.html
domain2.com, contact.html, index.html, shoes.html

Each website will have a different number of pages from 1 to 10.

Anyone have any idea how I could do this with Gawk?

Thanks,

James

gawk -F'[,/]' '{a[$1]=($1 in a)?a[$1] OFS $NF:$NF}END {for (i in a) print i,a[i]}' OFS=, myfile.csv
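For readability, here is a rough multi-line sketch of the same array-based idea. The file names sample.csv and merged.csv and the sample rows are made up for illustration, and the order[] array is an addition that is not in the one-liner above: `for (i in a)` visits keys in no guaranteed order, so this version remembers the order in which each domain first appears.

```shell
# Hypothetical sample input; note the domain.com lines are not adjacent.
cat > sample.csv <<'EOF'
domain.com, domain.com/data/product1.html
domain2.com, domain2.com/contact.html
domain2.com, domain2.com/index.html
domain2.com, domain2.com/products/shoes.html
domain.com, domain.com/page.html
EOF

# Split on commas and slashes: $1 is the domain, $NF is the page file name.
awk -F'[,/]' '
{
    key  = $1
    page = $NF
    if (key in pages) {
        pages[key] = pages[key] ", " page   # append to the existing list
    } else {
        pages[key] = page                   # first page for this domain
        order[++n] = key                    # remember first-seen order
    }
}
END {
    for (i = 1; i <= n; i++)
        print order[i] ", " pages[order[i]]
}' sample.csv > merged.csv

cat merged.csv
```

With the sample above, merged.csv should contain one line per domain, with domain.com first and its two pages joined even though they were not adjacent in the input. The trade-off is that every key and its page list is held in memory until the END block runs.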

Same results here

awk -F'[,/]' '{a[$1]=((a[$1])?a[$1]",":X)$NF}END{for(i in a) print i,a[i]}' file
awk -F '[,/]' 'r!=$1{if(p)print p; r=p=$1}{p=p", "$NF}END{print p}' file

This solution will work as expected only if the file is sorted first!

True, or rather: the file need not be sorted, but the lines that have the same label in $1 need to be consecutive.
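If the input is not already grouped, one way to guarantee that lines with the same $1 are consecutive is to sort on the first field before piping into the streaming one-liner. This is only a sketch: unsorted.csv and grouped.csv are made-up names, LC_ALL=C is used so the sort order does not depend on the locale, and -s (stable sort) is a GNU/BSD sort option rather than a POSIX one.

```shell
# Hypothetical input where lines for the same domain are interleaved.
cat > unsorted.csv <<'EOF'
domain2.com, domain2.com/contact.html
domain.com, domain.com/data/product1.html
domain2.com, domain2.com/index.html
domain.com, domain.com/page.html
EOF

# -t, uses the comma as field separator, -k1,1 sorts on the first field only,
# -s keeps lines with equal keys in their original relative order.
LC_ALL=C sort -s -t, -k1,1 unsorted.csv |
awk -F'[,/]' 'r!=$1{if(p)print p; r=p=$1}{p=p", "$NF}END{print p}' > grouped.csv

cat grouped.csv
```

The awk part only ever holds one output line in memory, but sort itself has to read the whole file, so this mostly pays off when the data already arrives grouped and the sort step can be dropped.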

Hmm, you expect the input to be perfect?
The "user" usually doesn't know that ;)

The first two replies have already solved the problem (in fact I wrote a similar solution myself, just not as quickly as the others). Scrutinizer's code is simply another solution, and a shorter one.

So why not have a try?

It is not my experience that "the user" doesn't know. Looking at the kind of input I got the impression it was produced through some form of automation which would make ordering more likely. Anyway, I am sure the OP is more than capable of determining if it fits his bill. If so, this script has minimal memory requirements, which might be advantageous under certain conditions, who knows...

Yep, that's a plus.