Grouping multiple columns and concatenate

I have a CSV file that goes like this:

Name,Group,Email
Max,Group1,max@.com
Dan,Group2,dan@.com
Max,Group3,max@.com
Max,Group4,max@.com
Dan,Group5,dan@.com
Jim,Group6,jim@.com

Basically my desired output should be:

Name,Group,Email
Max,Group1|Group3|Group4,max@.com
Dan,Group2|Group5,dan@.com
Jim,Group6,jim@.com

I have tried using this code:

awk -F',' '{if(a[$1])a[$1]=a[$1]"|"$2","$3; else a[$1]=$2$3;}END{for (i in a)print i "," a;}' OFS=, FileInput

but I think this is only usable for a file with 2 columns, also is it possible to not include the headers in concatenating.

Thanks a lot!

awk -F',' 'NR==1{print;next} {a[$1] = a[$1]"|"$2; b[$1] = $3} END {for (x in a){print x,substr(a[x],2),b[x]}}' OFS=, FileInput
1 Like

If a given name might have a different email address for some groups, you might want to try:

awk '
BEGIN { FS = OFS = "," }
NR == 1 {print
        next
}       
!(($1,$3) in n) {
        n[$1,$3] = $1
        g[$1,$3] = $2
        e[$1,$3] = $3 
        next
}
{       g[$1,$3] = g[$1,$3] "|" $2 }
END {   for(i in n)
                print n, g, e
}' FileInput

As always, if you want to try this on a Solaris/SunOS system, replace awk in this script with /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .

1 Like

How about this

$ awk -F, '{_[$3]=(!(_[$3]))? $1 FS $2: _[$3] OFS $2}END{for(i in _)print _ FS i}' OFS="|" file

EDIT of _ to Arr : Easy to understand as suggested by Jotne in #5

$ awk -F, '{Arr[$3]=!Arr[$3]? $1 FS $2: Arr[$3] OFS $2}END{for(i in Arr)print Arr FS i}' OFS="|" file

Resulting

Name,Group,Email
Dan,Group2|Group5,dan@.com
Jim,Group6,jim@.com
Max,Group1|Group3|Group4,max@.com
1 Like

Its not a good habit to use _ as an variable. New user would have problem understanding what is going on.
Change it to some like

awk -F, '{arr[$3]=arr[$3]?arr[$3] OFS $2:$1 FS $2} END {for(i in arr) print arr FS i}' OFS=\| file

also removed some not needed parentheses around first test, and change the order of it to test for true.

1 Like

Sorry Jotne you told me before also to replace _ with some name as its difficult for new users to understand code, I am forgetting every time, I just modified #4 as you said.

1 Like

Thanks for the help guys. It's finally done!