awk Group By and count string occurrences

Royi · August 8, 2013, 3:45am

Hi Gurus,
I've been scratching my head over and over and couldn't find the right way to compose this AWK properly - PLEASE HELP

Input:

c,d,e,CLICK
a,b,c,CLICK
a,b,c,CONV
c,d,e,CLICK
a,b,c,CLICK
a,b,c,CLICK
a,b,c,CONV
b,c,d,CLICK
c,d,e,CLICK
c,d,e,CLICK
b,c,d,CONV
a,b,c,CLICK
b,c,d,CLICK
b,c,d,CLICK
c,d,e,CLICK

Desired Output:

a,b,c,4,2
b,c,d,3,1
c,d,e,5,0

##Explanation: the key (group by) is fields $1+$2+$3
##The 4th column counts the occurrences of "CLICK" for that key
##The 5th column counts the occurrences of "CONV" for that key

anbu23 · August 8, 2013, 3:51am

Can you post what you have tried?

Jotne · August 8, 2013, 4:06am

awk -F, '$4~/CLICK/ {a[$1","$2","$3]++} $4~/CONV/ {b[$1","$2","$3]++} END {for (i in a) print i","a[i]+0","b[i]+0}'
a,b,c,4,2
c,d,e,5,0
b,c,d,3,1

Royi · August 8, 2013, 4:20am

OMG - You are a genius!!!
Thanks so much!

Jotne · August 8, 2013, 4:22am

awk -F, '
$4~/CLICK/ {a[$1","$2","$3]++}
$4~/CONV/ {b[$1","$2","$3]++}
END {for (i in a) print i","a[i]+0","b[i]+0}'

Here we use arrays to count the number of hits:
one array for CLICK and one for CONV.
Using $1","$2","$3 as the index gives array elements like a["a,b,c"].
This creates one array element for every different combination of $1,$2,$3.
The ++ then adds up how many times each combination is found:
a["a,b,c"]++ is the same as a["a,b,c"]=a["a,b,c"]+1

END {for (i in a) print i","a[i]+0","b[i]+0}'
This loop runs once for every unique combination of $1,$2,$3 that had at least one CLICK,
in this case 3 times. Each time it prints the key followed by the two counts stored for it.

For more, do a search on awk arrays.
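
One thing to be aware of: the loop only walks array a, so a key that has CONV lines but no CLICK line would not be printed at all. Below is a minimal sketch of one way around that, not the only one. It assumes a third array (called seen here, a name I made up) that records every key, and it pipes through sort because the order of for (k in ...) is not guaranteed. The four input lines are made-up sample data; the x,y,z line is there only to show a CONV-only key.

```shell
# Sketch: count CLICK and CONV per key, including keys that only have CONV.
# The array name "seen" and the sample lines are illustrative assumptions.
result=$(awk -F, '
{ key = $1 "," $2 "," $3; seen[key] = 1 }   # remember every key, whatever $4 is
$4 == "CLICK" { click[key]++ }              # count CLICK lines per key
$4 == "CONV"  { conv[key]++ }               # count CONV lines per key
END {
    # Loop over all keys; +0 turns a missing count into 0.
    for (k in seen) print k "," (click[k] + 0) "," (conv[k] + 0)
}' <<'EOF' | sort
a,b,c,CLICK
a,b,c,CONV
c,d,e,CLICK
x,y,z,CONV
EOF
)
printf '%s\n' "$result"
```

With this sample it should print a,b,c,1,1 then c,d,e,1,0 then x,y,z,0,1. Using == instead of ~ matches the 4th field exactly, which is a bit stricter than the regex match in the one-liner above.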

Royi · August 8, 2013, 4:23am

BTW - why do we need the "+0"?
Is this to protect keys that have only CLICK but don't have CONV?

Jotne · August 8, 2013, 4:25am

Yes. If it does not find any hit, that array element was never set, so it prints as an empty field.
To prevent this, add zero: that forces the value to be a number, so it prints 0 when nothing is found.
Removing the +0 gives:

a,b,c,4,2
c,d,e,5,
b,c,d,3,1

vs

a,b,c,4,2
c,d,e,5,0
b,c,d,3,1
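
To see the effect on its own, here is a small sketch with an array element that was never set. The array name n and the index "missing" are just placeholders, and the brackets are only there to make the empty value visible.

```shell
# Sketch: an unset awk array element is empty as a string and 0 as a number.
out=$(awk 'BEGIN {
    print "[" n["missing"] "]"          # used as a string: empty
    print "[" (n["missing"] + 0) "]"    # used as a number: 0
}')
printf '%s\n' "$out"
```

This prints [] on the first line and [0] on the second.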