awk to count occurrences of each unique value

I'm looking for an awk script that, for each unique value in column 5, will print and count the unique values in column 6.

CA001011500  11111 11111 -9999 201301 AAA
CA001012040  11111 11111 -9999 201301 AAA
CA001012573  11111 11111 -9999 201301 BBB
CA001012710  11111 11111 -9999 201301 BBB
CA001015105  11111 11111 -9999 201301 BBB
CA001015638  11111 11111 -9999 201301 CCC
CA001011500  11111 11111  -9999 201302 AAA
CA001012040  11111 11111  -9999 201302 BBB
CA001012573  11111 11111 -9999 201302 BBB
CA001012710  11111  11111 -9999 201302 CCC
CA001015105  11111 11111  -9999 201302 CCC
CA001015638  11111 11111  -9999 201302 CCC

Output will need to be:

201301 AAA 2
201301 BBB 3
201301 CCC 1
201302 AAA 1
201302 BBB 2
201302 CCC 3

I know this will print the unique values of column 5, but how do I tie in the count of unique values in column 6?

 awk '{ a[$5]++ } END { for (b in a) { print b } }'

Thanks in advance!

Use fields 5 and 6 together as the array index:

awk '{A[$5 OFS $6]++}END{for(k in A) print k, A[k]}' file

To get sorted output:

awk '{A[$5 OFS $6]++}END{for(k in A) print k, A[k]}' file | sort 
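For reference, here is that pipeline run end-to-end against the sample data from the question (the `/tmp/data.txt` path is just an illustration):

```shell
# Recreate the sample input from the question.
cat > /tmp/data.txt <<'EOF'
CA001011500  11111 11111 -9999 201301 AAA
CA001012040  11111 11111 -9999 201301 AAA
CA001012573  11111 11111 -9999 201301 BBB
CA001012710  11111 11111 -9999 201301 BBB
CA001015105  11111 11111 -9999 201301 BBB
CA001015638  11111 11111 -9999 201301 CCC
CA001011500  11111 11111  -9999 201302 AAA
CA001012040  11111 11111  -9999 201302 BBB
CA001012573  11111 11111 -9999 201302 BBB
CA001012710  11111  11111 -9999 201302 CCC
CA001015105  11111 11111  -9999 201302 CCC
CA001015638  11111 11111  -9999 201302 CCC
EOF

# Count each unique (column 5, column 6) pair; sort for stable output.
awk '{A[$5 OFS $6]++} END {for (k in A) print k, A[k]}' /tmp/data.txt | sort
# 201301 AAA 2
# 201301 BBB 3
# 201301 CCC 1
# 201302 AAA 1
# 201302 BBB 2
# 201302 CCC 3
```

The `sort` is needed because awk's `for (k in A)` iterates over array keys in an unspecified order.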

This works as expected. Thanks!

Try also:

awk '$0=$5 FS $6' infile | uniq -c
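Note that `uniq -c` puts the count first, so the output format differs from what was asked for. A sketch of reordering it with a second awk, assuming the groups are adjacent as in the sample data (the `/tmp` path is just an illustration):

```shell
# Sample input; column 5/6 groups are already on adjacent lines,
# which uniq -c relies on.
cat > /tmp/data.txt <<'EOF'
CA001011500  11111 11111 -9999 201301 AAA
CA001012040  11111 11111 -9999 201301 AAA
CA001012573  11111 11111 -9999 201301 BBB
CA001012710  11111 11111 -9999 201301 BBB
CA001015105  11111 11111 -9999 201301 BBB
CA001015638  11111 11111 -9999 201301 CCC
CA001011500  11111 11111  -9999 201302 AAA
CA001012040  11111 11111  -9999 201302 BBB
CA001012573  11111 11111 -9999 201302 BBB
CA001012710  11111  11111 -9999 201302 CCC
CA001015105  11111 11111  -9999 201302 CCC
CA001015638  11111 11111  -9999 201302 CCC
EOF

# Reduce each line to "col5 col6", count adjacent duplicates,
# then move the count from the front to the back of each line.
awk '$0 = $5 FS $6' /tmp/data.txt | uniq -c | awk '{print $2, $3, $1}'
# 201301 AAA 2
# 201301 BBB 3
# 201301 CCC 1
# 201302 AAA 1
# 201302 BBB 2
# 201302 CCC 3
```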

As long as the unique values you want to count are on adjacent lines in your input file, you can do it all in awk without needing sort or uniq:

awk '
last != $5 OFS $6 {
        if(c) print last, c
        c = 1
        last = $5 OFS $6
        next
}
{       c++}
END {   print last, c}' file

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
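A quick way to check the one-pass script above against the sample data from the question (the `/tmp` path is just an illustration):

```shell
# Sample input; groups are adjacent, which the one-pass approach relies on.
cat > /tmp/data.txt <<'EOF'
CA001011500  11111 11111 -9999 201301 AAA
CA001012040  11111 11111 -9999 201301 AAA
CA001012573  11111 11111 -9999 201301 BBB
CA001012710  11111 11111 -9999 201301 BBB
CA001015105  11111 11111 -9999 201301 BBB
CA001015638  11111 11111 -9999 201301 CCC
CA001011500  11111 11111  -9999 201302 AAA
CA001012040  11111 11111  -9999 201302 BBB
CA001012573  11111 11111 -9999 201302 BBB
CA001012710  11111  11111 -9999 201302 CCC
CA001015105  11111 11111  -9999 201302 CCC
CA001015638  11111 11111  -9999 201302 CCC
EOF

# Print each group's key and running count whenever the key changes,
# and flush the final group in END.
awk '
last != $5 OFS $6 {
        if (c) print last, c
        c = 1
        last = $5 OFS $6
        next
}
{       c++ }
END {   print last, c }' /tmp/data.txt
# 201301 AAA 2
# 201301 BBB 3
# 201301 CCC 1
# 201302 AAA 1
# 201302 BBB 2
# 201302 CCC 3
```

Because it only remembers the current group, this version uses constant memory and preserves the input's group order without a sort.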