Column value and its count

ysrini · February 24, 2012, 7:03pm

Hi, I have data like
a
b
a
a
b
c
d
...

I have to output each distinct value that appears in the column along with its count:
a-3,b-2,c-1,d-1

Is there a single line command using awk or sed to accomplish this?

Thanks,
-srinivas yelamanchili

agama · February 24, 2012, 10:57pm

You can try this:

awk '{c[$1]++;} END { s=""; for( x in c ){ printf( "%s%s=%d", s, x, c[x]); s=","; } printf( "\n" );  }'  input-file

ysrini · February 24, 2012, 11:03pm

agama,
that works Fantastic !!!
Thanks a lot !

jayan_jay · February 25, 2012, 6:12am

sort < infile | uniq -c

codemaniac · February 25, 2012, 6:22am

Associative arrays are just life savers for needs like these:

awk '{arr[$1]++}END{for (i in arr) {print i "-" arr[i]}}' uniq.dat
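
One caveat with the for (i in arr) loop: awk does not guarantee any particular traversal order, so the pairs can come out in any order. Below is a minimal sketch, assuming a sorted, comma-separated result is wanted. The function name count_values is a hypothetical helper for illustration, not something from this thread.

```shell
# count_values (hypothetical name): count each distinct first-column value,
# then print the value-count pairs sorted and joined with commas.
# Reads the files given as arguments, or standard input if none are given.
count_values() {
    awk '{ arr[$1]++ } END { for (i in arr) print i "-" arr[i] }' "$@" |
        sort |            # for-in order is unspecified, so sort the pairs
        paste -s -d, -    # join all lines into one, separated by commas
}

# Example using the sample data from the first post
printf '%s\n' a b a a b c d | count_values
# prints: a-3,b-2,c-1,d-1
```

Here sort only has to order the distinct values rather than every input line, which can matter on large files with few distinct values.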

sdebasis · February 25, 2012, 7:02am

@Jayan_Jay
So simple. Great, thanks a ton.

ysrini · February 25, 2012, 7:50am

Thanks Jayan,
I used your simple, straightforward code as below:
$ cat a.txt
a
a
b
c
a
b
d
a
b
$

echo `cat a.txt | sort | uniq -c | awk '{print $2"-"$1}' | tr '\n' ','` | sed 's/[,]*$//g'

a-4,b-3,c-1,d-1
$

agama · February 25, 2012, 10:48am

I like jayan_jay's idea; I would have written a small sort in awk -- nice approach.

However, the echo and cat are not necessary. Sort can read directly from a file, so there's no need to pipe it in. If you bundle the output into a command line parameter you run the risk of exceeding the maximum number of arguments and experiencing an unnecessary error. Better to write your command this way:

sort a.txt | uniq -c | awk '{print $2"-"$1}' | tr '\n' ',' | sed 's/[,]*$//g'

Further, you can eliminate two processes if you do all the work in the first awk:

sort a.txt | uniq -c | awk '{printf( "%s%s-%s", NR > 1 ? "," : "", $2, $1 ); } END {printf( "\n" );}'