ysrini
February 24, 2012, 7:03pm
1
Hi, I have data like this:
a
b
a
a
b
c
d
...
I have to output each distinct value that appears in the column, along with its count:
a-3,b-2,c-1,d-1
Is there a single line command using awk or sed to accomplish this?
Thanks,
-srinivas yelamanchili
agama
February 24, 2012, 10:57pm
2
You can try this:
awk '{c[$1]++;} END { s=""; for( x in c ){ printf( "%s%s=%d", s, x, c[x]); s=","; } printf( "\n" ); }' input-file
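One caveat: awk does not guarantee the order in which `for (x in c)` visits the keys, so the values may come out in any order. If the order matters, a minimal sketch is to print one pair per line and let sort and paste do the joining. The function name `count_sorted` is made up for illustration, and the printf just stands in for the real input file.

```shell
# Count each distinct value in column 1, sort the "value-count" lines,
# then join them with commas. With no file argument, awk reads stdin.
count_sorted() {
  awk '{ c[$1]++ } END { for (x in c) print x "-" c[x] }' "$@" | sort | paste -sd, -
}

# Sample data from the question, fed on stdin.
printf 'a\nb\na\na\nb\nc\nd\n' | count_sorted
```

With the sample data from the question this should print a-3,b-2,c-1,d-1.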
ysrini
February 24, 2012, 11:03pm
3
agama,
That works, fantastic!
Thanks a lot!
Associative arrays are real lifesavers for needs like this:
awk '{arr[$1]++} END {for (i in arr) {print i "-" arr[i]}}' uniq.dat
@Jayan_Jay
So simple. Great, thanks a ton.
ysrini
February 25, 2012, 7:50am
7
Thanks Jayan,
I used your simple, straightforward code as below:
$ cat a.txt
a
a
b
c
a
b
d
a
b
$
echo `cat a.txt | sort | uniq -c | awk '{print $2"-"$1}' | tr '\n' ','` | sed 's/[,]*$//g'
a-4,b-3,c-1,d-1
$
agama
February 25, 2012, 10:48am
8
I like jayan_jay's idea; I would have written a small sort in awk -- nice approach.
However, the echo and cat are not necessary. sort can read directly from a file, so there's no need to pipe the file into it. And if you bundle the whole output into a command-line argument for echo, you run the risk of exceeding the maximum command-line length and hitting an unnecessary error. Better to write your command this way:
sort a.txt | uniq -c | awk '{print $2"-"$1}' | tr '\n' ',' | sed 's/[,]*$//g'
Further, you can eliminate two processes if you do all the work in the first awk:
sort a.txt | uniq -c | awk '{printf( "%s%s-%s", NR > 1 ? "," : "", $2, $1 ); } END {printf( "\n" );}'
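For comparison, the "small sort in awk" idea mentioned above could look something like the following. It is only a sketch, assuming plain string ordering of the values is what you want and that a simple insertion sort is fast enough for the number of distinct values. The function name `count_awk_only` is made up for illustration.

```shell
# Count and sort entirely inside awk, with no sort or uniq processes.
count_awk_only() {
  awk '
    { c[$1]++ }                          # tally each distinct value
    END {
      n = 0
      for (k in c) keys[++n] = k         # collect the keys (order unspecified)
      for (i = 2; i <= n; i++) {         # insertion sort, string comparison
        v = keys[i]
        for (j = i - 1; j >= 1 && keys[j] > v; j--)
          keys[j + 1] = keys[j]
        keys[j + 1] = v
      }
      for (i = 1; i <= n; i++)           # comma-separated value-count pairs
        printf("%s%s-%d", (i > 1 ? "," : ""), keys[i], c[keys[i]])
      printf("\n")
    }' "$@"
}

count_awk_only a.txt
```

For a handful of distinct values either approach is fine. The sort | uniq -c pipeline is shorter, while this version runs as a single process.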