batcho
March 20, 2012, 9:12am
I need to count the number of instances of each unique value in the second column of a .csv file. I'd like the output to be value,count, sorted by most instances. Thanks for any guidance!
Data example:
317476,317756,0
816063,318861,0
313123,319091,0
317020,318466,0
516779,166445,1
319020,318466,0
316729,316483,0
631299,317306,0
313933,317306,0
168032,318032,0
160183,317722,0
917710,312593,0
316559,317756,0
319424,317756,0
Try:
awk -F, '{A[$2]++}END{for(i in A)print i,A[i]}' OFS=, infile | sort -t, -k2,2rn
317756,3
317306,2
318466,2
166445,1
312593,1
316483,1
317722,1
318032,1
318861,1
319091,1
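If awk is unfamiliar, the same counting can be done with a plain cut/sort/uniq pipeline. A sketch, assuming the data is in a file named infile (the final awk only reorders uniq's "count value" output into value,count form):

```shell
# Extract column 2, group identical values (uniq -c only counts
# adjacent duplicates, hence the first sort), count each group,
# sort by count descending, then reformat "count value" as "value,count".
cut -d, -f2 infile | sort | uniq -c | sort -rn | awk '{print $2 "," $1}'
```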
batcho
March 20, 2012, 9:53am
Where do I specify the input and output filenames?
I'd be tempted to use a Perl hash keyed on the values and sorted by count, e.g.:
perl -e 'while(<>){@record=split/,/,$_;$count{$record[1]}++;}for $key (reverse sort {$count{$a}<=>$count{$b}} keys %count){print "$key => $count{$key}\n"}' tmp.dat
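On the filename question: both one-liners read the file named as the last argument and write to standard output, so redirect with > to save the result. A sketch, assuming the input is infile and you want the result in counts.csv (adjust both names to your files):

```shell
# Input filename is the last argument to awk; output goes to stdout,
# which ">" redirects into counts.csv.
awk -F, '{A[$2]++} END{for(i in A) print i "," A[i]}' infile | sort -t, -k2,2rn > counts.csv
```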
batcho
March 20, 2012, 10:03am
Looks great so far. I'm just going to verify the data against something else. Thanks for the quick reply!