batcho
March 20, 2012, 9:12am
I need to count the number of instances of each unique value in the second column of a .csv file. I'd like the output to be value,count, sorted by most instances. Thanks for any guidance!
Data example:
317476,317756,0
816063,318861,0
313123,319091,0
317020,318466,0
516779,166445,1
319020,318466,0
316729,316483,0
631299,317306,0
313933,317306,0
168032,318032,0
160183,317722,0
917710,312593,0
316559,317756,0
319424,317756,0
Try:
awk -F, '{A[$2]++}END{for(i in A)print i,A[i]}' OFS=, infile | sort -t, -k2,2rn
317756,3
317306,2
318466,2
166445,1
312593,1
316483,1
317722,1
318032,1
318861,1
319091,1
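If awk is unfamiliar, the same counting can be done with a plain cut/sort/uniq pipeline. A sketch, assuming the data is in a file named infile (the final awk only reorders uniq's "count value" output into value,count form):

```shell
# Extract column 2, group identical values (uniq -c only counts
# adjacent duplicates, hence the first sort), count each group,
# sort by count descending, then reformat "count value" as "value,count".
cut -d, -f2 infile | sort | uniq -c | sort -rn | awk '{print $2 "," $1}'
```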
batcho
March 20, 2012, 9:53am
Where do I specify the input and output filenames?
I'd be tempted to use a Perl hash keyed on the values and sorted by count, e.g.:
perl -e 'while(<>){@record=split/,/,$_;$count{$record[1]}++;}for $key (reverse sort {$count{$a}<=>$count{$b}} keys %count){print "$key => $count{$key}\n"}' tmp.dat
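On the filename question: both one-liners read the file named as the last argument and write to standard output, so redirect with > to save the result. A sketch, assuming the input is infile and you want the result in counts.csv (adjust both names to your files):

```shell
# Input filename is the last argument to awk; output goes to stdout,
# which ">" redirects into counts.csv.
awk -F, '{A[$2]++} END{for(i in A) print i "," A[i]}' infile | sort -t, -k2,2rn > counts.csv
```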
batcho
March 20, 2012, 10:03am
Looks great so far. I'm just going to verify the data against something else. Thanks for the quick reply!