frequency count using shell

xshang · October 9, 2012, 4:51pm

Hello everyone,
please consider the following lines of a matrix

[574,]   59   32 
[575,]   59   32 
[576,]   59   32 
[577,]   59   32 
[578,]   59   32 
[579,]   59   32 
[580,]   59   32 
[581,]   60   32 
[582,]   60   33 
[583,]   60   33 
[584,]   60   33 
[585,]   60   33 
[586,]   60   33 
[587,]   60   33 
[588,]   60   33 
[589,]   60   33 
[590,]   60   33 
[591,]   61   33 
[592,]   61   33 
[593,]   61   33 
[594,]   61   33 
[595,]   61   33 
[596,]   61   33 
[597,]   61   33 
[598,]   61   33 
[599,]   61   33 
[600,]   61   33 
[601,]   62   34

Is it possible to compute the percentage frequency of each distinct value in $2?

Just like this:

Corona688 · October 9, 2012, 5:01pm

awk '{ A[$2]++ } END { for(X in A) printf("%s\t%s\n", X, (A[X]*100)/NR) }' inputfile
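For readers new to awk, the one-liner above can be spelled out roughly as follows. This is only a sketch: the four sample rows and the names `sample`, `count`, `value` and `result` are made up for illustration, and the `sort -n` is added only to get a stable order, since awk does not define the order in which `for (... in ...)` visits array keys.

```shell
# Hypothetical sample in the same layout as the question:
# a row label in field 1, then two numeric columns.
sample='[1,] 59 32
[2,] 59 32
[3,] 60 32
[4,] 60 33'

result=$(printf '%s\n' "$sample" | awk '
    { count[$2]++ }                  # tally each distinct value of field 2
    END {
        for (value in count)         # key order is unspecified in awk
            printf("%s\t%s\n", value, (count[value] * 100) / NR)
    }' | sort -n)

printf '%s\n' "$result"
```

With this sample, 59 and 60 each account for 2 of the 4 rows, so both lines report 50.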

fastlane3000 · October 9, 2012, 5:25pm

In case you need to add a "%" sign and use a floating-point format:

awk '{ A[$2]++ } END { for(X in A) printf("%d\t%.2f%\n", X, (A[X]*100)/NR) }' inputfile

rdrtx1 · October 9, 2012, 5:30pm

Same solution, added format:

awk '{ A[$2]++ } END { for(X in A) printf("%s\t%5.2f%%\n", X, (A[X]*100)/NR) }' inputfile | sort -n

xshang · October 9, 2012, 5:37pm

It printed the messages below. How should I adjust the code? I'm not familiar with printf. Thank you!

awk: weird printf conversion %

 input record number 61124, file HB143-0W-A4.txt
 source line number 1
awk: not enough args in printf(%d	%.2f%
)
 input record number 61124, file HB143-0W-A4.txt
 source line number 1

rdrtx1 · October 9, 2012, 5:44pm

awk '{ A[$2]++ } END { for(X in A) printf("%d\t%.2f%%\n", X, (A[X]*100)/NR) }' inputfile
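The difference from the failing command is the doubled percent sign: inside a printf format, `%` starts a conversion, so a literal percent sign has to be written as `%%`. A small check, assuming four made-up input rows (the labels `a` to `d` and the variable `result` are only for illustration):

```shell
# Three of the four hypothetical rows have 59 in field 2, one has 60.
result=$(printf '%s\n' 'a 59' 'b 59' 'c 59' 'd 60' |
    awk '{ A[$2]++ } END { for (X in A) printf("%s\t%.2f%%\n", X, (A[X]*100)/NR) }' |
    sort -n)

printf '%s\n' "$result"
```

This should print 59 with 75.00% and 60 with 25.00%, with no complaint about the format string.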

fastlane3000 · October 9, 2012, 5:45pm

Maybe you're not using the same awk as I am (I have the GNU version, gawk).
Just try rdrtx1's solution; it's more complete, with a "sort":

awk '{ A[$2]++ } END { for(X in A) printf("%s\t%3.2f%%\n", X, (A[X]*100)/NR) }' inputfile | sort -n

I prefer a width of 3; it's enough, because we can't have more than 100%.
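One caveat on the width: in a conversion such as `%3.2f`, the number before the dot is the minimum width of the whole field, counting the decimal point and the decimals, not just the integer part. With two decimals the shortest possible output is already four characters, so a width of 3 never pads anything. A quick sketch (the variable names are only for illustration):

```shell
# Brackets make any padding visible.
tiny=$(awk 'BEGIN { printf("[%3.2f]", 5) }')     # 4 characters needed, width 3 is ignored
full=$(awk 'BEGIN { printf("[%3.2f]", 100) }')   # 6 characters needed, still no truncation
wide=$(awk 'BEGIN { printf("[%6.2f]", 5) }')     # padded on the left to 6 characters

printf '%s\n%s\n%s\n' "$tiny" "$full" "$wide"
```

To line up values from 0.00 to 100.00 in a column, a width of 6 is what does the job.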

alister · October 9, 2012, 5:58pm

Nit: %s makes more sense than %d since $2 (aka X) is treated as a string in every expression in which it appears.

Regards,
Alister
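To illustrate the nit: array subscripts in awk are strings, and `%d` forces a numeric conversion that can silently alter a key, while `%s` prints it unchanged. For the all-integer data in the question both give the same output, so the keys below ("59.5" and "n/a") are hypothetical values chosen only to show the difference:

```shell
# %s prints the key as it is; %d converts it to an integer first.
frac_s=$(awk 'BEGIN { printf("%s", "59.5") }')
frac_d=$(awk 'BEGIN { printf("%d", "59.5") }')   # fractional part is dropped
text_s=$(awk 'BEGIN { printf("%s", "n/a") }')
text_d=$(awk 'BEGIN { printf("%d", "n/a") }')    # non-numeric string converts to 0

printf '%s %s %s %s\n' "$frac_s" "$frac_d" "$text_s" "$text_d"
```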