frequency count using shell

Hello everyone,
please consider the following lines of a matrix:

[574,]   59   32 
[575,]   59   32 
[576,]   59   32 
[577,]   59   32 
[578,]   59   32 
[579,]   59   32 
[580,]   59   32 
[581,]   60   32 
[582,]   60   33 
[583,]   60   33 
[584,]   60   33 
[585,]   60   33 
[586,]   60   33 
[587,]   60   33 
[588,]   60   33 
[589,]   60   33 
[590,]   60   33 
[591,]   61   33 
[592,]   61   33 
[593,]   61   33 
[594,]   61   33 
[595,]   61   33 
[596,]   61   33 
[597,]   61   33 
[598,]   61   33 
[599,]   61   33 
[600,]   61   33 
[601,]   62   34 

Is it possible to count the percent frequency of each distinct value in $2?

Just like this:

59  25.00%
60  35.70%
61  35.70%
62  3.57%

awk '{ A[$2]++ } END { for(X in A) printf("%s\t%s\n", X, (A[X]*100)/NR) }' inputfile
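
For anyone unfamiliar with awk arrays, here is the same one-liner spread over several lines with comments. It is only a sketch: the four sample rows are made up to match the layout of the matrix above, and the output goes through sort -n because the order of for (X in A) is unspecified.

```shell
# Made-up sample rows in the same layout as the matrix in the question.
sample='[574,]   59   32
[575,]   59   32
[576,]   60   33
[577,]   61   33'

out=$(printf '%s\n' "$sample" | awk '
    { A[$2]++ }                      # count how often each value of field 2 occurs
    END {
        for (X in A)                 # X is a distinct value, A[X] is its count
            printf("%s\t%s\n", X, (A[X] * 100) / NR)   # NR = total number of lines
    }' | sort -n)

printf '%s\n' "$out"
```

With these four rows, 59 accounts for 50 percent and 60 and 61 for 25 percent each.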

In case you need to add "%" and print the value as a float:

awk '{ A[$2]++ } END { for(X in A) printf("%d\t%.2f%\n", X, (A[X]*100)/NR) }' inputfile

Same solution, with added formatting and sorted output:

awk '{ A[$2]++ } END { for(X in A) printf("%s\t%5.2f%%\n", X, (A[X]*100)/NR) }' inputfile | sort -n

It showed a message like this. How should I adjust the code? I'm not familiar with printf. Thank you!

awk: weird printf conversion %

 input record number 61124, file HB143-0W-A4.txt
 source line number 1
awk: not enough args in printf(%d	%.2f%
)
 input record number 61124, file HB143-0W-A4.txt
 source line number 1

awk '{ A[$2]++ } END { for(X in A) printf("%d\t%.2f%%\n", X, (A[X]*100)/NR) }' inputfile
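
The reason for the doubled %%: in a printf format string a single % starts a conversion such as %d or %.2f, so a literal percent sign has to be written as %%. A minimal check, using a hard-coded number instead of real input:

```shell
# %.2f formats the number; %% prints one literal percent sign.
pct=$(awk 'BEGIN { printf("%.2f%%\n", 35.714285) }')
printf '%s\n' "$pct"    # 35.71%
```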

Maybe you're not using the same awk as I am (I have the GNU version, called gawk).
Just try rdrtx1's solution; it's more complete, with a "sort":

awk '{ A[$2]++ } END { for(X in A) printf("%s\t%3.2f%%\n", X, (A[X]*100)/NR) }' inputfile | sort -n

I'd rather use a width of 3; it's enough, because we can't have more than 100%.
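
A small caveat on the width, as far as I know: the number before the dot in a conversion like %3.2f is the minimum width of the whole field, decimal point and decimals included, so 3 never pads anything here. To line values up all the way to 100.00 you would need 6 (the 5 used earlier in the thread covers up to 99.99). A quick sketch with hard-coded numbers:

```shell
# Brackets make any padding visible.
widths=$(awk 'BEGIN {
    printf("[%3.2f] [%6.2f]\n", 3.57, 3.57)
    printf("[%3.2f] [%6.2f]\n", 100, 100)
}')
printf '%s\n' "$widths"
# [3.57] [  3.57]
# [100.00] [100.00]
```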


Nit: %s makes more sense than %d since $2 (aka X) is treated as a string in every expression in which it appears.

Regards,
Alister
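
To illustrate the nit with a made-up value: array subscripts in awk are strings, and %d converts its argument to an integer, so a non-integer key would be silently truncated, while %s prints it unchanged.

```shell
# 59.5 is a hypothetical input value, not taken from the matrix above.
keys=$(echo '[1,] 59.5 32' | awk '{ A[$2]++ } END { for (X in A) printf("%d vs %s\n", X, X) }')
printf '%s\n' "$keys"    # 59 vs 59.5
```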
