printing only wanted rows in awk

carl_r · December 12, 2007, 4:12pm

Hi!
The following awk script counts the words in an input file, then sorts them in decreasing order of occurrence and also in alphabetical order. It then prints every word together with its number of occurrences. For example:

and 7
for 4
make 4
you 4
awk 1
....

The problem is that if the text file contains thousands of words, the output is also very long. I'm only interested in the 10 most frequent words, which means I'd like to print only the first 10 rows. I have tried changing the printf command to print only the first 10 sorted rows, but I have had no success :( Is it even possible to achieve this by changing only the printf command? Should I try something else?

script:

{
    $0 = tolower($0)
    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
    for (i = 1; i <= NF; i++)
        freq[$i]++
}

END {
    sort = "sort -k 2nr"
    for (word in freq)
        printf "%s\t%d\n", word, freq[word] | sort
    close(sort)
}

Thanks in advance!

porter · December 12, 2007, 4:20pm

Have you considered "head"?

man head
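
For illustration, a minimal sketch of what head does with already-sorted word counts. The sample rows are made up to mirror the example output in the question, and the variable name top3 is only for the demo; `head -n 3` keeps the first three lines of whatever it reads and drops the rest.

```shell
# Hypothetical sample data: word<TAB>count, already sorted by count.
# head -n 3 passes through only the first three rows.
top3=$(printf 'and\t7\nfor\t4\nmake\t4\nyou\t4\nawk\t1\n' | head -n 3)

# Show what survived the cut.
printf '%s\n' "$top3"
```

This should print the "and", "for" and "make" rows only. Swapping 3 for 10 gives the first ten rows asked for above.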

vgersh99 · December 12, 2007, 4:22pm

sort = "sort -k 2nr | head -10"

But why are you sorting inside awk?
Wouldn't it be better to post-process the manipulated data AFTER?

nawk -f myAWKscriptWithOUTsorting.awk myDataFile | sort -k 2nr | head -10
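
A minimal sketch of that post-processing approach, assuming a POSIX shell with awk, sort and head available. The function name top_words and the sample text are made up for illustration. The second sort key (`-k1,1`) is an added assumption that breaks ties alphabetically, and `LC_ALL=C` is there only to keep the ordering predictable across locales.

```shell
# top_words N: read text on stdin, print the N most frequent words.
# (top_words is a hypothetical helper name, not part of the original script.)
top_words() {
    awk '
    {
        # Same counting logic as the script in the question.
        $0 = tolower($0)
        gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
        for (i = 1; i <= NF; i++)
            freq[$i]++
    }
    END {
        # No sorting here: just emit word<TAB>count in arbitrary order.
        for (word in freq)
            printf "%s\t%d\n", word, freq[word]
    }' |
    # Sort by count (descending), then by word, and keep the first N rows.
    LC_ALL=C sort -k2,2nr -k1,1 | head -n "$1"
}

# Made-up sample input; prints the three most frequent words.
printf 'The cat and the dog.\nThe dog and the bird!\n' | top_words 3
```

With this layout the awk script only counts, and the row limit is a plain argument to head, so changing 3 to 10 needs no edit to the awk code.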

vgersh99 · December 12, 2007, 4:23pm

nicely put, porter!
[sorry, I could not resist!]