awk help

tulf210 · May 20, 2013, 10:04am

Hi guys,

I have yet another awk question. I have a list of items:

Lemon Juice
Orange
Lemon Juice
Apple
Lemon

I need to count, with an awk script, the number of times a whole 'line' (think Lemon Juice instead of only Lemon) repeats in the list of items. I got as far as counting the number of times each word repeats with the following (thanks to Google):

{for (i=1;i<=NF;i++)
    count[$i]++
}
END {
    for (i in count)
        print count[i], i
}

But that will spit out every occurrence of the individual words in the list:

3 Lemon
2 Juice
1 Orange
1 Apple

Whereas what I would need is:

1 Lemon
2 Lemon Juice
1 Orange
1 Apple

Could anybody pinpoint where and how I could change the script to count lines of words instead of individual words? Thank you!

Scott · May 20, 2013, 10:08am

Assuming your input looks exactly like that:

$ awk 'END {for (a in A) print A[a], a} {A[$0]++}' file1
1 Orange
1 Lemon
2 Lemon Juice
1 Apple
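If the real file is not quite as clean as the sample (trailing spaces, blank lines), identical-looking items would be counted separately, since the whole line is the array key. A minimal sketch that guards against that, assuming plain space/tab padding; the function name count_trimmed and the array name seen are made up for illustration, and the sort is only there because awk does not promise any particular order for "for (k in seen)":

```shell
# Hypothetical helper: count identical lines, ignoring padding and blank lines.
count_trimmed() {
    awk '
    {
        line = $0
        gsub(/^[ \t]+|[ \t]+$/, "", line)   # strip leading/trailing blanks
        if (line != "")                     # skip lines that are empty after trimming
            seen[line]++
    }
    END {
        for (k in seen)
            print seen[k], k
    }
    ' "$@" | LC_ALL=C sort                  # sort only to get a stable order
}
```

Call it as count_trimmed file1, or pipe the list into it.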

tulf210 · May 20, 2013, 10:38am

That was a quick response! I tried it and it worked like a charm! I really need to study awk; it's such a powerful tool, yet somehow it always confuses me.

Again, thank you a million! Cheers!

MadeInGermany · May 20, 2013, 2:09pm

In your original script, the block

{for (i=1;i<=NF;i++)
    count[$i]++
}

loops through each field. Replace it with

{
   count[$0]++
}

and it runs once on the whole line.
The { } section runs for each line, while the END { } section runs once at the end.
Scott just put the latter first; the order of the two sections in the script makes no difference.
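
Putting both pieces together, the modified script could look roughly like the sketch below. The shell wrapper count_lines and the variable name line are only illustrative, and the sort at the end is optional: awk does not guarantee the order in which "for (line in count)" visits the entries, so without it the lines may come out in any order.

```shell
# Hypothetical wrapper around the modified script.
count_lines() {
    awk '
    {
        count[$0]++              # runs for every input line; $0 is the whole line
    }
    END {
        for (line in count)      # runs once, after the last line was read
            print count[line], line
    }
    ' "$@" | LC_ALL=C sort       # optional: stable output order
}
```

With the list from the first post saved as file1, count_lines file1 should report 2 for Lemon Juice and 1 for each of Apple, Lemon and Orange.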