Count words that occur many times in single lines and paragraphs

heman96 · October 13, 2013, 3:02am

Can I get a simple script to count how many times each word occurs, both within single lines and across paragraphs?

E.g. the file would be:

"

Thanks heman thanks thanks
Thanks heman
thanks man

"

So the result should be:

Thanks = 5
heman = 2
man = 1

Thanks in advance.

---------- Post updated at 02:01 AM ---------- Previous update was at 01:48 AM ----------

Never mind, the following worked:

/root # tr ' ' '\n' < test | tr 'A-Z' 'a-z' | sed 's/[^a-zA-Z]//g' | sort | uniq -c | sort -nr
    334
     76 a
     69 volume
     48 the
     39 lvm
     38 disk
     36 vxvm
     36 group
     29 to
     26 logical
     19 volname
     19 devvolgrp
     17 volumes
     16 in
     15 or
     14 diskgroup
     13 remove
     13 devvolgrplvolname
...
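For reference, here is the same pipeline one stage per line, wrapped in a hypothetical shell function `word_freq`. The unlabeled first count in the output above (334) is the number of empty tokens: blank lines, runs of spaces and tokens made only of punctuation all become empty lines after the `sed` step, and `uniq -c` counts them too. This sketch squeezes whitespace with `tr -s` and drops empty lines with `grep -v '^$'`; it assumes plain ASCII text.

```shell
#!/bin/sh
# word_freq (hypothetical name): the pipeline from the post, one stage per line.
# Reads text on stdin and prints "count word", most frequent first.
word_freq() {
    tr -s ' \t' '\n\n' |     # one token per line; -s squeezes repeated newlines
    tr 'A-Z' 'a-z' |         # fold case so "Thanks" and "thanks" match
    sed 's/[^a-z]//g' |      # strip non-letters (quoted so the shell cannot glob it)
    grep -v '^$' |           # drop tokens that ended up empty
    sort | uniq -c | sort -nr
}

printf 'Thanks heman thanks thanks\nThanks heman\nthanks man\n' | word_freq
```

On a file, run it as `word_freq < test`.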

---------- Post updated at 02:02 AM ---------- Previous update was at 02:01 AM ----------

I will use code tags!

tx

Scrutinizer · October 13, 2013, 3:15am

It would start out simple, something like:

awk '{for(i=1; i<=NF; i++) A[tolower($i)]++} END{for(i in A) print i " = " A[i]}' file
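A sketch of that one-liner applied to the sample text from the question, wrapped in a hypothetical function `count_words`. Because `for (w in A)` visits the keys in no particular order, the output is piped through `sort` on the count column. Note that with `tolower()` the key is printed as `thanks`, not `Thanks`.

```shell
#!/bin/sh
# count_words (hypothetical name): the one-liner with A[w] spelled out,
# printing "word = count". Words are whatever awk splits on blanks, lowercased.
count_words() {
    awk '{ for (i = 1; i <= NF; i++) A[tolower($i)]++ }
         END { for (w in A) print w " = " A[w] }' "$@" |
    sort -k3,3nr   # "for (w in A)" has no defined order, so sort by the count column
}

printf 'Thanks heman thanks thanks\nThanks heman\nthanks man\n' | count_words
```

Run it on a file with `count_words file`; with no argument it reads standard input.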

But then it would get more complicated. If you only use the lowercase form, what do you do with a word like "I", with names that start with an uppercase letter, or with something like "Mr. d'Arcy"? You'd want to exclude dots and commas that cling to words. Is the "-" part of a word or not? Is a number a word, and what about a combination of letters and digits? Etc. You could make an approximation, for example:

awk -F'[^-_[:alnum:]]+' '{for(i=1; i<=NF; i++) if ($i!="") A[tolower($i)]++} END{for(i in A) print i " = " A[i]}' file

But catering for all corner cases is likely to get fairly complex.
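As one further approximation (a sketch; the rules are assumptions, not the only sensible choice), the hypothetical function `count_words_trimmed` splits on whitespace, lowercases, trims punctuation from both ends of each word, and keeps apostrophes and hyphens inside words. It still lumps "I" together with "i", treats "Mr" and "Mr." as the same word, and only recognises ASCII letters and digits.

```shell
#!/bin/sh
# count_words_trimmed (hypothetical name): "word = count", most frequent first.
# Assumed rules: words are whitespace-separated; punctuation clinging to either
# end is trimmed ("Mr." -> "mr"); apostrophes and hyphens inside a word are
# kept ("d'Arcy" -> "d'arcy"); only ASCII letters and digits count as word
# characters, and tokens with none left are skipped.
count_words_trimmed() {
    awk '{
        for (i = 1; i <= NF; i++) {
            w = tolower($i)
            sub(/^[^a-z0-9]+/, "", w)   # trim leading punctuation
            sub(/[^a-z0-9]+$/, "", w)   # trim trailing punctuation
            if (w != "") A[w]++
        }
    }
    END { for (w in A) print w " = " A[w] }' "$@" |
    sort -k3,3nr -k1,1   # highest count first, ties alphabetically
}

printf 'Mr. d'\''Arcy met Mr Bingley.\nA well-known fact, "well-known" indeed.\n' | count_words_trimmed
```

Here "Mr." and "Mr" both count as `mr`, and `"well-known"` with its quotes counts together with the bare `well-known`.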