Count the number of occurrences of a particular word in a file

rinku · May 25, 2007, 7:25am

I want to count the number of occurrences of a particular word in a text file.

Please tell me whether the "less" command works in ksh. If it does not, which command can I use instead?

blowtorch · May 25, 2007, 7:36am

The 'less' command is used to view a file. Use grep to search for a particular word in a file. You can use this to count the number of occurrences too, just check the man page for the exact switch.

funksen · May 25, 2007, 9:36am

grep -o <string> <file> | wc -l
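With -o, grep prints each match on its own line, so counting the output lines gives the number of matches rather than the number of matching lines. A minimal sketch follows, assuming a grep that supports the -o and -w extensions (GNU and BSD grep do; POSIX does not require either). The function name count_matches is made up for illustration.

```shell
# Hypothetical helper: count whole-word matches of $1 in the text on stdin.
# Assumes grep supports -o (print only the match) and -w (whole words only).
count_matches() {
    # One output line per match; tr strips the padding some wc versions add.
    grep -o -w -- "$1" | wc -l | tr -d ' '
}

printf 'This is line 1 abc and abc\nThis is line 2 abc\n' | count_matches abc
```

For a file, redirect it into the function: count_matches abc < sample.txt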

cfajohnson · May 25, 2007, 9:04pm

tr -cs 'A-Za-z' '\n' < FILE | grep -c "WORD"

pelipeplips · August 8, 2007, 5:20am

I tried using grep with the "-o" option and it gives me this error:
grep: illegal option -- o
What does "-o" option do?

I also need to find the occurrences of a certain string within a file. Currently I'm using:

grep -c 'abc' sample.txt

But the command above only counts the lines that contain a match. How can I get the total count of the 'abc' words, regardless of how many occurrences there are per line?

Example:
This is line 1 abc and abc
This is line 2 abc

Klashxx · August 8, 2007, 6:53am

Use:

awk '{
    for (i = 1; i <= NF; i++)
        if ($i == "abc")
            c++
}
END {
    print c
}' sample.txt

Or:

awk '
BEGIN {
RS=FS
}
{
if ( $0 ~ /abc/ )
   c++
}
END{
print c++
}' sample.txt
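The first awk loop above hard-codes the word and prints an empty line when nothing matches. A minimal sketch of the same field-by-field comparison follows, with the word passed in through -v and the count forced to a number. The function name count_word_awk is made up for illustration, and an awk that supports -v is assumed.

```shell
# Hypothetical helper: count fields exactly equal to $1 in the text on stdin.
# Assumes an awk with -v support (any POSIX awk).
count_word_awk() {
    awk -v w="$1" '
        { for (i = 1; i <= NF; i++) if ($i == w) c++ }   # compare every field
        END { print c + 0 }                              # c + 0 prints 0, not ""
    '
}

printf 'This is line 1 abc and abc\nThis is line 2 abc\n' | count_word_awk abc
```

Because the default field splitting treats spaces, tabs, and newlines alike, words at the end of a line are counted too.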

Shell_Life · August 8, 2007, 11:09am

The tr and grep solution posted earlier does not work.

Here is a sample file:

a aa aaa
aaa aa a
aaa aa a aaa aa a aaa

Here is one test:

tr -cs 'A-Za-z' '\n' < FILE | grep -c "aaa"

It reports a total of 3 words, when the correct answer is 5.

Here is another possible solution for those who want to use shell script:

#!/bin/ksh
typeset -i mCnt=0
mWord='aaa'
for mEach in `cat input_file`
do
  if [ "${mEach}" = "${mWord}" ]; then
    mCnt=${mCnt}+1
  fi
done
echo 'Total words for '${mWord}' = '${mCnt}
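For comparison, the tr pipeline can be restricted to exact matches so that a search for "aa" does not also count "aaa". This is only a sketch: it assumes words are runs of ASCII letters, and the function name count_word_tr is made up for illustration. The -x and -F options used here are both specified by POSIX.

```shell
# Hypothetical helper: count words exactly equal to $1 in the text on stdin.
# Assumes words consist only of ASCII letters.
count_word_tr() {
    # tr puts each word on its own line; grep -x requires the whole line
    # to match, and -F treats the word as a fixed string, not a regex.
    tr -cs 'A-Za-z' '\n' | grep -c -x -F -- "$1"
}

printf 'a aa aaa\naaa aa a\naaa aa a aaa aa a aaa\n' | count_word_tr aaa
```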

matrixmadhan · August 8, 2007, 11:50am

Here, the print statement should have been:

print ++c

matrixmadhan · August 8, 2007, 11:58am

There is one more problem with that.

With the input posted by Shell_Life and the search pattern "aa", it would also match "aaa".

Try this:

awk ' BEGIN {RS=FS} { if ( $0 ~ /^aa$/ ) { c++; i=NR; } } END{if ( i == NR ) { c++ } print c}' filename

kahuna · August 8, 2007, 12:21pm

I'm not sure this is working:

$echo "aa \naa" |awk ' BEGIN {RS=FS} { if ( $0 ~ /^aa$/ ) { c++; i=NR; } } END{if ( i == NR ) { c++ } print c}' 
1

Klashxx · August 8, 2007, 1:07pm

There is a typo there; it should be:

print c

Obviously, you have to delimit your pattern to get the expected result.

Cheers.

kahuna · August 8, 2007, 1:32pm

How would you delimit? I tried

$echo 'abc abc' |awk '
BEGIN {
RS=FS
}
{
if ( $0 ~ /^abc$/ )
   c++
}
END{
print c
}'
1

matrixmadhan · August 8, 2007, 1:37pm

This is absolutely working as expected.

Input is
aa<space>
aa

Only the pattern on the second line ("aa") matches, not the one on the first line.

matrixmadhan · August 8, 2007, 1:41pm

Changing to print c doesn't help.

It should be changed to the following, as I posted earlier.

awk ' BEGIN {RS=FS} { if ( $0 ~ /^aa$/ ) { c++; i=NR; } } END{if ( i == NR ) { c++ } print c}' filename

kahuna · August 8, 2007, 2:12pm

I'm not sure if your response is serious or joking. The OP was looking for a word count. Clearly there are 2 occurrences of aa, but the code counts one. Assuming you are serious, try

echo "aa\naa"

I get a blank line.

Klashxx · August 8, 2007, 2:30pm

This is a conceptual problem. Check this:

$ printf "abc abc sasa abc\nabc sasa abc" |awk '
BEGIN {
RS=FS
print "<<<"FS"<<<<<"
}
{
print "<<<"$0"<<<<<"
if ( $0 ~ /abc/ )
   c++
}
END{
print c
}'

<<<abc<<<<<
<<<abc<<<<<
<<<sasa<<<<<
<<<abc
abc<<<<<
<<<sasa<<<<<
<<<abc<<<<<

As you can see, we changed RS to FS (a blank space), so record number 4 contains an embedded newline ("abc", newline, "abc"). It is counted at most once, and an anchored pattern such as /^abc$/ can't match it at all.

Hope this helps.

Regards.
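One way around the record-separator problem, offered only as a sketch, is to turn every space and tab into a newline before awk sees the input, so that each word becomes its own record under the default RS. This assumes words are separated by spaces, tabs, or newlines; the function name count_by_record is made up for illustration.

```shell
# Hypothetical helper: count records exactly equal to $1 in the text on stdin.
# Assumes words are separated by spaces, tabs, or newlines.
count_by_record() {
    # tr maps spaces and tabs to newlines and squeezes repeated newlines,
    # so awk (default RS) reads exactly one word per record.
    tr -s ' \t' '\n\n' | awk -v w="$1" '$0 == w { c++ } END { print c + 0 }'
}

printf 'abc abc sasa abc\nabc sasa abc' | count_by_record abc
```

With this preprocessing, the "aa \naa" test input from earlier in the thread is split into two separate records, each holding "aa".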

kahuna · August 8, 2007, 2:45pm

Thanks. From what I can see, Shell_Life has the only viable posted solution.

I need to add that I think Klashxx's first awk solution works too.

matrixmadhan · August 8, 2007, 2:46pm

Why would I joke here?

The words you used have really taken me aback.

OK, coming to the point with your example.

Try this first, and then the later solution:

echo "aa\naa" | awk ' BEGIN {RS=FS} { print $0, length, NR } '

As per your argument, the output should be something like:

aa 2 1
aa 2 2

But the actual output is:

aa<newline>
aa<newline>
 6 1

So the input record that is matched is "aa<newline>aa<newline>",

and not 'aa' and 'aa' individually.

As a result, awk will not match that record against the pattern for 'aa',
and hence there is no effective result.

Hope this clears it up!

kahuna · August 8, 2007, 3:03pm

Thank you. I understand the issue. My point is that the code fails to solve the original problem. It does not count the number of occurrences of the given string. Your posting almost makes it sound like the code is right, so the original problem must be wrong.

ghostdog74 · August 8, 2007, 9:29pm

See here for example