Accepting a phrase and counting the number of times that it is repeated in a specific website

Zakerii · December 11, 2013, 10:39am

The problem statement, all variables and given/known data:

Develop a shell script that accepts a phrase and counts the number of times that it is repeated in a specific website.

Note: I'm not sure if it's the whole website or just a specific page, but I'm guessing it's the whole website.

Relevant commands, code, scripts, algorithms:
wget, curl, grep

The attempts at a solution (include all code and scripts):

(For now I have only handled the home page.)

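#!/bin/bash
# Prompt for the phrase to search for.
echo "Enter a phrase:"
read -r phrase
echo "Number of occurrences is:"
# Fetch the page quietly and count the lines that contain the phrase.
curl -s --websiteHere-- | grep -- "$phrase" | wc -l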

Other Ideas:

-wget the whole humber.ca website (so that every page is downloaded as an .html file) and then grep those files for the specific phrase.
-downside: downloading the whole website takes far too long.

-curl the humber.ca search with the phrase in it (I'm not sure how to do this entirely in a shell script) so that it returns the pages on the site that contain the phrase, then grep and count the matches in just those pages.
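
A rough sketch of the first idea, in case it helps to see it written down. The function names are hypothetical, the recursion depth of 2 is an arbitrary assumption meant to keep the download short, and the URL scheme in the usage comment is assumed as well. The counting step treats the phrase as a literal string and counts every occurrence, not just matching lines.

```shell
#!/bin/bash
# Hypothetical helper names; adjust depth and URL to the assignment.

# Download pages below the given URL into a local directory.
# --level limits the recursion depth so the crawl stays small.
mirror_site() {
    local url=$1 dir=$2
    wget --quiet --recursive --level=2 --no-parent \
         --directory-prefix="$dir" "$url"
}

# Count every occurrence of a literal phrase in all files under a directory.
# -r recurses, -h drops file names, -o prints one match per line,
# -F treats the phrase as a fixed string instead of a regex.
count_in_tree() {
    local phrase=$1 dir=$2
    grep -rhoF -- "$phrase" "$dir" | wc -l | tr -d ' '
}

# Example usage (not run here; needs network access, URL is an assumption):
#   mirror_site "https://humber.ca/" ./mirror
#   count_in_tree "some phrase" ./mirror
```

Limiting the depth trades completeness for speed, so the count only covers the pages that were actually fetched.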

Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):
Humber College (North Campus), Toronto, Canada, Alireza M., 160950: Game 130

Thanks

RudiC · December 11, 2013, 3:23pm

So - what's your question? Your snippet should do the job. You might want to consider grep's -c option.
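
One thing to keep in mind: both grep -c and grep | wc -l count matching lines, so a line that contains the phrase twice is counted once. A minimal sketch of the difference, assuming a grep that supports -o (GNU and BSD grep do); the function names are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helpers; both read the text to search from standard input.

# Count lines that contain the pattern at least once
# (same result as grep | wc -l).
count_lines() {
    grep -c -- "$1"
}

# Count every occurrence, including several on one line:
# -o prints each match on its own line, wc -l counts those lines.
count_matches() {
    grep -o -- "$1" | wc -l | tr -d ' '
}

printf 'is this his\nno match\nthis\n' | count_lines "is"     # prints 2
printf 'is this his\nno match\nthis\n' | count_matches "is"   # prints 4
```

Which of the two numbers the assignment wants depends on how "number of times it is repeated" is meant.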

bakunin · December 12, 2013, 10:07am

I am not sure if this is within the scope of your homework, but you will probably have to escape several characters to make your regexp more robust.

Here is an example you can try to see the problem:

This is the inputfile.
This is a line
A line not containing the word "i s".

# search="is"
# grep -c "$search" /path/to/inputfile
2

So far, so good. Let us search for full stops. There are 2 lines with a full stop (1 and 3):

# search="."
# grep -c "$search" /path/to/inputfile
3

Obviously wrong, no? Contemplate the reason why and find out how to correct that.
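
One possible direction, as a sketch rather than the definitive answer: in a regular expression "." matches any single character, so every non-empty line matches. Either escape the character or tell grep to treat the whole search term as a fixed string with -F. The function names and the temporary file are assumptions made for this illustration.

```shell
#!/bin/bash
# Hypothetical helpers: count matching lines in a file.

# Treats the search term as a regular expression (grep's default).
count_regex() {
    grep -c -- "$1" "$2"
}

# Treats the search term as a literal string, so no escaping is needed.
count_literal() {
    grep -c -F -- "$1" "$2"
}

# Recreate the three-line sample file in a temporary location.
inputfile=$(mktemp)
printf '%s\n' 'This is the inputfile.' 'This is a line' \
    'A line not containing the word "i s".' > "$inputfile"

count_regex "." "$inputfile"     # prints 3: "." matches any character
count_regex '\.' "$inputfile"    # prints 2: escaped full stop
count_literal "." "$inputfile"   # prints 2: fixed-string search

rm -f "$inputfile"
```

For a phrase typed in by a user, the fixed-string variant is usually the simpler choice, because it avoids having to escape each special character by hand.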

I hope this helps.

bakunin