Monitoring an HTML web page for changes

Hello,

I need to monitor an HTML web page for ANY changes and be able to tell whether it has been modified since the last check. I don't need to know what the modifications are; just a notification is enough.

This is a simple web page and I don't need to parse the links any further.

Is it possible to do this with a shell script? If yes, please advise me how.

Thanks!

---------- Post updated at 11:07 AM ---------- Previous update was at 10:24 AM ----------

I got it done with wget: I keep downloading the HTML file and comparing it with the previously downloaded copy. This works, but I just want to know if there's a better way.

Thanks

Something like this. First:

wget http://domain.com/path/to/page.html
md5 page.html > previous_md5
rm page.html

Then run this script from cron:

#!/bin/sh
# fetch the page quietly, checksum it, and mail a notice if the checksum changed
wget -q http://domain.com/path/to/page.html
md5 page.html > last_md5        # use md5sum on Linux
diff previous_md5 last_md5 > /dev/null
if [ $? -ne 0 ] ; then
      echo "checksum of page.html changed" | mail -s "page.html changed on `date`" your@mail.addr
fi
mv last_md5 previous_md5
rm page.html

wget is NOT working for this because the size of the downloaded HTML file sometimes differs by a few bytes even though nothing on the web page has changed.

It's weird and I don't think we can rely on wget for this.

Any suggestions would be highly appreciated.

Thanks!

Try using lwp-download

lwp-download "http://your_URL_here.com" download.html

What operating systems must be supported? Some systems have efficient notification interfaces which do not require polling. Upon notification of file modification, an email can be sent.

An example of a tool which leverages such an API: inotifywait(1) - Linux man page
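
A minimal sketch of that approach, assuming inotify-tools is installed and the page lives on the local filesystem of the web server (the path and mail address below are placeholders):

inotifywait -m -e modify /var/www/page.html |
while read path events; do
    # one line is printed per modification event; mail a notice for each
    echo "$path was modified" | mail -s "page.html changed on `date`" your@mail.addr
done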

Regards,
Alister

If the file size is different, the file has changed.
wget doesn't change the downloaded file by itself.

If the page sometimes differs (i.e. it has dynamic content), you must find another way to monitor changes, not fetch the page over the web.

Do you have access to the HTTP server or to the page source (svn / filesystem / other)?
Do you need to monitor changes to the whole page, or only to a part of it?

inotifywait is for monitoring file changes on LOCAL file systems.

Both wget and lwp-download are NOT working consistently (they produce HTML files of different sizes even though there were no changes).

Could anyone please suggest a better solution? Thanks much in advance!!

I've used wget for years reliably. If the file size is different, something probably has changed.

It may not be anything you care about, but a change is a change.

Compare the old and new files with diff and see what you find different.
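
Something like this (a rough sketch; the URL and filenames are placeholders) keeps both copies around so you can see exactly what changes between fetches:

# create an empty old.html by hand before the first run
wget -q -O new.html http://domain.com/path/to/page.html
diff old.html new.html
mv new.html old.html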

This may not be related, but I see a bug was identified with wget that caused it to report the file size inaccurately - Red Hat Bugzilla

They used version-release number wget-1.9.1-16 to reproduce this. You might want to check your version and upgrade if necessary.

Two ways:

  1. Fetch the response headers of the web page and look at the Last-Modified time (see the sketch after this list).
  2. If you only care about a specific part of the web page (say, a particular div), download the page, cut that part out into a text file, and compare checksums with md5 or cksum.
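
For option 1, a rough sketch using curl to send a HEAD request (the URL and mail address are placeholders, and it assumes the server actually sends a Last-Modified header, which many dynamic pages do not):

#!/bin/sh
URL=http://domain.com/path/to/page.html
# grab only the Last-Modified response header
curl -sI "$URL" | grep -i '^Last-Modified:' > last_modified.new
# compare with the value saved on the previous run
# (create an empty last_modified.old by hand before the first run)
if ! cmp -s last_modified.new last_modified.old ; then
    echo "Last-Modified header changed" | mail -s "page changed on `date`" your@mail.addr
fi
mv last_modified.new last_modified.old

For option 2, the interesting part can be cut out with something like sed -n '/<div id="content">/,/<\/div>/p' page.html (the div id is just an example) and then checksummed the same way as the md5 scripts above.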

You can also try
WatchThatPage - Monitor web pages extract new information
It will track any change for you and send you an email when a change occurs.