smart script?

First, I know that's a bad title. I couldn't think of anything short enough. ...

I wrote the following script to alert me when parts of the network go down. This is what it looked like until last weekend, when I got over 500 emails about one host being down all weekend:

This is in cron and runs every 5 minutes:

# Script to ping the nodes listed in /home/scripts/watch;
# it's supposed to email me when one does not respond.

# Read IPs and ping them, counting the lines of output
# (should be 9 for success, 4 for failure)
while read HOST ; do
    live=`ping -c4 "$HOST" | wc -l`

    if [ $live -eq 4 ]   # pretty self-explanatory
    then
        # send a fancy email
        echo "This is an automatically generated email to let you know that "$HOST" has not responded to a scheduled ping. \n\n`date`\n\n`ping -c1 "$HOST"`\n\n`traceroute "$HOST"`" | mail -s "IPwatch "$HOST" Down!" email@address.com
    fi
done < /home/scripts/watch   # read IPs from this file
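Counting output lines assumes ping always prints exactly 9 or 4 lines. When a router answers with ICMP errors, Linux ping typically prints extra "Destination Host Unreachable" lines, so the count is neither 4 nor 9 and the `-eq 4` test treats a dead host as up. A minimal sketch of the same check that relies on ping's exit status instead, assuming the Linux iputils `ping` (exit status 0 when at least one reply arrived, non-zero otherwise); `WATCH_FILE`, `MAILTO`, `host_is_up` and `check_hosts` are hypothetical names:

```shell
#!/bin/sh
# Sketch: decide "down" from ping's exit status rather than from the number
# of output lines. Assumes Linux iputils ping (exit 0 if any reply arrived).
# WATCH_FILE, MAILTO, host_is_up and check_hosts are hypothetical names.

WATCH_FILE=${WATCH_FILE:-/home/scripts/watch}   # host list, one IP per line
MAILTO=${MAILTO:-email@address.com}

# Succeeds if the host in $1 answered at least one of four pings.
host_is_up() {
    ping -c4 "$1" >/dev/null 2>&1
}

# Reads hosts from stdin and mails an alert for each one that is down.
check_hosts() {
    while read -r HOST; do
        [ -n "$HOST" ] || continue        # skip blank lines
        if ! host_is_up "$HOST"; then
            # printf expands \n in every shell; echo only does in some (e.g. dash)
            printf 'This is an automatically generated email to let you know that %s has not responded to a scheduled ping.\n\n%s\n\n%s\n\n%s\n' \
                "$HOST" "$(date)" "$(ping -c1 "$HOST" 2>&1)" "$(traceroute "$HOST" 2>&1)" |
                mail -s "IPwatch $HOST Down!" "$MAILTO"
        fi
    done
}

if [ -r "$WATCH_FILE" ]; then
    check_hosts < "$WATCH_FILE"
else
    echo "ipwatch: cannot read $WATCH_FILE" >&2
fi
```

Because the list is read on stdin, commands inside the loop must not read stdin themselves; ping, traceroute and the piped mail do not.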

Today I tried to make it smarter: once it has emailed me 5 times about a host, it should stop sending every 5 minutes and send an hourly reminder instead. I added some bogus hosts to my definition file to test it, and so far it has not worked as expected. What have I done wrong?

while read HOST ; do
    live=`ping -c4 "$HOST" | wc -l`

    if [ $live -eq 4 ]
    then
        echo "ping failed" >> /home/scripts/ipwatch/$HOST
        FAIL=`cat /home/scripts/ipwatch/$HOST | wc -l`
        if [ $FAIL -lt 5 ]
        then
            echo "This is an automatically generated email to let you know that "$HOST" has not responded to a scheduled ping. \n\n`date`\n\n`ping -c1 "$HOST"`\n\n`traceroute "$HOST"`" | mail -s "IPwatch "$HOST" Down!" email@address.com
        fi
        MIN=`date | awk '{print$4}' | cut -d ":" -f2,3`
        while [ "$MIN" -eq "00:00" ]
        do
            echo "This is an hourly reminder about $HOST not responding to ping." | mail -s "IPWatch $HOST reminder" posborn@buckheadbeef.com
        done
    fi
done < /home/scripts/watch

Thanks

Not a bad idea, but if you're going to do much of this, and it matters to your business, you should use a monitoring tool such as Ganglia, Nagios, or Zenoss. I have the most experience with Nagios, and its notification emails behave the way you want: a problem that persists is re-notified only at the interval set by notification_interval rather than on every check, and after a while certain events can be "escalated" and reach you by pager, etc.

As for the script above, it's okay, but it doesn't account for timing: if the server stops at 12:56, you get an "hourly" reminder at 13:00 even though the first alert has only just gone out. It's better to create a file per host and use the file's timestamp to decide whether it's time to remind again.
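A minimal sketch of that timestamp approach, assuming a `find` that supports `-mmin` (GNU and BSD `find` do; POSIX does not require it). The first five failures each get a full alert, as intended in the question; after that, a reminder goes out only when a per-host marker file was last touched more than `REMIND_MIN` minutes ago, and both state files are deleted as soon as the host answers again (the question's version never resets its counter). This also sidesteps the top-of-the-hour test, which cannot work as written: `-eq` only compares integers, so `[ "$MIN" -eq "00:00" ]` fails with an error; `cut -f2,3` keeps minutes and seconds, so it would match only at second 00; and a `while` whose condition never changes would loop forever if it ever did match. `STATE_DIR`, `MAX_ALERTS`, `REMIND_MIN`, `MAILTO`, the helper functions and the `.count`/`.lastmail` file names are all hypothetical.

```shell
#!/bin/sh
# Sketch: the first MAX_ALERTS failures get a full alert on every cron run,
# then at most one reminder per REMIND_MIN minutes, timed by the mtime of a
# per-host marker file. Assumes find(1) supports -mmin (GNU or BSD).
# All variable, function and file names below are hypothetical.

WATCH_FILE=${WATCH_FILE:-/home/scripts/watch}   # host list, one IP per line
STATE_DIR=${STATE_DIR:-/home/scripts/ipwatch}   # per-host state files
MAILTO=${MAILTO:-email@address.com}
MAX_ALERTS=5    # full alerts sent before switching to reminders
REMIND_MIN=59   # remind once the last mail is older than this many minutes

send_alert() {
    printf 'This is an automatically generated email to let you know that %s has not responded to a scheduled ping.\n\n%s\n\n%s\n\n%s\n' \
        "$1" "$(date)" "$(ping -c1 "$1" 2>&1)" "$(traceroute "$1" 2>&1)" |
        mail -s "IPwatch $1 Down!" "$MAILTO"
}

send_reminder() {
    echo "This is an hourly reminder about $1 not responding to ping." |
        mail -s "IPWatch $1 reminder" "$MAILTO"
}

# Called on every run in which host $1 did not answer.
handle_down() {
    count_file="$STATE_DIR/$1.count"      # one line per failed run
    stamp_file="$STATE_DIR/$1.lastmail"   # mtime = when we last mailed
    echo "ping failed" >> "$count_file"
    fails=$(wc -l < "$count_file")
    if [ "$fails" -le "$MAX_ALERTS" ]; then
        send_alert "$1"
        touch "$stamp_file"
    elif [ -n "$(find "$stamp_file" -mmin +"$REMIND_MIN" 2>/dev/null)" ]; then
        # find prints the path only if the marker is older than REMIND_MIN
        send_reminder "$1"
        touch "$stamp_file"
    fi
}

# Reads hosts from stdin; resets a host's state as soon as it answers again.
check_hosts() {
    mkdir -p "$STATE_DIR"
    while read -r HOST; do
        [ -n "$HOST" ] || continue
        if ping -c4 "$HOST" >/dev/null 2>&1; then
            rm -f "$STATE_DIR/$HOST.count" "$STATE_DIR/$HOST.lastmail"
        else
            handle_down "$HOST"
        fi
    done
}

if [ -r "$WATCH_FILE" ]; then
    check_hosts < "$WATCH_FILE"
else
    echo "ipwatch: cannot read $WATCH_FILE" >&2
fi
```

Since cron still runs it every 5 minutes, the reminders arrive roughly hourly rather than exactly on the hour; setting `WATCH_FILE` or `STATE_DIR` in the crontab environment overrides the default paths.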