Run the script continuously but mail only once an hour

Hi,
I have written a script to monitor the queue manager status continuously. The script is below.

QMGR=`dspmq | awk '{print $1}' | cut -f2 -d "(" |  cut -f1 -d ")"`
QMSTATUS=`dspmq | awk '{print $2}' | cut -f2 -d "(" |  cut -f1 -d ")"`
count=`dspmq | awk '{print $1}' | cut -f2 -d "(" |  cut -f1 -d ")"|wc -l`
while [ $count -ne 0 ]
do
e=`echo $QMGR | cut -f$count -d" "`
f=`echo $QMSTATUS | cut -f$count -d" "`
if [ $f != "Running" ]
then
echo "$e is not running" >> /var/mqm/ILhousekeeping/QMdown.txt
fi
count=`expr $count - 1`
done
mail -s "Queue Manager down - `hostname` - $e" anusha1234@gmail.com < /var/mqm/ILhousekeeping/QMdown.txt

The requirement is that this script runs continuously (scheduled in cron) to monitor the queue manager status, but the mail must be sent at most once an hour while a queue manager is down.

Thanks,
Anusha

Don't run that script "continuously"; it will be an enormous waste of resources at no benefit. Think about what is an adequate reaction time (one hour looks good according to your mail frequency) and schedule accordingly. It doesn't seem like you gain anything in running it more often.

Hi Rudic,
The problem is that if I configure the script to run once an hour and the queue manager goes down between runs, I will not be notified, which can cause issues.

The script has to run continuously. If the queue manager is down, it has to send a mail immediately, and the next notification should be sent an hour later (if the queue manager is still down).

Thanks,
Anusha

Anusha,

RudiC is correct, continuously checking if the queue manager is down would waste resources for little benefit. And email is not a good choice (imho) for "instant" notification - it is neither deterministic nor reliable as it requires someone to notice that a message has been delivered to an email client. Or to notice that a message has not been delivered.

What O/S and version are you running on?

And if this is a critical production system, don't you have some sort of system monitoring application - tivoli? nagios? - that could be used?

Hi,

Below is the Linux kernel version we are using. We support a non-production environment, and because the team is newly formed, no monitoring tools are available.

uname -r
2.6.18-308.11.1.el5xen

Thanks,
Anusha

Anusha,

The reason I asked for the O/S is that Solaris-1x and RHEL-7.x have mechanisms to restart services. But your system is running RHEL-5.8.

You need to determine how quickly you need to respond to a dead queue manager. There is no such thing as instantly. For non-production, every ten to fifteen minutes would be reasonable.

As far as checking goes, why not just check for anything that is not running:

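HOST=$(uname -n)
FILE=$(mktemp)
# Remove the temporary file when the script exits
trap 'rm -f "${FILE}"' EXIT
# Keep only the dspmq lines that do not report Running
dspmq | grep -v Running > "${FILE}"
# Mail the recipients passed as script arguments only if something was found
if [[ -s "${FILE}" ]]; then
  mailx -s "${HOST}: queue manager is down" "${@}" < "${FILE}"
fi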

As far as re-alerting in an hour ... why? As RudiC stated, if you need to be renotified every hour that a queue manager is down, then you only need to check once an hour. Remember, there is no guarantee that the first notification will be noticed. I think a more robust (and simpler) method would be to check every 10 minutes and notify any time a queue manager is down.
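
If the hourly limit on mails is still wanted, one way to get it without running anything continuously is to let cron run the check every few minutes and keep the time of the last mail in a small stamp file. The following is only a rough sketch under that assumption: the stamp file path, the INTERVAL variable, and the function names are made up for illustration and are not from the posts above.

```shell
#!/bin/bash
# Sketch only: check often, but mail at most once per INTERVAL seconds.
# STAMP, INTERVAL and the function names are hypothetical, not from the thread.

STAMP="${STAMP:-/tmp/qmgr_alert.stamp}"   # holds the epoch time of the last mail
INTERVAL="${INTERVAL:-3600}"              # minimum seconds between mails

# Succeeds (returns 0) when no mail was sent within the last INTERVAL seconds.
should_alert() {
  local now last
  now=$(date +%s)
  last=$(cat "${STAMP}" 2>/dev/null)
  case "${last}" in
    ''|*[!0-9]*) return 0 ;;   # missing or unreadable stamp: allow the mail
  esac
  [ $(( now - last )) -ge "${INTERVAL}" ]
}

record_alert() { date +%s > "${STAMP}"; }
clear_alert()  { rm -f "${STAMP}"; }

# Mail the recipients given as arguments if anything is not Running.
check_and_notify() {
  local down
  down=$(dspmq | grep -v Running)
  if [ -z "${down}" ]; then
    clear_alert      # all running again: the next outage mails immediately
    return 0
  fi
  if should_alert; then
    printf '%s\n' "${down}" |
      mailx -s "$(uname -n): queue manager is down" "$@" && record_alert
  fi
}

# Run the check only when recipients are supplied on the command line.
if [ "$#" -gt 0 ]; then
  check_and_notify "$@"
fi
```

Scheduled from cron every five or ten minutes with the recipient addresses as arguments, this would mail on the first failed check and then stay quiet until an hour has passed or the queue managers have recovered. The stamp is only written when mailx exits successfully, so a failed send is retried on the next run.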