We've had Nagios spewing false alarms (for the umpteenth time), and the customer has finally had enough, so they're starting to question Nagios. We had the check interval shortened from 5 minutes to 2 minutes, but that's just a temporary solution. I'm thinking of putting a script on the affected client servers that monitors the processes and, if one is not running, uses logger to write a line that I can easily pick up with Nagios monitoring. All I have to do then is add the "key words" to the Nagios check configuration, which already monitors /var/log/messages.
The script that checks whether the process is running looks like this:
# more /usr/local/bin/process
#!/bin/bash
PROCESS="httpd"
# Match "is running" so that a "not running" status line does not count.
if /etc/init.d/"$PROCESS" status | grep -q "is running"
then
    logger -s "$PROCESS is running"
else
    logger -s "$PROCESS is not running"
fi
So how do I monitor whether the script itself is running? I tried ps -ef, but ps still shows me two lines (one for the ps | grep command itself, the other for the script if it is running), so it's not really a good way to check, is it?
When I put the job in the background with nohup and grep the output of ps aux, I get nothing.
I see your script checks whether the process is running, logs the status with logger, and then terminates. That is why you are not seeing it running. You should put the check in an infinite while loop with a sleep interval of your choice, so that it checks and logs the status periodically.
E.g.:
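#!/bin/bash
PROCESS="httpd"
# Loop forever so the script stays alive between checks.
while true
do
    # Match "is running" so that a "not running" status line does not count.
    if /etc/init.d/"$PROCESS" status | grep -q "is running"
    then
        logger -s "$PROCESS is running"
    else
        logger -s "$PROCESS is not running"
    fi
    sleep 300   # wait 5 minutes before the next check
done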
Now you can start your script with nohup and open another terminal to check whether it is running.
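The post above doesn't show the actual commands, so here is a rough sketch of what starting and checking could look like, not necessarily what was meant. The helper names start_watcher and watcher_running are made up for illustration; nohup and pgrep -f are the standard tools.

```shell
#!/bin/bash
# Start a long-running command detached from the terminal.
# "$@" is the command to run, e.g. /usr/local/bin/process
start_watcher() {
    nohup "$@" >/dev/null 2>&1 &
}

# Succeed if some process has the given path in its command line.
# pgrep never reports itself, so no "grep -v grep" is needed.
# Note: the argument is treated as a regular expression.
watcher_running() {
    pgrep -f "$1" >/dev/null
}
```

With the loop version of the script saved as /usr/local/bin/process, you would run `start_watcher /usr/local/bin/process` in one terminal and `watcher_running /usr/local/bin/process && echo "still running"` in another. Because -f matches against the whole command line, it can also hit an editor or a shell that merely has the path as an argument.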
If I use flock I can see messages in /var/log, but how do I make sure the right line gets written for each of these cases: a. the script is running, b. the script is not running, c. the process is running, d. the process is not running?
#!/bin/bash
SERVICE="httpd"
CHECK_SCRIPT="/usr/local/bin/process"
if flock -n /var/run/your.lockfile -c /usr/local/bin/process >/dev/null
then
    logger -s "$CHECK_SCRIPT is running. Do nothing"
else
    logger -s "$CHECK_SCRIPT is not running. Please restart script"
fi
if pgrep -l $SERVICE | grep $SERVICE >/dev/null
then
    logger -s "$SERVICE process is running. Do nothing"
else
    logger -s "$SERVICE process is not running. Please restart service"
fi
Right now this script only detects that the script is running and whether the process is running or not... a little help please to make it realise when the script is not running.
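For what it's worth, `flock -n lockfile -c command` does not test whether the command is already running: it takes the lock, runs the command itself, and hands back that command's exit status. A sketch of the other way round, where the looping script holds the lock for as long as it lives and the checker only probes it. The lock path and the function names are hypothetical, and this assumes the util-linux flock, which exits with status 1 when -n cannot get the lock.

```shell
#!/bin/bash
# Hypothetical lock path; the watcher and the checker must agree on it.
LOCKFILE="${LOCKFILE:-/var/run/process-watch.lock}"

# In the watcher: open the lock file on fd 9 and lock it without waiting.
# The lock stays held for as long as fd 9 stays open, i.e. until the
# script exits. Returns non-zero if another copy already holds the lock.
hold_lock() {
    exec 9>"$LOCKFILE"
    flock -n 9
}

# In the checker: try to take the lock just long enough to run "true".
# Exit status 1 from flock -n means the lock is taken, so the watcher
# is alive. Any other status (0, or an error opening the file) counts
# as "not held".
watcher_holds_lock() {
    flock -n "$LOCKFILE" true 2>/dev/null
    [ $? -eq 1 ]
}
```

The watcher would call `hold_lock || exit 1` once before entering its while loop, and the checker would log "is running" or "is not running" depending on `watcher_holds_lock`. If the watcher dies for any reason, the kernel drops the lock with it, so there is no stale lock file to clean up.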
OMG bipin, you're totally right. Hang on, let me try this again.
So here's what's happening.
If I run the script after starting httpd, I get this:
Nov 4 23:56:06 hedkandi root: /usr/local/bin/test is not running. Please restart script   <-- this isn't right; it should say "/usr/local/bin/test is running. Do nothing"
Nov 4 23:56:06 hedkandi root: httpd process is not running. Please restart service
If I run the script after stopping httpd, I get this:
Nov 4 23:56:06 hedkandi root: /usr/local/bin/test is running. Do nothing
Nov 4 23:56:06 hedkandi root: httpd process is not running. Please restart service
The other two things that are totally not working:
If I stop the script and the process is running, it should give me:
/usr/local/bin/test is not running. Please restart script
httpd process is running. Do nothing
If I stop the script and stop the process, it should give me:
/usr/local/bin/test is not running. Please restart script
httpd process is not running. Please restart service
Please modify your if-else statements as below and retry:
if [ "$(ps -eaf | grep -v grep | grep -c "$CHECK_SCRIPT")" -ne 0 ]
then
    logger -s "$CHECK_SCRIPT is running. Do nothing"
else
    logger -s "$CHECK_SCRIPT is not running. Please restart script"
fi
if [ "$(pgrep -l "$SERVICE" | grep -c "$SERVICE")" -ne 0 ]
then
    logger -s "$SERVICE process is running. Do nothing"
else
    logger -s "$SERVICE process is not running. Please restart service"
fi
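A small variation on the service check, in case it helps: pgrep can do the matching on its own, so the extra grep -c is not strictly needed, and -x makes it match the whole process name rather than any name that contains the string. This is only a sketch; check_service is a made-up name, and it assumes the process really is called httpd on your box.

```shell
#!/bin/bash
# Print a status line for the process whose name is exactly $1.
# pgrep -x requires the full process name to match, so "httpd"
# does not match a process called "httpd-helper".
check_service() {
    local name="$1"
    if pgrep -x "$name" >/dev/null
    then
        echo "$name process is running. Do nothing"
    else
        echo "$name process is not running. Please restart service"
    fi
}
```

It plugs into the existing logging as `logger -s "$(check_service "$SERVICE")"`.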
Just in case anyone needs a script that checks whether the watcher script itself is running, as well as whether the process it's meant to check is running:
#!/bin/bash
SERVICE="httpd"
CHECK_SCRIPT="/usr/local/bin/test"
if [ "$(ps -eaf | grep -v grep | grep -c "$CHECK_SCRIPT")" -ne 0 ]
then
    logger -s "$CHECK_SCRIPT is running. Do nothing"
else
    logger -s "$CHECK_SCRIPT is not running. Please restart script"
fi
if [ "$(pgrep -l "$SERVICE" | grep -c "$SERVICE")" -ne 0 ]
then
    logger -s "$SERVICE process is running. Do nothing"
else
    logger -s "$SERVICE process is not running. Please restart service"
fi
I have this in crontab, and it runs at 5-minute intervals.
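One thing to watch with the keyword matching, as far as I can tell: the script logs a line on every run, healthy or not, and the word "running" appears in both the good and the bad message, so the Nagios pattern has to be "not running" rather than just "running". A sketch of an alternative that only logs when something is down, under a fixed tag. The tag procwatch, the status=DOWN format and the function names are all invented for illustration, not anything Nagios requires.

```shell
#!/bin/bash
# Hypothetical syslog tag; anything unique enough to match on will do.
TAG="procwatch"

# Build a fixed-format message, e.g. "httpd status=DOWN".
build_message() {
    local name="$1" state="$2"
    printf '%s status=%s' "$name" "$state"
}

# Log only failures, so the keyword never shows up while things are healthy.
# logger -t sets the tag, -p sets the facility.priority.
report() {
    local name="$1" state="$2"
    if [ "$state" = "DOWN" ]
    then
        logger -t "$TAG" -p daemon.err "$(build_message "$name" "$state")"
    fi
}
```

The cron job would then call something like `pgrep -x httpd >/dev/null || report httpd DOWN`, and the Nagios log check would match on status=DOWN. Whether daemon.err messages end up in /var/log/messages depends on the local syslog configuration, so that part needs checking on the server.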