I have Networker running on RHEL 5.7 and over time it hangs. The solution the backup team proposed is to check whether the process is hung and, if so, to stop and start it.
Unfortunately for me, the rc script only accepts three commands: start, stop and status (there is no restart option). So I put together the following script, but when I execute it, even when Networker has been stopped, I get the OK message in my /var/log/messages. Why is that? Can someone please help me look into this? Where did I go wrong? Sorry I am rushing this, they need to get this set up on the prod servers by COB today...
#!/bin/bash
cmdstop='/etc/rc.d/init.d/networker stop'
cmdstart='/etc/rc.d/init.d/networker start'
if [ "${?}" != 0 ] ; then
echo "`date` CRITICAL:Networker hung, will be restarted" >>/var/log/messages
$cmdstart
else
echo "`date` OK:Networker running" >>/var/log/messages
fi
exit
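It does not matter whether the process hangs, because the script does not check for that. The test on "${?}" comes straight after two variable assignments, which always succeed, so the value is always 0 and the else branch runs every time. The script therefore never restarts the process, and all it will do is write "`date` OK:Networker running" into the log.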
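To make that concrete, here is a minimal sketch of a check that runs the status action first, so that the exit status being tested belongs to it. It assumes the init script's status action exits non-zero when Networker is down, which should be verified on the host before relying on it. The function name check_and_restart and the INIT and LOGFILE variables are illustrative and not part of Networker.

```bash
#!/bin/bash
# Sketch only. Assumption: "networker status" exits non-zero when the
# daemons are not running. INIT and LOGFILE are illustrative names.
INIT=${INIT:-/etc/rc.d/init.d/networker}
LOGFILE=${LOGFILE:-/var/log/messages}

check_and_restart() {
    # Test the exit status of the status action itself,
    # not of an unrelated earlier command.
    if "$INIT" status >/dev/null 2>&1; then
        echo "$(date) OK:Networker running" >>"$LOGFILE"
    else
        echo "$(date) CRITICAL:Networker not running, will be restarted" >>"$LOGFILE"
        # No restart action exists, so stop and then start.
        "$INIT" stop
        "$INIT" start
    fi
}
```

Calling check_and_restart at the end of the script would perform one check per run. Note that this only catches a stopped service: a daemon that is hung but still listed as running would most likely still report a good status.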
---------- Post updated at 01:56 PM ---------- Previous update was at 01:39 PM ----------
Well, if we look at the screenshot I attached earlier, the process had been running since the 1st of May. That didn't look right when grep'ed, because we had expected it to run and complete on the 1st itself, which is why Networker was restarted today at around 11 AM.
I'm with you on the cron job. The idea is beginning to appeal to me more and more, and I have spoken to the backup chap about it, so we will definitely look into implementing it on the 3 servers.
If that does not work, then perhaps there is a -f option to /usr/sbin/nsr_shutdown. That would probably be better than kill. Consult your manual and/or your Networker support organization.
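#!/bin/bash
# Start Networker if $PROCESS is missing from the process list.
# This detects a process that is not running; it cannot detect a hung one.
STOPCMD='service networker stop'   # defined for completeness, not used below
STARTCMD='service networker start'
PROCESS='nsrexecd'
# grep -v grep drops the grep command itself from the ps output
if ps auxw | grep -v grep | grep -q "$PROCESS"
then
echo "$(date) Process Networker is running" >>/var/log/messages
else
echo "$(date) Process Networker not running and will be started" >>/var/log/messages
$STARTCMD
fi
exit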