Need help restarting a process

Hi friends,

I have a Unix command that I use to check the network status manually.
The command is:

check_Network

This command gives the following status:

Network 1 is ok
Network 2 is ok
network 3 is ok
network 4 is ok
.
.
.
.
Network 10 is ok

Sometimes the command

check_Network

does not give any output and hangs. I have to press Ctrl+C to get back to the server command prompt.
After that I restart the network handler process with the command

 restart ntework_all

After that, the command

check_Network

gives the correct output, as above.

Now I want to automate this task by writing a shell script.
1) The script should check the network status at a regular interval (let's say every 20 minutes).
2) If the command

check_Network 

does not give any output and hangs, the script should break out of the hung state (similar to pressing Ctrl+C) and restart the network handler processes with

restart ntework_all

and then check the network status again.
3) After the restart, it should send me a mail saying "Network Handler has been restarted at <Time>".

This might get you started - this version ignores any output from check_Network and only restarts if check_Network times out.

You could redirect the output of check_Network to a file and process that file at the bottom of this script, checking for other conditions that require a restart.

I'd suggest running this script every 20 minutes from cron rather than having it sleep and loop all the time.
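For the cron part, a crontab entry along these lines should do it. This is only a sketch: the script path is a made-up placeholder, and I've listed the minutes explicitly because not every cron implementation accepts step syntax.

```text
# minute   hour day-of-month month day-of-week  command
0,20,40    *    *            *     *            /path/to/check_and_restart.sh
```

You would add it with crontab -e as the user that is allowed to run restart. The script itself: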

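# Seconds to wait for check_Network before treating it as hung
TIMEOUT_SECS=20

# Runs when the alarm fires: kill the hung check, restart, and send mail
command_timeout () {
   # If check_Network has already finished there is nothing to do.
   # kill -0 only tests that the PID exists, so no /proc is needed.
   kill -0 $check_pid 2> /dev/null || exit
   kill $check_pid
   wait $check_pid 2> /dev/null
   restart ntework_all
   echo "Network Handler has been restarted at $(date)" | mail -s "Network Handler" nakul_sh@mail.unix.com
}

# Start the check in the background so this script can time it
check_Network > /dev/null 2>&1 &
check_pid=$!
parent_pid=$$

# Set up an alarm that calls command_timeout after TIMEOUT_SECS
trap command_timeout ALRM
(sleep $TIMEOUT_SECS; kill -ALRM $parent_pid) &
alarm_pid=$!

# Wait for check_Network to finish (or for the alarm to interrupt the wait)
wait $check_pid 2> /dev/null

# We are back, so cancel the alarm if it is still pending
kill -0 $alarm_pid 2> /dev/null && kill $alarm_pid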
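To follow up on the idea of processing the output file, here is a minimal sketch. It assumes that every healthy line ends in "is ok", as in the sample output, and that a report with no such line counts as a failure. The function name report_is_healthy is my own invention, not part of check_Network.

```shell
# report_is_healthy FILE
# Returns 0 if FILE looks like a healthy check_Network report, 1 otherwise.
# Assumption: a healthy report has only lines ending in "is ok".
report_is_healthy () {
    report=$1

    # No "is ok" line at all (empty or missing file) counts as a failure
    grep -i 'is ok' "$report" > /dev/null 2>&1 || return 1

    # Any non-blank line that does not end in "is ok" counts as a failure
    if grep -v '^[[:space:]]*$' "$report" |
       grep -iv 'is ok[[:space:]]*$' > /dev/null
    then
        return 1
    fi

    return 0
}
```

To use it, send the output of check_Network to a file of your choice instead of /dev/null, and after the final wait call report_is_healthy on that file, running the restart when it returns non-zero.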

I ran this script and it seems there are some issues with the PID when it attempts the kill. It stays hung after the output below, and I need to press Ctrl+C to get back to the command prompt.

Unix_buzz:>sh test123.sh
logout
Unix_buzz:>kill: 8182: The specified process does not exist.

But the timeout itself seems to be working fine, because after 20 seconds it immediately attempts the kill <PID>.
I appreciate your help in this matter.

My guess is you pressed Ctrl+C before the 20 second timeout. If 20 seconds is too long (i.e. you're inclined to press Ctrl+C earlier than that), reduce the time; perhaps 5 seconds is a better fit for this command.

No, I didn't press Ctrl+C before the 20 seconds were up. It seems the issue is with the PID only.

Is it possible to use some other method to get the PID, instead of using $! and $$?

No. What shell are you running this in?

Of course this check_Network is a black box and could be spawning multiple other background processes, and it's one of those that's getting stuck.

When you manually clean up, how do you find the PIDs? Can you paste a ps listing of the stuck processes?
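If it helps with gathering that listing, here is a small sketch that prints the direct children of a given PID, which would show whether check_Network leaves other processes behind. It relies only on the POSIX ps options -e and -o; the function name children_of is just something I made up for illustration.

```shell
# children_of PID
# Prints the PIDs of the direct children of PID, one per line.
children_of () {
    # "pid=" and "ppid=" select the two columns and suppress the header line
    ps -e -o pid= -o ppid= | awk -v parent="$1" '$2 == parent { print $1 }'
}
```

For example, while check_Network is hung in the background you could run children_of $! to see what it has spawned, and then look those PIDs up in a full ps listing.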

I am using ksh.

When there is a problem checking the network status, I get the following blank output.
After that I press Ctrl+C on the keyboard to get back to the command prompt.

Unix_buzz:>check_network




Unix_buzz:>

I mean to say that when check_network is hung, I am unable to do anything until I press Ctrl+C.

As previously stated, check_network is a black box and we don't know what's inside it. Could you tell us what's inside it? Or is it a binary app?