Help scripting to start, check, and restart processes

Here it goes, from my inexperienced point of view. I am using CentOS 5.6. I have a Java-based server that needs to be running 24/7/365. Starting from a reboot of the machine the server is on: I SSH in to a shell, cd to the server dir, run screen -S server1, and execute ./exec (listed below) in the screen. This runs the Java server and restarts its process if/when it stops.

exec

#!/bin/bash
while true; do
java (*all variables*)
wait
done

I would like to automate this starting process with a script run from a crontab job. The script(s) would need to do what I do manually to start the server, check that the 3 processes (screen, exec, java) are always running, and be able to restart them in their appropriate environment (command line or screen).

I have tried the psybnc psybncchk script with some adjustments to try and do some of this, but my knowledge of scripting is very minimal.

cbchk

CBPATH=/home/*user*/*dir*

if test -r $CBPATH/cb.pid; then
    CBPID=$(cat $CBPATH/cb.pid)
    if $(kill -CHLD $CBPID >/dev/null 2>&1)
    then
	exit 0
    fi
fi
cd $CBPATH
SCREEN -S server1
./exec

When I kill all of the processes, then run the above script, the shell screen gets spammed with arrows and the server's starting info displays inside the never ending arrows. When I attach to the screen, Java is continually stating errors nonstop. I've also tried adding " & echo $! > cb.pid" after the ./exec to automate getting the PID, but that makes the screen unusable for some reason.

While exec is running, it's keeping the Java server running, but when all of the processes are killed, I'm not sure how to script the starting sequence and have it make sure everything stays running. Any and all help will be greatly appreciated. Thank you.

Edit: Edited for clarity, hopefully.

And this has to stay inside of a screen session because it's interactive, or just a way to view the output?

Running your "exec" script in the background is not sufficient?

shell$ ./exec &

Then you want a crontab to make sure 'exec' is running?


The screen is interactive and the only way to view the output.

The exec script covers keeping the Java server running only if it doesn't consume all available memory and the OS doesn't kill all of the processes.

The exec script is executed in the screen to run, view, and interact with the server with ./exec .

When the machine the server is on is rebooted, I would like a script to cd to the server dir, start a screen with screen -S server1, then execute the exec script within the screen with ./exec . Then have the script check to make sure all of the processes are running. If not, restart them.

Sorry for not being able to explain everything very precisely or easily.

Having screen as part of the equation at first seemed difficult for me. Then I read the manual...

Firstly, we create the crontab-able script which checks if the screen session is started.

#!/bin/sh
# chkscreen: checks if a screen session is running.
SESSION="server1"
DAEMON="screen -d -m -S $SESSION /home/mute/test/java-daemon.sh"

# does the session exist?
screen -r $SESSION -ls -q >/dev/null 2>&1

if [ $? -le 10 ]; then
        echo "Restarting $DAEMON"
        $DAEMON
fi

Then java-daemon.sh is your while-loop script, which hopefully doesn't die, since it restarts your Java server faster than waiting for the cron job would.

#!/bin/sh
# java-daemon.sh: keeps restarting a buggy server... ;)

while true; do
./java
done
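
One thing a bare while-true loop cannot do is slow down: if the server fails immediately on startup, the loop respawns it as fast as the shell can fork. Below is a minimal sketch of the same loop with a pause between restarts. The function name supervise and its delay and run-limit arguments are hypothetical, added for illustration; they are not from the thread.

```shell
#!/bin/sh
# Sketch only: a restart loop with a pause between runs.
# "supervise", its delay argument and its run limit are illustrative names.

supervise() {
    delay=$1        # seconds to wait before restarting
    max_runs=$2     # stop after this many runs; 0 means loop forever
    shift 2         # the remaining arguments are the command to run
    runs=0
    while true; do
        status=0
        "$@" || status=$?
        runs=$((runs + 1))
        echo "run $runs exited with status $status"
        if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
            break
        fi
        sleep "$delay"
    done
}
```

Inside java-daemon.sh this might be called as "supervise 5 0 ./java", matching the placeholder command above, to wait five seconds between restarts and never give up.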

So then you'll call chkscreen from rc.local so it starts with the machine (/etc/init.d/rc.local here on Debian; on CentOS it should be /etc/rc.d/rc.local), and also from crontab.

mute@geek:~/test$ crontab - <<__EOF__
> */15 * * * * /home/mute/test/chkscreen
> __EOF__
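
Note that feeding a here-document to "crontab -" replaces the whole crontab, so any existing entries are lost. A hedged sketch of adding the line while keeping what is already there follows. The helper name merge_cron_entry is hypothetical; it only filters text, and the real crontab commands appear in the usage note.

```shell
#!/bin/sh
# Sketch only: add one crontab line without discarding existing entries.
# "merge_cron_entry" is an illustrative helper name.

merge_cron_entry() {
    entry=$1
    existing=$(cat)                       # current crontab text from stdin
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"         # keep whatever was already there
    fi
    # Append the new line only if an identical line is not present yet.
    if ! printf '%s\n' "$existing" | grep -Fxq -e "$entry"; then
        printf '%s\n' "$entry"
    fi
}
```

It could be used as: crontab -l 2>/dev/null | merge_cron_entry '*/15 * * * * /home/mute/test/chkscreen' | crontab -
Running it twice leaves a single copy of the entry.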

Guaranteed to work on my machine!

Sorry I'm no guru myself but unless someone else replies this is what you're stuck with :wink:


Thank you very much neutronscott for your help, this has worked like a charm. Looks like I was way off. Thanks again. :slight_smile:

There are just many different ways it can be solved. Instead of keeping a PID file (my original attempt) I just ask screen if the session is running. It makes it so elegant, eh? :slight_smile:

Thanks for the Thanks! :b:

---------- Post updated 06-23-11 at 09:50 AM ---------- Previous update was 06-22-11 at 02:13 PM ----------

Oh man, I couldn't sleep last night because I was thinking I didn't test all cases. So first thing today I ran 'chkscreen' while attached to the 'server1' session, and I was right. Bug :frowning:

Sorry. Using '-r' checks if there is a session you can attach to, which is false if you're already attached.

So change

# does the session exist?
screen -r $SESSION -ls -q >/dev/null 2>&1

to read

# does the session exist?
screen -S $SESSION -ls -q >/dev/null 2>&1

And it shouldn't start extra servers when you're viewing the screen anymore. Sorry.
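
An alternative that does not depend on screen's exit status is to look for the session name in the text that "screen -ls" prints, which works whether or not someone is attached. The sketch below assumes the usual listing layout of a PID, a dot, the session name, then whitespace and a state; the helper name session_listed is hypothetical.

```shell
#!/bin/sh
# Sketch only: decide whether a named session appears in `screen -ls` output.
# "session_listed" is an illustrative helper; it reads the listing on stdin.

session_listed() {
    name=$1
    # Session lines look like "<pid>.<name>" followed by whitespace and a
    # state such as (Attached) or (Detached). The name is used as a regex
    # fragment, so this assumes it contains no special characters.
    grep -Eq "^[[:space:]]*[0-9]+\.${name}[[:space:]]"
}
```

In chkscreen it might be used as: screen -ls | session_listed "$SESSION" || $DAEMON
Requiring whitespace after the name keeps "server1" from matching a session called "server10".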


In my case, the -r only produces 1 additional chkscreen and java-daemon.sh process, and only 1 Java server process. When I changed the -r to -S, it kept making chkscreen and java-daemon.sh processes and no additional Java server processes. That's why I was trying to get and use the PIDs, using the second script in the OP.

If Java uses all of the machine's memory, then the screen will be terminated. I'll no longer be attached to it, so the check for being or not being attached may not be the approach needed. I'll try and use what you've kindly given me and integrate something to get and check the PIDs.

No need to be sorry and lose sleep over this. It'll get worked out eventually and I'll post the final tweaked results then. Below is what I'm using. Thanks a million for all of your help neutronscott.

chkscr

#!/bin/sh
# chkscr: checks if a screen session is running.

SESSION="server1"
DAEMON="screen -d -m -S $SESSION /home/*server*/*local*/jd.sh"

# does the session exist?
screen -r $SESSION -ls -q >/dev/null 2>&1

if [ $? -le 10 ]; then
        echo "Restarting $DAEMON"
        $DAEMON
fi

jd.sh

#!/bin/sh
# jd.sh: restarts a stopped server.

while true; do
java *server variables*
wait
done

Well, I didn't actually lose sleep but thought of it. As I said, and tested, using -r started new screens each time chkscreen was run while I was attached, because it checks for attachable screens and the session is not tagged multiattach (maybe yours is, in .screenrc?). But -S would just check if the session is running. I see no flaw in that method, but maybe someone else's opinion would help.

But if you'd like to check PID, I can do that too. Screen would exit when jd.sh exits, so we'd have jd.sh report its PID with an echo $$ > $PIDFILE

I can write it up later. Mostly when I need my own service script like this, I copy one from a simple service in /etc/init.d. I am not as familiar with CentOS's, as I use Debian.


Seems I'm on to something, but can't get the correct PID. The PID that is saved is of the process that executes everything and not the running command's PID.

chkscr

#!/bin/sh
# chkscr: checks if the server is running.

CBPATH=/home/*user*/*serverdir*
SESSION="server1"
DAEMON="screen -d -m -S $SESSION /home/*user*/*serverdir*/exec"

if test -r $CBPATH/cb.pid; then
    CBPID=$(cat $CBPATH/cb.pid)
    if $(kill -CHLD $CBPID >/dev/null 2>&1)
    then
	exit 0
    fi
fi

echo "Restarting Server"
$DAEMON & echo $! > cb.pid

exec

#!/bin/bash

while true; do
java *server variables*
wait
done

When I type
# pidof /bin/bash /home/*user*/*serverdir*/exec
I'm given 2 PIDs. The first is for -bash and the second is the correct one. Maybe this command is returning the parent and child PIDs? I've been searching how to get a command's PID, but with no luck. When I type # ps x, I'm shown this:

  PID TTY      STAT   TIME COMMAND
 8658 pts/2    Ss+    0:00 /bin/bash /home/*user*/*serverdir*/exec

This is the command's PID I need saved to cb.pid at the bottom of the chkscr script. :o

Screen will spawn a child, so I was thinking of using $$ from inside 'exec'.

You'll be checking if jd.sh is running. If it's running, java must be.. If it's not running, screen would exit (unless you add other windows I guess).

Remove the echo from chkscr and use

echo $$ >$CBPATH/cb.pid

at top of 'exec'
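
As a small extension of that idea, the script can also remove its PID file when it exits normally, so a stale file is left behind less often. This is only a sketch; the helper name write_pidfile is hypothetical and not from the thread.

```shell
#!/bin/sh
# Sketch only: record this script's PID and remove the file on a normal exit.
# "write_pidfile" is an illustrative helper name.

write_pidfile() {
    pidfile=$1
    echo "$$" > "$pidfile"            # $$ is the PID of the running script
    trap 'rm -f "$pidfile"' EXIT      # not run if the process is killed with SIGKILL
}
```

Calling write_pidfile "$CBPATH/cb.pid" at the top of 'exec' would replace the plain echo. Because the trap cannot fire when the kernel kills the process outright, the checking script still has to verify that the recorded PID is alive.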


Finally got it with your help neutronscott. Thank you for all of your help. :smiley:

I removed the " & echo $! > cb.pid" from the bottom of the chkscr script and added "ps -aef | grep -v grep | grep '/bin/bash /home/*user*/*serverdir*/exec' | awk '{print $2}' > cb.pid" to the next line.

Now it runs perfectly.

---------- Post updated at 04:38 PM ---------- Previous update was at 03:25 PM ----------

It's not working correctly. Spoke too soon, because my test didn't run long enough. It keeps making chkscr and exec processes. The check in the chkscr script is invalid.

I'm now looking for a check that compares the value stored in the file cb.pid against the actual PID of the process.

I can get the actual value of the process's PID by using "CBPID=$(ps -aef | grep -v grep | grep '/bin/bash /home/*user*/*serverdir*/exec' | awk '{print $2}')".
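
A shorter way to get the same PID, assuming pgrep from the procps package is installed, is to match the exact command line. The helper name find_pids_exact below is hypothetical; the -f and -x options are standard pgrep options.

```shell
#!/bin/sh
# Sketch only: look up PIDs by exact command line with pgrep.
# "find_pids_exact" is an illustrative helper name.

find_pids_exact() {
    # -f matches against the full command line, -x requires the whole
    # line to match, so the checking script and grep-like helpers are
    # never reported. The argument is a regular expression.
    pgrep -f -x "$1"
}
```

It would be called with the same string that ps shows in its COMMAND column, with the real path in place of the placeholders. If several copies of the script are running, it prints one PID per line, which is worth checking for before comparing numbers.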

I can't find how to extract the PID number stored in the cb.pid file and make a valid "if actual PID = file's PID then end" statement.

Any help would be greatly appreciated.

I thought using screen -ls was more elegant. This is what I came up with:


#!/bin/sh
# chkscreen: checks if a screen session is running.
WORKINGDIR=/home/mute/test
PIDFILE=cb.pid
SESSION="server1"
DAEMON="/home/mute/test/java-daemon.sh"

is_running()
{
    pid=$1
    name=$2
    [ -z "$pid" ] && return 1
    [ ! -d /proc/$pid ] &&  return 1
    cmd=`cat /proc/$pid/cmdline | tr "\000" "\n"|tail -n 1 |cut -d : -f 1`
    [ "$cmd" != "$name" ] &&  return 1
    return 0
}

# does the session exist?
#screen -S $SESSION -ls -q 2>&1 >/dev/null

# check java-daemon.sh's pid instead?
# use the full path: cron does not run this script from $WORKINGDIR
pid=$(cat "$WORKINGDIR/$PIDFILE" 2>/dev/null)
is_running "$pid" "$DAEMON"

if [ $? != 0 ]; then
        echo "Restarting $DAEMON"
        screen -d -m -S $SESSION $DAEMON
fi

java-daemon.sh

#!/bin/sh
# java-daemon.sh: keeps restarting a buggy server... ;)

WORKINGDIR=/home/mute/test
PIDFILE=cb.pid

cd $WORKINGDIR
echo $$ >$PIDFILE

while true; do
top
done

Or a simpler way, without checking that it is the correct command running (it's not often that PIDs wrap around and get reused for something else...?)

# check java-daemon.sh's pid instead?
pid=$(cat "$WORKINGDIR/$PIDFILE" 2>/dev/null)
kill -0 "$pid" 2>/dev/null

if [ "$?" != 0 ]; then
        echo "Restarting $DAEMON"
        screen -d -m -S $SESSION $DAEMON
fi
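
The same kill -0 test can be wrapped so that a missing, empty, or corrupted PID file is treated as "not running" rather than producing an error. This is a minimal sketch; the helper name pidfile_alive is hypothetical.

```shell
#!/bin/sh
# Sketch only: liveness check that tolerates a missing or empty PID file.
# "pidfile_alive" is an illustrative helper name.

pidfile_alive() {
    [ -r "$1" ] || return 1               # no readable PID file
    pid=$(cat "$1")
    case $pid in
        ''|*[!0-9]*) return 1 ;;          # empty or not a plain number
    esac
    kill -0 "$pid" 2>/dev/null            # signal 0 only tests for existence
}
```

The check in chkscreen then becomes: pidfile_alive "$WORKINGDIR/$PIDFILE" || screen -d -m -S $SESSION $DAEMON
Note that kill -0 also fails for a live process owned by another user, so the check should run as the same user that started the server.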

After extensive testing, it's finally all worked out.

Both files are saved in the server's directory, and a crontab job runs every minute with * * * * * /home/user/serverdir/chkscr >/dev/null 2>&1
Here are the finished scripts.

chkscr

#!/bin/bash
# chkscr: Checks if server is running, if not, restarts it

# screen name
SESSION="server1"

# screen session and exec script
DAEMON="screen -d -m -S $SESSION /home/user/serverdir/exec"

# Gets exec's PID
PROPID=$(ps -aef | grep -v grep | grep '/bin/bash /home/user/serverdir/exec' | awk '{print $2}')

# Get PID value from file
read val < cb.pid

# Checks if exec's PID and file PID are same, sends message, then exits
if [ "$PROPID" -eq "$val" ]; then
screen -S $SESSION -X exec .! echo 'message Test complete.'
exit 0
fi

# If above check fails, deletes PID file
rm cb.pid

# Sends message, restarts server, remakes PID file with exec's PID for above check
echo "Restarting the server now."
$DAEMON
ps -aef | grep -v grep | grep '/bin/bash /home/user/serverdir/exec' | awk '{print $2}' > cb.pid

exec

#!/bin/bash
# exec: Starts the server from inside the screen and restarts it, if stopped

while true; do
java server variables
wait
done

Thanks neutronscott for all of your help and suggestions.