[Solved] Unable to mailx new $pid for a script restart

I'll try to make this brief:

I am trying to get the script below to run another script, defined as BATNAM.
The script runs fine and does what I designed it to do, however...

I would like it to mailx the NEW $pid that was restarted.

This script is supposed to go in root's crontab and run every minute, checking whether $pid exists.

Lastly, should I try a different approach, maybe a "while" or "until" rather than "if/then/else"?

I have tried sleep 10, as it does take more than 7 seconds for the script to show up in ps -ef|grep myusername.

Could you please review the code below and make suggestions?
This is a simple syntax issue and I am unable to find it.

I am on HP-UX with /usr/bin/sh, and I have root if necessary.

thanks!

#!/usr/bin/sh -x
# set the locals
stty intr '^c'

# set the vars
BATDIR="/usr/script8/batch"             # batch dir
BATNAM="bat_fstsi61c.sh"                # batch process file
BATPF="bat_fstsi61.pf"                  # batch to grep
BATSVC="SIGTEST"                        # mail topic
BATPF="bat_fstsi61.pf"                  # batch to grep
SERVER="PHANTOM"                        # servername here
SDIR="/apps/sigmon/dvl/fst61"           # sigmon dir
SIN="$SDIR/in"                          # in dir
SPROC="$SDIR/proc"                      # proc dir
SLOG="/usr/script8/batch/LOGS"          # log dir
SMAIL="/usr/script8/batch/EMAILS"       # email log dir
SMAILER="petey"                         # person(s) to email

# export the vars
export BATDIR BATNAM BATSVC BATPF SERVER SDIR SIN \
SPROC SLOG SMAIL SMAILER

pid=`ps -ef|grep "$BATPF" |grep -v grep |awk -F" " '{print $2}'`
 echo $pid

if [ "$pid" = "" ]
 then
   /usr/bin/sh $BATDIR/$BATNAM
   sleep 10
   mailx -s "${SERVER} ${BATNAM} restarted" petey
   sleep 10
   echo "$pid" > $SLOG/"$BATNAM-restart-on--`date +%F-%T`"
 else
   echo "service is ok"
   pid=""
fi

The section (if block) where you call mailx is on the condition that $pid is zero length.
So you cannot send an "empty" pid and be able to read it. So, help me here. What are you trying to do?

thanks for the response, i appreciate it.

Well, the first thing is that I check whether the other batch is running.

If it is NOT running, $pid is "" (empty), which means no process matches that grep; THEN restart the process and exit.

If it is running (either the current pid or a new one started moments ago by the NOT case above), THEN simply do nothing and echo "running"; no log is needed.

If it is NOT running: restart, grab the NEW $pid, email that new pid with mailx, AND echo the restart to the log file.

Everything is a go permissions-wise, and this script restarts the dead script. But it hangs. I want it to ALSO exit with a zero exit status and recheck via crontab every minute. This is an HA server and needs constant babying, lol. It doesn't cry much, but when it does, people lose jobs.

Thanks for your help again, I appreciate it.
If you need any more input, just ask :wink:

So, does:

/usr/bin/sh $BATDIR/$BATNAM

just restart the process asynchronously and exit, or is it supposed to run forever? Does the trace of the script produced by sh -x show that the two sleeps, mailx, date, and echo are run when the batch is not running? Should:

   /usr/bin/sh $BATDIR/$BATNAM
   sleep 10
   mailx -s "${SERVER} ${BATNAM} restarted" petey
   sleep 10
   echo "$pid" > $SLOG/"$BATNAM-restart-on--`date +%F-%T`"

be changed to:

   /usr/bin/sh $BATDIR/$BATNAM&
   pid=$!
   mailx -s "${SERVER} ${BATNAM} restarted" petey
   echo "$pid" > $SLOG/"$BATNAM-restart-on--`date +%F-%T`"

Why does your script initialize BATPF twice?

Why bother setting pid to an empty string just before exiting when the service is running?
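
The suggested change hinges on $!, which holds the PID of the most recent command started in the background with &. Here is a minimal sketch of that behavior, assuming a plain sleep as a stand-in for the batch script (the stand-in and the status variable are illustrative, not part of the original script):

```shell
# Stand-in for the long-running batch job (hypothetical; the real job
# would be $BATDIR/$BATNAM started with a trailing &).
sleep 30 &
pid=$!                      # PID of the most recent background command

# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null
then
    status="running"
else
    status="not running"
fi
echo "started PID $pid ($status)"

# Clean up the stand-in so the sketch leaves nothing behind.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

Note that $! is the PID of whatever was put in the background. If that command starts the real service as a separate process and then returns, $! will not be the service's PID.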

I copied this script from elsewhere; I am learning. I am glad you gave some input, thanks. To answer your questions:

  1. Why does your script initialize BATPF twice?
    It shouldn't; that's a typo and it will be removed. Thanks for noticing that, Don!

  2. Yes, it does start a process: another script, a batch script actually, that is supposed to run and accept requests for a client-driven db. We won't get into that.

  3. Why bother setting pid to an empty string just before exiting when the service is running?
    I didn't do that on purpose; I do not know what I am doing. I'm trying, though.

The sleep is there in the hope that it will produce a new $pid; that is the thing that is puzzling me the most.

Like I said before, the script runs fine (the one this is calling) and needs to be online 24/7. This script is supposed to CHECK whether ANY pid matches the grep for BATPF. It takes 10 seconds for a new pid to appear. I am trying to wait/sleep/capture that new pid in the "if".

Is there a better way?

Don, I will try the steps you suggested; it looks like they will work. But remember, I need this script to end. Running the batch as a background process with & looks great, but will it still take 10 seconds for the new pid to become available? When I start the backend script, it takes a few minutes to produce a pid (the backend is a db starting with a client), and it usually takes about the same amount of time for ps and grep to show that the pid is gone, i.e. for the connection to shut down gracefully. The DB is heavily transaction based.

Thanks, hope that helps. Please ask if you need anything more. Thanks, guys...

You're welcome.

If you want to fix the problem, we need to get into that. The changes I suggested should work if the script starts the batch script and waits for it to complete. If you had shown us the trace output produced by /usr/bin/sh -x, we might be able to give you a definitive answer; without seeing that output or seeing what is in /usr/script8/batch/bat_fstsi61c.sh, we can only make wild guesses (like I did before).

If /usr/script8/batch/bat_fstsi61c.sh is the batch script and it doesn't return until it is killed, what I suggested should work. If /usr/script8/batch/bat_fstsi61c.sh asynchronously starts the batch script and returns without waiting for it to complete, what I suggested will not work. In that case you'll need to rerun the ps pipeline to reset pid after the batch process is restarted. You wait 20 seconds after /usr/script8/batch/bat_fstsi61c.sh returns (if it returns) before saving $pid in your log file, but you haven't reset pid so you know it has to be an empty string whether or not the batch script restarted successfully.
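
For the second case, rerunning the ps pipeline after the restart could be sketched as a bounded retry loop instead of a fixed sleep. This is only a sketch: the function names find_pid and wait_for_pid and the one-second interval are assumptions, and the pipeline is the same one the script already uses.

```shell
# find_pid PATTERN: print the PID(s) of processes whose ps -ef line
# matches PATTERN (same pipeline as in the original script).
find_pid() {
    ps -ef | grep "$1" | grep -v grep | awk '{print $2}'
}

# wait_for_pid PATTERN TRIES: rerun the ps pipeline once a second until
# it finds a match or TRIES attempts are used up.
# Prints the PID and returns 0 on success; returns 1 on timeout.
wait_for_pid() {
    pattern=$1
    tries=$2
    while [ "$tries" -gt 0 ]
    do
        newpid=`find_pid "$pattern"`
        if [ -n "$newpid" ]
        then
            echo "$newpid"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

After restarting the batch, something like pid=`wait_for_pid "$BATPF" 30` would then either set pid to the new PID or fail after roughly 30 seconds, so a cron-driven script can still exit.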

There are two scripts. You are talking as though there is only one. Until you understand that there are two and how they interact, we're lost. If you actually mean that the script above is the contents of /usr/script8/batch/bat_fstsi61c.sh, then your description of what is going on is extremely confusing and could be rewritten in a much simpler fashion.

If the script isn't ending when it restarts the batch process, my guess still fits the data you're seeing. But until you show us the trace output you got from running this script or show us the contents of /usr/script8/batch/bat_fstsi61c.sh, we are just guessing.

Well said, Don, thanks. I like your style.
Here is a rewrite:

I have a script that will not show the current pid.

The script below uses a simple if statement.

The if statement is supposed to restart another process (it does!).

Once the second script is restarted (it stopped, and this script restarts it),
it produces a pid.

It did NOT have a pid, and thus needed to be restarted.
The script below does that.

Problem: I cannot grep/grab/whatever you want to call it, and mailx, that new pid.

That's my problem: I can NOT grab the new pid.

I am at work now; I will try your suggestion:
pid=$!

sh -x stdout BEFORE I kill the pid to simulate it's not running

$ sh -x sig-ps-restart7.sh
+ stty intr ^c
+ BATDIR=/usr/script8/batch
+ BATNAM=bat_fstsi61c.sh
+ BATPF=bat_fstsi61.pf
+ BATSVC=SIGTEST
+ BATPF=bat_fstsi61.pf
+ SERVER=PHANTOM
+ SDIR=/apps/sigmon/dvl/fst61
+ SIN=/apps/sigmon/dvl/fst61/in
+ SPROC=/apps/sigmon/dvl/fst61/proc
+ SLOG=/usr/script8/batch/LOGS
+ SMAIL=/usr/script8/batch/EMAILS
+ SMAILER=petey
+ export BATDIR BATNAM BATSVC BATPF SERVER SDIR SIN SPROC SLOG SMAIL SMAILER
+ + ps -ef
+ grep bat_fstsi61.pf
+ grep -v grep
+ awk -F  {print $2}
pid=20599
+ echo 20599
20599
+ [ 20599 =  ]
+ echo service is ok
service is ok
+ pid=

*** sh -x stdout AFTER i kill pid to simulate it's not running ***

First I kill the pid that is already running:

kill 20599
$ sh -x sig-ps-restart7.sh
+ stty intr ^c
+ BATDIR=/usr/script8/batch
+ BATNAM=bat_fstsi61c.sh
+ BATPF=bat_fstsi61.pf
+ BATSVC=SIGTEST
+ BATPF=bat_fstsi61.pf
+ SERVER=PHANTOM
+ SDIR=/apps/sigmon/dvl/fst61
+ SIN=/apps/sigmon/dvl/fst61/in
+ SPROC=/apps/sigmon/dvl/fst61/proc
+ SLOG=/usr/script8/batch/LOGS
+ SMAIL=/usr/script8/batch/EMAILS
+ SMAILER=petey
+ export BATDIR BATNAM BATSVC BATPF SERVER SDIR SIN SPROC SLOG SMAIL SMAILER
+ + ps -ef
+ grep bat_fstsi61.pf
+ grep -v grep
+ awk -F  {print $2}
pid=
+ echo

+ [  =  ]
+ cd /usr/script8/batch
+ pid=3951
+ mailx -s 'PHANTOM' 'bat_fstsi61c.sh' PID '3951' was started petey
+ . bat_fstsi61c.sh
trunc'd......
+ exec /apps/dlc/bin/our-data-base-here -b -p /tmp/job003954 -pf /usr/script8/pf_files/bat_fstsi61.pf

It appears to work for the pid part, thanks!!!

BUT... the script now needs to exit. Here is the output of ps -ef|grep petey:

petey  3952  3944  0 09:29:49 pts/tt    0:00 mailx -s 'PHANTOM' 'bat_fstsi61c.sh' PID '3951' was started petey
  petey  3944 29979  0 09:29:49 pts/tt    0:00 sh -x sig-ps-restart7.sh
  petey 24556 24555  0  Jun  4  pts/tr    0:00 -sh
  petey  3951  3944  0 09:29:49 pts/tt    0:02 /apps/dlc/bin/our-data-base-here -b -p /tmp/job003954 -pf /usr/script8/pf_files/bat_fstsi61.pf

So now this job will be put into a cron job, IF we can make the mailx actually finish. It seems to be sitting there...

Sorry, I had to leave the other script out for security reasons. Hope you understand...

OK. From the trace, it looks like we're close. It looks like your current invocation of mailx is something like:

   mailx -s "${SERVER} ${BATNAM} PID ${pid}" petey

and the problem you have is that mailx reads data from its standard input that will become the body of the message you're sending. In this case it is waiting for input instead of sending your message with an empty body (with the entire message being contained in the Subject field). To fix that, change the line to:

   mailx -s "$SERVER $BATNAM PID $pid" petey < /dev/null

In addition to adding the redirection (the < /dev/null at the end of the line), I also removed unneeded braces in the subject field. (The braces won't hurt anything, but they aren't needed in these three cases.) Because the mailx is hanging, the echo of the new PID to a new log file is not being run either. Adding the redirection to mailx will enable your script to create that new log file.
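
To illustrate why the redirection matters without sending any mail, here is a small sketch that uses a stand-in for mailx. The read_body helper is hypothetical; like mailx, it reads its standard input until end-of-file, but it only counts the bytes it received.

```shell
# Stand-in for mailx: reads the "message body" from standard input until
# end-of-file and prints how many bytes it received. (Hypothetical helper.)
read_body() {
    wc -c | awk '{print $1}'
}

# With stdin redirected from /dev/null the read hits end-of-file at once,
# so the command returns immediately with an empty body.
empty=`read_body < /dev/null`

# Piping text in supplies a body and also ends the input.
piped=`echo "restarted" | read_body`

echo "empty body: $empty bytes, piped body: $piped bytes"
```

Either form gives the command an end-of-file to stop on. Without one, a command started from a terminal keeps waiting for typed input, which matches the mailx process left sitting in the ps listing.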

Unless you see some other problem, it looks like the only thing you haven't done yet is to change the else side of your if statement from:

   echo "service is ok"
   pid=""

to:

   echo "service is ok"

As I said before, setting pid to an empty string here doesn't hurt anything, but serves no purpose whatsoever (other than to make people who might read your script wonder why it is there).

Thanks, I will try the new suggestions; they make sense. I will implement them as soon as I go in to work and let you know the results.

---------- Post updated 06-10-13 at 10:52 AM ---------- Previous update was 06-09-13 at 11:51 PM ----------

/dev/null redirection does not work...

Also, this script will be run as root, and the backend db requires a user name that it checks for proper credentials. I will fix that when the time comes to set up the crontab as root. However, what is happening now is fine. See below.

script that works now:

# set the locals
#stty intr '^c'

# set the vars
BATDIR="/usr/script8/batch"             # batch dir
BATNAM="bat_fstsi61c.sh"                # batch process file
BATPF="bat_fstsi61.pf"                  # batch to grep
PFDIR="/usr/script8/pf_files"           # pf dir
SERVER="PHANTOM"                        # servername here
SLOG="/usr/script8/batch/LOGS"          # log dir
SMAILER="petey"                         # person(s) to email

# export the vars
export BATDIR BATNAM BATPF PFDIR SERVER SLOG SMAILER

pid=`ps -ef|grep "$BATPF" |grep -v grep |awk -F" " '{print $2}'`
 echo $pid

if [ "$pid" = "" ]
 then
  cd $BATDIR
  ./$BATNAM &
    $pid=$! # 2>/dev/null
      echo "$SERVER $BATNAM was restarted $pid=$! " \
      | mailx -s "$SERVER $BATNAM was restarted NEW PID $pid=$! " $SMAILER
       echo "NEW PID STARTED $pid=$! for $BATNAM on $SERVER" > "${SLOG}/${BATNAM}-restart-on--`date +%F-%T`"
exit 0
else
  echo "$pid=$!"
  echo "service is ok"
fi

sh -x results below:

$ sh -x fst-sig-restart.sh.final
+ BATDIR=/usr/script8/batch
+ BATNAM=bat_fstsi61c.sh
+ BATPF=bat_fstsi61.pf
+ PFDIR=/usr/script8/pf_files
+ SERVER=PHANTOM
+ SLOG=/usr/script8/batch/LOGS
+ SMAILER=petey
+ export BATDIR BATNAM BATPF PFDIR SERVER SLOG SMAILER
+ + ps -ef
+ awk -F  {print $2}
+ grep -v grep
+ grep bat_fstsi61.pf
pid=
+ echo

+ [  =  ]
+ cd /usr/script8/batch
+ ./bat_fstsi61c.sh
+ =21818
fst-sig-restart.sh.final[38]: =21818:  not found.
+ mailx -s PHANTOM bat_fstsi61c.sh was restarted NEW PID =21818  petey
+ echo PHANTOM bat_fstsi61c.sh was restarted =21818
+ echo NEW PID STARTED =21818 for bat_fstsi61c.sh on PHANTOM
+ + date +%F-%T
1> /usr/script8/batch/LOGS/bat_fstsi61c.sh-restart-on--2013-06-10-10:35:56
+ exit 0

mail sent and file created for log.
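
One detail in the trace above is worth a note: the "=21818: not found" message most likely comes from the line $pid=$!. Because $pid is empty at that point, the shell expands the line to =21818 and tries to run it as a command, so pid is never assigned. The subject and log text still show the number only because the literal text $pid=$! inside the strings expands to "=21818". A minimal sketch of the assignment as it was probably intended, again assuming sleep as a stand-in for ./$BATNAM:

```shell
SERVER="PHANTOM"
BATNAM="bat_fstsi61c.sh"

# Stand-in for ./$BATNAM & (hypothetical; replace with the real batch script).
sleep 30 &
pid=$!          # no "$" on the left-hand side of an assignment

# With pid actually set, the message can refer to it directly.
subject="$SERVER $BATNAM was restarted NEW PID $pid"
echo "$subject"

# Clean up the stand-in so the sketch leaves nothing behind.
kill "$pid"
wait "$pid" 2>/dev/null || true
```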

My issue now is the batch script that this script runs.
The error is an invalid user/password; I'll fix that.

Thanks for the suggestions, guys. Really, that helped and I learned a few new things.
THANKS!!!

You may close this one.

pete
P.S. I'll be back :wink: I like your teaching style, guys.