Lightweight Process Monitor

We've been having some problems with a specific program in our nightly processing, so I whipped up a little script to monitor it and send an e-mail when it's complete (failure or not). My main constraint is that I cannot modify the binary or the script that calls it, since the developers probably wouldn't be too happy about that, so the monitor has to be stand-alone. My goal is a reliable, fairly simple, and above all very lightweight script (I don't want to use any more cycles than are absolutely necessary). Here's what I have running right now:

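#!/usr/bin/sh
# Ignore SIGHUP (signal 1) so the monitor survives the operator logging out.
trap '' 1

# Send the completion notice, then stop.
bail_out () {
mail myself@mycompany <<!done
Subject: My_Job has finished
Importance: high
X-Priority: 1


My_Job has finished running.
If {program} reports that the process is still running, there
may be a problem. Go to \\\\path_to\samba_share and make sure that
the file is an appropriate size before continuing with the
{other_program} jobs.

Call somebody right away if there is a problem.
.
!done
exit 0
}

# Check every two minutes until My_Job no longer shows up in ps.
# The pattern is quoted so the shell cannot expand [M]y_Job as a
# filename pattern if a matching file exists in the current directory.
check_it () {
while :
do
ps -e | grep '[M]y_Job' >/dev/null 2>&1
case $? in
0) sleep 120 ;;
*) bail_out; exit 1 ;;
esac
done
}

# Run the loop in the background so the operator gets the prompt back.
check_it&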

I want to let the other night operators run this during the weekend (that's why I put the instructions in the mail). Also, since none of them are very Unix-literate (we only have a few Unix servers around; we're mostly NT), I wanted to make it simple to run. Just type name_of_script, and it'll background itself...

My question is: Can this be written to take even less resources?
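One possible way to trim it further: each pass of the loop above starts ps, grep and sleep. The sketch below looks the PID up once and then polls it with `kill -0`, which sends no signal and only reports whether the process still exists. `kill` should be a shell built-in in the POSIX shell and ksh, so each later pass only starts `sleep`. It assumes My_Job appears under exactly that name in the last column of `ps -e` output; `JOB_NAME`, `INTERVAL`, `find_pid`, `wait_for_pid` and `main` are hypothetical names, and the address is the placeholder from the original script.

```shell
#!/usr/bin/sh
# Sketch only: JOB_NAME, INTERVAL and the mail address are placeholders.
JOB_NAME=${JOB_NAME:-My_Job}
INTERVAL=${INTERVAL:-120}

# Print the PID of the first "ps -e" line whose last column (the
# command name) is exactly $1; print nothing if there is no match.
find_pid () {
    ps -e | awk -v name="$1" '$NF == name { print $1; exit }'
}

# Return once process $1 no longer exists, checking every $2 seconds.
# kill -0 sends no signal. Note that it also fails when we lack
# permission to signal the process, so run this as the job's owner
# or as root, or it will report "finished" straight away.
wait_for_pid () {
    while kill -0 "$1" 2>/dev/null
    do
        sleep "$2"
    done
}

main () {
    pid=`find_pid "$JOB_NAME"`
    if [ -z "$pid" ]; then
        echo "$JOB_NAME is not running" >&2
        return 1
    fi
    wait_for_pid "$pid" "$INTERVAL"
    echo "$JOB_NAME (pid $pid) has finished" | mail myself@mycompany
}
```

To keep the "just type the name" behaviour, the script would end with `main &`. The trade-off is PID reuse: if My_Job exits and an unrelated process gets the same PID before the next check, the loop keeps waiting. With a two-minute interval that seems unlikely, but it is not impossible.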

Oh yeah, BTW, in case people are wondering:
This is a midrange HP-UX server in a key production environment...

FYI -
I also added a statement to the top:

if [ `ps -e | grep '[n]ame_of_script' | wc -l` -gt "1" ]; then
exit 1
fi

To keep the buggers from running a bunch of copies of the script...
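A common alternative to counting ps lines is a lock directory. `mkdir` either creates the directory or fails because it already exists, in a single step, so only one copy can get past it, and the check does not depend on ps output at all. This is a sketch under assumptions: `LOCKDIR`, `take_lock` and `release_lock` are hypothetical names, and the path is just an example.

```shell
#!/usr/bin/sh
# Hypothetical lock location; any directory the operators can write to works.
LOCKDIR=${LOCKDIR:-/var/tmp/name_of_script.lock}

# Succeeds only for the first copy: mkdir fails if the directory exists.
take_lock () {
    mkdir "$LOCKDIR" 2>/dev/null
}

# Remove the lock; call this just before the monitor exits.
release_lock () {
    rm -rf "$LOCKDIR"
}
```

At the top of the script this would replace the ps count with `take_lock || exit 1`, and `bail_out` would call `release_lock` before it exits. If the monitor is ever killed with `kill -9` or the box goes down, the directory is left behind and has to be removed by hand.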

Umm...there is a search function on this site. Using it, I was able to find...

How to monitor if a process is running
Finding Out When A Process Has Finished?

Oh jeez, is my face red...
Sorry 'bout that

Thanks anyway!

Hmm... Well, the first script works, but I created another script with a different name to watch a different process, and every time I run it I get this error:

mdwatch: /var/tmp/sh13125.1: Cannot find or open the file.

Then it hangs until I hit ^C...
I'm not explicitly creating any file in the script. I also can't figure out where that number comes from... The very next command I ran was echo $$, and I got 24704 as my process number.

I also tried putting set -x at the top of the script, but all it points to is something in check_it.
I also tried adding a bunch of echo statements to see what's happening, but I can't see what's trying to write the file.

My best guess is that mail is trying to write a temp file. But in that case, why does one script work while an almost exact copy doesn't?

(btw, I have checked /var/tmp, and it doesn't contain any file named sh*)

It's probably something simple, but I'm too frazzled to figure it out right now...

Put the echo $$ inside the script. It's the script's pid that counts, not the login shell's pid.

That file is being created by the script itself, not the mail program. Is one script root and the other non-root? Check that /var/tmp is writable by everyone. Check that /var/tmp has free space and free inodes. Somehow that second script cannot write to /var/tmp. At least that would be my first guess.
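The writability part of that advice can be checked from inside the failing script itself, as the same user it runs as. The sketch below tries to create and delete a probe file; `check_tmpdir` is a hypothetical helper, and /var/tmp is the default only because that is where the error message places the shell's temporary file. It does not check free space or inodes; on HP-UX, `bdf` (and `bdf -i` for inodes) should cover those.

```shell
#!/usr/bin/sh
# Hypothetical helper: can the current user create and remove a file in
# directory $1 (default /var/tmp)? Prints the result, returns 0 or 1.
check_tmpdir () {
    dir=${1:-/var/tmp}
    probe="$dir/probe.$$"
    # The braces let 2>/dev/null also hide the shell's own error
    # message if the inner redirection cannot create the file.
    if { echo test > "$probe"; } 2>/dev/null; then
        rm -f "$probe"
        echo "$dir: writable"
        return 0
    fi
    echo "$dir: cannot create files here" >&2
    return 1
}
```

Dropping `check_tmpdir; echo "script pid: $$"` near the top of the second script would show both the script's own PID and whether /var/tmp is usable at that moment.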

After a little more investigation, it looks like we're missing a patch or using a broken shell. I found this info for HP-UX 10 systems... We're using 11.00, though... I guess I'll have to check with the Unix admin.

   PHCO_16063:
    1) Posix shell removes heredoc temporary files
       before they are read. When scripts like the
       following are executed, we see messages like
       "/tmp/sh3737.2: Cannot find or open the file."

I tried changing the shell to /usr/bin/ksh, and got a similar error, except it said it was in the bail_out function (above)... that pretty much limits it to mail or the heredoc problem...

I'm going to look into this some more and try to figure out why one script works well but another, similar one doesn't (on a VERY consistent basis)...

(BTW, I can create files in /tmp and /var/tmp, and there is plenty of space / inodes...)

Well, instead of trying to figure it all out right now, I decided to just remove the functions with the heredocs... I just placed the mail inside of another file and dumped it into the mail command...

Man, I really liked having it all in one script, though... oh well... I'll post back if I can figure out why the heredocs weren't working in just the one script...
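If the here-document handling in the shell is the culprit, one way to keep everything in one script is to build the message with `printf` and pipe it into mail. A pipe does not need a temporary file, so the shell never creates one of those /var/tmp/sh* files. This is a sketch: `mail_body` and `send_notice` are hypothetical names, and the address and `{program}` placeholders are copied from the original script. Single quotes keep the backslashes literal, so the share path comes out as `\\path_to\samba_share`, the same as the here-document produced.

```shell
#!/usr/bin/sh
# Print the complete message (headers, blank lines, body) on stdout.
mail_body () {
    printf '%s\n' \
        'Subject: My_Job has finished' \
        'Importance: high' \
        'X-Priority: 1' \
        '' \
        '' \
        'My_Job has finished running.' \
        'If {program} reports that the process is still running, there' \
        'may be a problem. Go to \\path_to\samba_share and make sure that' \
        'the file is an appropriate size before continuing with the' \
        '{other_program} jobs.' \
        '' \
        'Call somebody right away if there is a problem.'
}

# Pipe the message straight into mail: no here-document, no temp file.
send_notice () {
    mail_body | mail myself@mycompany
}
```

`bail_out` would then call `send_notice` in place of the `mail ... <<!done` block. The closing "." line is no longer needed, because end of input on the pipe ends the message.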