Monitor a long running process

sunpraveen · May 7, 2010, 2:03am

Gurus,

I am writing a shell script that will be used to automate cold backup of an Oracle Database. The database size is around 300G and will take around 5-6 hours to copy.

I have finished the script till the copy of the datafiles. Now, I am stuck at the point where I need to monitor the copy process.

I usually use a while loop to monitor any long-running process, as below:

while true
do
    if [[ $(ps -ef | grep <long running process id> | wc -l) -eq 0 ]];
    then
        <do something>
        break
    else
        <do something>
    fi
done

Now, my question is: is this the only way? Is anyone using another method to monitor a long-running process?

Do share your thoughts.

Regards,

Praveen

ygemici · May 7, 2010, 7:33am

Send a mail whether the copy succeeds or fails, so you know the outcome either way.
Try this:

#!/bin/bash
# Run the timed copy in the background; cp errors and the timing report go to the log.
( time cp -fa /home/student/bin/.test /tmp/ ) 2> oraclecopylog &
pid=$!

# Mail a progress note every hour for as long as the copy is still running.
while kill -0 "$pid" 2>/dev/null
do
    mail -s "Oracle files copy is progressing.." ygemici@XX.com.tr < oraclecopylog
    sleep 3600 # check once every hour
done

# Collect the exit status of the background copy.
wait "$pid"
success=$?

if [ "$success" -eq 0 ] ; then
    mail -s "Oracle files copy is successful" ygemici@XX.com.tr < oraclecopylog
else
    mail -s "There is a problem with the Oracle files copy" ygemici@XX.com.tr < oraclecopylog
fi

alister · May 7, 2010, 11:15am

sunpraveen:

I usually use a while loop to monitor any long-running process, as below:
while true
do
    if [[ $(ps -ef | grep <long running process id> | wc -l) -eq 0 ]];
    then
        <do something>
        break
    else
        <do something>
    fi
done
 

In my opinion, this is a very poor approach, as it is very vulnerable to false positives. What if a different command with a different pid contains an argument that matches the pid being grepped for? This code will not detect that the monitored process has died/exited. What's much worse, however, is that depending on the kernel process scheduler and the shell implementation, the grep command's pid argument itself could trigger a false positive when groveling through the ps output (and given the way this code is structured, to test for equality to zero as failure, this could be happening often ... silently).

I would (perhaps melodramatically ;)) characterize this as an indiscriminate grepping of ps output. When dealing with multicolumn output whose values can collide (the arguments field, for example, can conceivably contain a username/pid/ppid/...), it is safer and easier to constrain the match to a specific column using AWK. And, if applicable, use ps' output options to limit output only to the fields that are necessary. If you are grepping for a pid, why not use a ps output format that only prints pid?
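As a rough sketch of the column-constrained match described above, the function below looks only at the second field of `ps -ef` style output. The name `pid_in_column` is made up for illustration, and the assumption that the pid sits in field 2 should be checked against your platform's ps.

```shell
#!/bin/sh
# pid_in_column: hypothetical helper, not something posted in this thread.
# Reads "ps -ef"-style lines on stdin and succeeds only if field 2
# (assumed here to be the PID column) equals the PID passed as $1.
pid_in_column() {
    awk -v pid="$1" '$2 == pid { found = 1 } END { exit !found }'
}

# Intended use:
#   if ps -ef | pid_in_column "$pid"; then echo "still running"; fi
```

Because the comparison is made against one field only, a pid that merely shows up in another command's arguments (including the arguments of the filter itself) does not count as a match.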

Instead of ...

if [[ $(ps -ef | grep <long running process id> | wc -l) -eq 0 ]];

I would suggest ...

if ! [ "$(ps -o pid= -p <long running process id>)" ]

Or, you could grep|wc the output of "ps -o pid= -p <long running process id>", if that feels more familiar.
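A minimal sketch of how the `ps -o pid= -p` test might be wrapped into a polling loop. The helper names `is_running` and `wait_for_exit` and the 60-second default interval are assumptions for illustration, not part of the suggestion above.

```shell
#!/bin/sh
# is_running: hypothetical helper. Succeeds if ps reports a process with
# PID $1. Only the pid column is printed, so nothing else can match.
is_running() {
    [ -n "$(ps -o pid= -p "$1" 2>/dev/null)" ]
}

# wait_for_exit: hypothetical helper. Polls every $2 seconds (default 60)
# until the process with PID $1 is gone.
wait_for_exit() {
    while is_running "$1"; do
        sleep "${2:-60}"
    done
}

# Intended use:
#   wait_for_exit "$copy_pid" 300 && echo "copy finished"
```

If the monitored process was started by the same shell, the built-in `wait "$pid"` is simpler and also returns the exit status. Polling with ps is mainly useful when the process belongs to a different shell or session.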

Regards,
Alister

sunpraveen · May 11, 2010, 2:55am

Alister and ygemici, Thanks a lot for your invaluable suggestions and tips.

My friend also suggested one solution.

Since the backup will take a minimum of 5-6 hours, instead of writing a script that has to sleep until the backup is complete, why not schedule a cron job that runs, say, 4 hours after the backup is kicked off?

This script can check if the backup is still running and exit if it is. If the backup is not running (i.e., it is finished), check whether the database is up or not and depending on that, start it up.
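The follow-up check described here could look roughly like the sketch below. Several things are assumed rather than taken from this thread: the backup script is assumed to write its pid to the file named by `PIDFILE`, the instance is assumed to be identified by `ORACLE_SID`, the database is treated as up when a background process named `ora_pmon_<SID>` exists, and `start_database` is only a placeholder for the real startup procedure.

```shell
#!/bin/sh
# Follow-up check meant to be run from cron some hours after the backup starts.
# PIDFILE is an assumed location where the backup script stored its PID.
PIDFILE="${PIDFILE:-/tmp/cold_backup.pid}"

backup_running() {
    [ -f "$PIDFILE" ] || return 1
    [ -n "$(ps -o pid= -p "$(cat "$PIDFILE")" 2>/dev/null)" ]
}

database_up() {
    # Assumption: a running instance shows a process named ora_pmon_<SID>.
    ps -e -o args= | grep -q "^ora_pmon_${ORACLE_SID}\$"
}

start_database() {
    # Placeholder only: replace with your site's startup procedure.
    echo "would start database ${ORACLE_SID} here"
}

check_backup() {
    if backup_running; then
        echo "backup still running"
        return 0
    fi
    if database_up; then
        echo "database already up"
    else
        start_database
    fi
}

# Call check_backup as the last line when installing this as a cron script.
```

One caveat with a pid file is that the pid could be reused by an unrelated process after the backup exits, so removing the file at the end of the backup script is worth considering.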

This seems like a workable solution, instead of using the sleep command until the backup completes.

What do you think?

Regards,

Praveen

ygemici · May 11, 2010, 2:46pm

Do we know when the Oracle cp process runs? Is it scheduled at fixed times, started automatically at random times, or triggered by something else?