I am writing a shell script to automate the cold backup of an Oracle database. The database is around 300 GB and will take around 5-6 hours to copy.
I have finished the script up to the point where the datafiles are copied. Now I am stuck at the point where I need to monitor the copy process.
I usually use a while loop to monitor any long-running process, as below:
while true
do
    if [[ $(ps -ef | grep <long running process id> | wc -l) -eq 0 ]];
    then
        <do something>
        break
    else
        <do something>
    fi
done
Now, my question is: is this the only way? Does anyone use another method to monitor a long-running process?
Whether the copy succeeds or not, send a mail so that you know the outcome.
Try this:
#!/bin/bash
( time cp -fa /home/student/bin/.test /tmp/ & pid=$! ; wait $pid ) 2> oraclecopylog ; success=$?
myprocess=`echo ${0#./}`
while :;
do
    ps -ef | grep $myprocess | grep $pid
    status=$?
    if [ $status -ne 0 ] ; then
        mail -s "Oracle files copy is progressing.." ygemici@XX.com.tr < oraclecopylog
        sleep 3600 # control on every one hour
    else
        break
    fi
done
if [ $success -eq 0 ] ; then
    mail -s "Oracle files copy is successful" ygemici@XX.com.tr < oraclecopylog
else
    mail -s "There is a problem with copy oracle files" ygemici@XX.com.tr < oraclecopylog
fi
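One thing to watch in the script above: pid is assigned inside the ( ... ) subshell, so it is empty by the time the loop runs, and the subshell itself runs in the foreground, so the loop only starts after the copy has ended. Below is a minimal sketch of the same idea with the command started in the background of the current shell. The names run_monitored and notify are made up for illustration; notify is only a stand-in for the mail command so that the sketch runs without a mail setup.

```shell
#!/bin/bash
# notify SUBJECT LOGFILE: hypothetical stand-in for "mail -s SUBJECT ... < LOGFILE".
# It only prints the subject.
notify() {
    echo "NOTIFY: $1"
}

# run_monitored LOG INTERVAL COMMAND...: run COMMAND in the background,
# report progress every INTERVAL seconds, then report the final status.
run_monitored() {
    local log=$1 interval=$2
    shift 2
    "$@" > "$log" 2>&1 &
    local pid=$!    # set in this shell, so the loop below can see it
    # kill -0 sends no signal; it only tests whether the pid still exists.
    while kill -0 "$pid" 2>/dev/null; do
        notify "copy is progressing" "$log"
        sleep "$interval"
    done
    # wait returns the exit status of the finished background command.
    local status=0
    wait "$pid" || status=$?
    if [ "$status" -eq 0 ]; then
        notify "copy is successful" "$log"
    else
        notify "there is a problem with the copy" "$log"
    fi
    return "$status"
}
```

For the backup this might be called as: run_monitored oraclecopylog 3600 cp -fa /home/student/bin/.test /tmp/ (the same example paths as in the script above). Note that the end of the copy is only noticed after the current sleep finishes.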
In my opinion, this is a very poor approach, as it is very vulnerable to false positives. What if a different command with a different pid contains an argument that matches the pid being grepped for? This code will not detect that the monitored process has died/exited. What's much worse, however, is that depending on the kernel process scheduler and the shell implementation, the grep command's pid argument itself could trigger a false positive when groveling through the ps output (and given the way this code is structured, to test for equality to zero as failure, this could be happening often ... silently).
I would (perhaps melodramatically ;)) characterize this as an indiscriminate grepping of ps output. When dealing with multicolumn output whose values can collide (the arguments field, for example, can conceivably contain a username/pid/ppid/...), it is safer and easier to constrain the match to a specific column using AWK. And, if applicable, use ps' output options to limit output only to the fields that are necessary. If you are grepping for a pid, why not use a ps output format that only prints pid?
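As a rough illustration of constraining the match to one column, here is a minimal sketch. The function name pid_alive_awk is made up for the example, and it assumes the usual ps -ef layout in which the pid is the second field.

```shell
#!/bin/bash
# pid_alive_awk PID: succeed only if PID appears in the pid column of ps -ef.
# Only field 2 is compared, so the same number showing up in another
# column (ppid, arguments, ...) cannot produce a match.
pid_alive_awk() {
    ps -ef | awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}
```

Because awk compares the whole field, looking for 123 does not match pid 1234 either, which a plain grep would.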
Instead of ...
if [[ $(ps -ef | grep <long running process id> | wc -l) -eq 0 ]];
I would suggest ...
if ! [ $(ps -o pid= -p <long running process id>) ]
Or, you could grep|wc the output of "ps -o pid= -p <long running process id>", if that feels more familiar.
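Put into a polling loop, the ps -o pid= -p test might look like the sketch below. The function name wait_for_pid and the default interval of 60 seconds are assumptions made for the example.

```shell
#!/bin/bash
# wait_for_pid PID [INTERVAL]: poll until PID no longer exists.
wait_for_pid() {
    local pid=$1 interval=${2:-60}
    # ps -o pid= -p PID prints the pid (without a header) while the
    # process exists and prints nothing once it is gone.
    while [ -n "$(ps -o pid= -p "$pid")" ]; do
        sleep "$interval"
    done
}
```

If the process was started by the same shell, the wait builtin does the same job without polling and also returns the exit status; the loop is mainly useful for a pid that belongs to some other shell or script.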
Alister and ygemici, thanks a lot for your invaluable suggestions and tips.
My friend also suggested one solution.
Since the backup will take a minimum of 5-6 hours, instead of writing a script that has to sleep until the backup is complete, why not schedule a cron job that runs, say, 4 hours after the backup is kicked off?
This script can check if the backup is still running and exit if it is. If the backup is not running (i.e., it is finished), check whether the database is up or not and depending on that, start it up.
This seems like a workable alternative to using the sleep command until the backup completes.
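A minimal sketch of what the cron-run check could look like, assuming the backup script writes its own pid to a pid file when it starts. The function names and the pid file are hypothetical, and the database check and startup are left as a comment because they are site-specific.

```shell
#!/bin/bash
# backup_running PIDFILE: succeed if PIDFILE names a process that still exists.
backup_running() {
    local pidfile=$1 pid
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && [ -n "$(ps -o pid= -p "$pid" 2>/dev/null)" ]
}

# check_backup PIDFILE: the check the cron job would run.
check_backup() {
    if backup_running "$1"; then
        # Backup still in progress: do nothing and let a later run check again.
        echo "backup still running"
    else
        echo "backup finished"
        # Site-specific: check whether the database is up and start it if not.
    fi
}
```

The pid file only says that some process with that pid exists. If the backup ended long ago and the pid has since been reused, the check can be fooled, so the backup script should remove the file when it finishes.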