If you have pgrep, I would recommend using it instead:
while pgrep -f "phantom DSD.RUN job_jobName"
do
    sleep 2
done
Note that -f makes pgrep match the pattern against the full command line instead of only the process name. If you know the actual process name, you can drop this option and pass the process name as the argument.
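For what it's worth, here is a minimal sketch of that loop wrapped in a function. The name wait_for_exit and the default interval are my own choices, not anything from your script. The pgrep output goes to /dev/null so the PIDs are not printed on every pass. One caveat with -f: because it looks at full command lines, if the waiting script is itself started with the pattern as an argument, pgrep can match the script and the loop will never end.

```shell
# wait_for_exit PATTERN [INTERVAL]
# Hypothetical helper: block until no command line matches PATTERN.
# INTERVAL is the polling period in seconds (defaults to 2).
wait_for_exit() {
    # pgrep exits 0 while at least one process matches, 1 when none do.
    while pgrep -f "$1" >/dev/null; do
        sleep "${2:-2}"
    done
}

# Example call (left commented out so the block only defines the function):
# wait_for_exit "phantom DSD.RUN job_jobName" 2
```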
Where can I see the trace logs after running the script with the -xv options?
The trace doesn't seem to help much. We saw the point where the loop exited. Went back to the shell and fired a ps and we could still see the same process running.
The trace often does not help that much, in my experience. Too much output, and too hard to wade through. On the other hand, it basically does show everything, if you know how to use it.
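On the earlier question of where the trace goes: both -x and -v write to standard error, so it appears on the terminal unless you redirect it. A small sketch, using a throwaway script created just for the demonstration (the file names come from mktemp and are not part of your setup):

```shell
# Throwaway two-line script, created only for this demonstration.
demo=$(mktemp)
trace=$(mktemp)
printf 'msg=hello\necho "$msg"\n' >"$demo"

# -v echoes each line as it is read, -x echoes each command after expansion.
# Both go to stderr, so 2> collects them in a file; stdout is captured separately.
out=$(sh -xv "$demo" 2>"$trace")

echo "script output: $out"
echo "trace saved in: $trace"
```

With the trace in a file you can grep it for the loop condition instead of scrolling through an hour of output.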
The first step is to get your script into normal appearance, instead of all glommed together on one line. The second step is to add some targeted diagnostic statements in your script.
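As a rough illustration of both steps, here is the loop laid out over several lines with a log entry on every pass. This is only a sketch: the function name wait_with_log, the log format, and the final snapshot are my assumptions, not your original script. It uses ps -eaf, the same listing you check by hand afterwards, so the loop and your manual check look at the same set of processes.

```shell
# wait_with_log PATTERN LOGFILE [INTERVAL]
# Hypothetical helper: poll ps for PATTERN and record what each poll saw.
# Plain sh has no local variables, so these names are global.
wait_with_log() {
    pattern=$1
    log=$2
    interval=${3:-2}
    polls=0

    # The assignment's exit status is grep's: 0 while something matches.
    while matches=$(ps -eaf | grep "$pattern"); do
        polls=$((polls + 1))
        printf '%s poll %d matched:\n%s\n' \
            "$(date '+%H:%M:%S')" "$polls" "$matches" >>"$log"
        sleep "$interval"
    done

    printf '%s loop exited after %d polls\n' \
        "$(date '+%H:%M:%S')" "$polls" >>"$log"
    # Full process table at the moment of exit, for comparison with a manual ps.
    ps -eaf >>"$log" 2>&1 || true
}

# Example call (commented out so the block only defines the function):
# wait_with_log '[p]hantom DSD.RUN job_jobName' /tmp/wait_job.log 2
```

If the log shows the loop exiting while the snapshot still lists the job, the pattern or the ps options are the place to look.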
ps combined with grep, as you are doing, will work fine, once you find the glitch.
I've tried this simplified version of your command, and it works perfectly:
while ps | grep '[p]hantom DSD.RUN job_jobName'; do sleep 1; done
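The grep command returns an exit status of 0 when it finds a match and 1 when it does not, which is what while evaluates. The pattern [p]hantom... matches exactly phantom..., but it does not match grep's own entry in the process list, because that command line contains the literal text [p]hantom.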
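To see both points without a running job, here is a small sketch that feeds grep two made-up listings. The user name and PIDs are invented for the example; only the command text mirrors the thread.

```shell
# Invented listings: what ps might show when grep is started with each pattern.
listing_plain='etl  4242 phantom DSD.RUN job_jobName
etl  4250 grep phantom DSD.RUN job_jobName'
listing_bracket='etl  4242 phantom DSD.RUN job_jobName
etl  4250 grep [p]hantom DSD.RUN job_jobName'

# Count the lines of a listing ($1) that match a pattern ($2).
count_matches() {
    printf '%s\n' "$1" | grep -c "$2"
}

plain=$(count_matches "$listing_plain" 'phantom DSD.RUN job_jobName')
bracket=$(count_matches "$listing_bracket" '[p]hantom DSD.RUN job_jobName')
echo "plain pattern matches:   $plain"    # 2: the job and the grep itself
echo "bracket pattern matches: $bracket"  # 1: only the job

# Exit status: 0 when a match is found, 1 when it is not.
found=0
printf 'phantom\n' | grep -q '[p]hantom' || found=$?
missing=0
printf 'other\n' | grep -q '[p]hantom' || missing=$?
echo "status when found: $found, when not found: $missing"
```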
That's very clever. Is it functionally different (any chance of different output), from the following more pedestrian way you are of course well aware of?
Let me give you some perspective on the issue. We have an ETL process that runs for around 1 hour. The script checks for the presence of this process and is supposed to exit the instant the ETL process finishes. The problem is that although the loop exits, we can still see the process in the output of ps -eaf run immediately after the loop exits. I doubt the issue is with the semantics of the code.