While Loop Exiting

jerome_rajan · April 22, 2013, 1:22am

We are trying to design a flow in which an ETL job does not start until the previous job completes. The script we have written is:

while [ `ps -ef|grep "phantom DSD.RUN job_jobName"|grep -iv -e "grep" -e "SH -c"|wc -l` -ne 0  ]; do sleep 2; done

However, the loop exits even while the process is still running. Why could this be happening?

Yoda · April 22, 2013, 1:30am

If you have pgrep, I would recommend using it instead:

while pgrep -f "phantom DSD.RUN job_jobName"
do
   sleep 2
done

Note that -f makes pgrep match the pattern against the full command line (the process name plus its arguments), not just the process name. If you know the actual process name, you can drop this option and pass that name as the argument.

jerome_rajan · April 22, 2013, 1:33am

We do not have pgrep. How does pgrep help? Why could this be happening?

Yoda · April 22, 2013, 1:56am

pgrep is much preferred because you don't have to worry about excluding the grep command itself from the results.

It is difficult to say without seeing the command output, so can you please post the output of the command below:

ps -ef|grep "phantom DSD.RUN job_jobName"

Alternatively, you can debug your script by enabling xtrace and verbose mode to see exactly what is going on:

#!/bin/your_shell -xv
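A sketch of the same idea from inside the script, assuming a POSIX sh: both the xtrace (-x) and verbose (-v) output are written to standard error, so redirecting file descriptor 2 sends the trace to a file. The log path and the traced command are placeholders, not part of the original script.

```shell
#!/bin/sh
# Placeholder log location; any writable path will do.
trace_log="${TMPDIR:-/tmp}/wait_for_job.trace"

exec 3>&2              # keep a copy of the original stderr on fd 3
exec 2>"$trace_log"    # from here on, stderr (and therefore the trace) goes to the file
set -xv                # -v echoes each line as it is read, -x each command as it runs

# Placeholder for the commands you actually want to trace.
lines=$(printf '%s\n' one two three | wc -l)

set +xv                # stop tracing
exec 2>&3 3>&-         # restore stderr and close the spare descriptor
echo "trace written to $trace_log"
```

Without editing the script, running it as sh -xv ./your_script 2> trace.log has the same effect.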

jerome_rajan · April 22, 2013, 3:50am

Where can I see the trace logs after running the script with the -xv flags?


The trace doesn't seem to help much. We saw the point where the loop exited, then went back to the shell and ran ps, and we could still see the same process running.

hanson44 · April 22, 2013, 3:55am

In my experience, the trace often does not help that much: there is too much output, and it is hard to wade through. On the other hand, it does show everything, if you know how to use it.

The first step is to lay your script out normally, instead of having it all glommed together on one line. The second step is to add some targeted diagnostic statements to your script.
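A sketch of what that could look like, assuming a POSIX sh and the thread's placeholder pattern. The one-liner is split over several lines, ps -ef is captured once per pass (so the grep commands that filter it are not yet running when the snapshot is taken), and every pass logs its match count to stderr. When the loop exits, the snapshot that caused the exit is logged too, so it can be compared with a ps run by hand afterwards. The function names are illustrative, not from the original script.

```shell
#!/bin/sh
# Placeholder pattern: the thread's alias for the real DataStage job.
PATTERN='phantom DSD.RUN job_jobName'

# Filter ps-style lines on stdin the way the original one-liner does: keep lines
# matching the job, then drop grep and "sh -c" lines (case-insensitively).
matching_lines() {
    grep "$PATTERN" | grep -iv -e 'grep' -e 'sh -c'
}

# The original loop, laid out normally, with one diagnostic line per pass on stderr.
wait_for_job() {
    while :
    do
        snapshot=$(ps -ef)    # one snapshot per pass, taken before any grep runs
        matches=$(printf '%s\n' "$snapshot" | matching_lines)
        count=$(printf '%s\n' "$matches" | grep -c .)    # count non-empty lines
        echo "$(date '+%H:%M:%S') matches=$count" >&2
        if [ "$count" -eq 0 ]; then
            break
        fi
        printf '%s\n' "$matches" >&2
        sleep 2
    done
    # Record exactly what ps showed at the moment the loop decided to stop.
    echo "$(date '+%H:%M:%S') loop exited; snapshot follows" >&2
    printf '%s\n' "$snapshot" >&2
}
```

Calling it as wait_for_job 2> wait_for_job.log leaves a timestamped record of every pass and of the final snapshot.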

ps combined with grep, as you are doing, will work fine, once you find the glitch.

Can you show the output of:

ps -ef | grep "phantom DSD.RUN job_jobName"

RudiC · April 22, 2013, 6:12pm

I've tried this simplified version of your command, and it works perfectly:

while ps | grep '[p]hantom DSD.RUN job_jobName'; do sleep 1; done

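grep exits with status 0 when it finds a match and 1 when it finds none, and while can test that status directly. The pattern [p]... matches exactly p..., but it rules out the grep process itself, because grep's own command line contains the literal text [p]..., which the pattern does not match.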
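A small sketch of both points, run against two made-up ps-style lines instead of live ps output, so the result does not depend on what happens to be running: grep's exit status is 0 when a line matches and 1 when none does, and the bracket expression [p] matches the text "phantom" but not the literal "[p]hantom" on grep's own command line.

```shell
#!/bin/sh
# Made-up ps-style lines: the job itself, and the grep that searches for it.
sample='dsadm 101 1 0 10:49:08 - 0:00 phantom DSD.RUN job_jobName. 0/0/1/0/0
dsadm 202 1 0 10:49:09 pts/1 0:00 grep [p]hantom DSD.RUN job_jobName'

# Matches only the job line: "[p]hantom" on grep's command line is not "phantom".
printf '%s\n' "$sample" | grep '[p]hantom DSD.RUN job_jobName'
found=$?      # 0: a line matched, so a while loop would keep going

# Keep only the grep line; now nothing matches.
printf '%s\n' "$sample" | grep 'pts/1' | grep '[p]hantom DSD.RUN job_jobName'
missing=$?    # 1: no line matched, so a while loop would stop

echo "found=$found missing=$missing"
```

One caveat when applying this to the ETL job: ps with no options only lists processes with the same effective user and controlling terminal as the caller, so a DataStage phantom started elsewhere would normally need ps -ef, as in the original script.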

hanson44 · April 22, 2013, 8:07pm

ps | grep '[p]hantom DSD.RUN job_jobName'

That's very clever. Is it functionally different (any chance of different output) from the following more pedestrian approach, which you are of course well aware of?

ps | grep 'phantom DSD.RUN job_jobName' | grep -v grep
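The two filters can differ in which lines they drop. The bracket form only excludes command lines containing the literal "[p]hantom", which in practice is just that grep itself; grep -v grep drops every line containing "grep", including some other user's plain grep for the same text and any genuine process whose command line happens to contain "grep". A sketch comparing them on made-up lines (not real ps output):

```shell
#!/bin/sh
# Made-up ps-style lines: the job, the bracket-trick grep, and a plain grep
# that another user happens to be running at the same moment.
sample='dsadm 101 1 0 10:49:08 - 0:00 phantom DSD.RUN job_jobName. 0/0/1/0/0
dsadm 202 1 0 10:49:09 pts/1 0:00 grep [p]hantom DSD.RUN job_jobName
other 303 1 0 10:49:10 pts/2 0:00 grep phantom DSD.RUN job_jobName'

# Bracket trick: skips line 202, but still counts the other user's grep (303).
bracket=$(printf '%s\n' "$sample" | grep -c '[p]hantom DSD.RUN job_jobName')

# grep -v grep: removes every matching line that contains "grep", leaving only 101.
pedestrian=$(printf '%s\n' "$sample" | grep 'phantom DSD.RUN job_jobName' | grep -vc grep)

echo "bracket=$bracket pedestrian=$pedestrian"
```

Both pipelines return the exit status of their last grep, so either works as a while condition; the bracket form simply needs one process fewer.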

jerome_rajan · April 23, 2013, 1:19am

RudiC wrote:

I've tried this simplified version of your command, and it works perfectly:
while ps | grep '[p]hantom DSD.RUN job_jobName'; do sleep 1; done
The grep command will give an exit status of 0 or 1 on found/not found that while can evaluate; the [p]... will find exactly p... but rule out the grep with its parameter line itself

Let me give you some context. We have an ETL process that runs for around an hour. The script checks for this process and should exit as soon as the ETL process finishes. The problem is that although the loop exits, we can still see the process in the output of ps -eaf run immediately after the loop exits. So I doubt the issue is in the logic of the code.
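The thread does not establish why the check misses a process that is still running. If the miss turns out to be transient, one defensive sketch (an assumption, not something confirmed here) is to end the wait only after several consecutive failed checks and to log each failure. The helper names, the 2-second interval, and the threshold of 3 are illustrative.

```shell
#!/bin/sh
# Hypothetical check: succeeds while a matching process shows up in ps -ef.
job_running() {
    ps -ef | grep '[p]hantom DSD.RUN job_jobName' > /dev/null
}

# Run the check command "$1" every "$2" seconds and return only after it has
# failed "$3" times in a row; any success resets the count.
wait_until_absent() {
    check=$1 interval=$2 needed=$3 misses=0
    while [ "$misses" -lt "$needed" ]
    do
        if "$check"; then
            misses=0
        else
            misses=$((misses + 1))
            echo "$(date '+%H:%M:%S') check failed ($misses/$needed)" >&2
        fi
        if [ "$misses" -lt "$needed" ]; then
            sleep "$interval"
        fi
    done
}
```

A call such as wait_until_absent job_running 2 3 would then precede the start of the next job; the logged failure times show whether the misses were isolated or continuous.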

hanson44 · April 23, 2013, 1:23am

Could you show the output of:

ps -ef | grep "phantom DSD.RUN job_jobName"

jerome_rajan · April 23, 2013, 1:34am

dsadm 21299440 37224484 0 10:49:08 - 0:00 phantom DSD.RUN jobname. 0/0/1/0/0

hanson44 · April 23, 2013, 1:47am

All the previous posts used the job_jobName pattern, which obviously is not going to match that process.

Does changing the code from using (incorrect) job_jobName to (correct) jobname solve the problem? Or is there still a problem?

jerome_rajan · April 23, 2013, 1:53am

It's the same. job_jobName is only an alias I'm using on the forum so as not to reveal the actual name. Sorry for not mentioning it earlier.