What causes this script's wait to block on a while/grep command on Linux, whereas on Solaris it runs fine?
To give further details: I am supposed to start (8) background processes, one for each line read from the file. So I introduced a wait command, and when they all complete I read the next line, and so on.
Since wait was actually waiting for the grep and not for the spawned children, my script broke.
Other cases where it can run fine on Linux:
fewer lines in the config file
in one test, I shrank one row's length from 260 to 250 characters and it worked.
I can't predict a consistent condition for this behaviour.
Any help in fixing / explaining this behaviour is appreciated.
I regret that I can't decipher what you are actually trying to achieve; however, my bet is that from earlier testing you may have a background task still hanging around, and the wait command is watching for all background processes to finish (jobs 1 & 2 running an unknown command). Try starting a new session to make sure that is not your problem.
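To see why a stray job matters, a small sketch (timings are made up): a bare "wait" blocks on every background child of the shell, including jobs you forgot about, not just the one you launched last.

```shell
# A bare "wait" returns only after ALL background children finish,
# so a leftover job stretches out the wait.
sleep 3 &     # "stray" leftover background task from earlier testing
sleep 1 &     # the job we actually meant to wait for
wait          # does not return until BOTH jobs are done (~3 seconds)
echo "all background jobs done"
```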
My apologies for not using tags, and thanks for tagging.
On both machines I use ksh; my program starts with #!/bin/ksh, so it should override whatever login shell is being used.
To answer the scenario: Robin's guess is right.
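If there is any doubt which interpreter is actually running the script, one quick check (assuming a ps that supports -o, which POSIX requires) is to ask ps for the command name of the current process:

```shell
# $$ is the pid of the shell running this script; "comm=" prints just
# the command name with no header line, e.g. "ksh" or "bash".
ps -p $$ -o comm=
```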
Here is what I am doing:
while read table server_list
do
    for server in $server_list
    do
        do_something with $table and $server &   # in background
    done
    wait    # wait for all my background jobs to finish
    echo "now i can go and read next line"
done
Now in my case, wait started waiting for a zombie (unknown command, eventually a grep or a sort) and never came out to read the next line.
When I was trying to simplify the problem, I removed the inner for loop, so my question looks awkward (stupid?).
Technically, when I have a while loop and a wait command, it shouldn't wait; of course Linux does something in the background for while loops.
I even used an intermediate file for the sort and grep output and passed it on to the while loop, but wait continued to hang.
Again, when my config file had only a few lines, it worked fine on Linux too; when I kept adding more lines, it gave the error.
At one point, when I reduced the row length from 260 characters to 250, it worked too. But the script can still work with lines longer than 250 characters when there are fewer of them, so I can't say this is the only scenario in which the problem shows up.
I don't know whether I am using pdksh or the real ksh (I had never heard of pdksh before).
Anyway, here is my output:
/bin/ksh --version
version sh (AT&T Labs Research) 1993-12-28 n+
I recall that on one O/S ksh would run ksh scripts in the same pid, so you would see interactive background jobs with wait, but on Solaris it is always a child process, and wait only waits for its own children. I ended up putting ( ) around lines 2-$ to keep my environment from getting scrambled by my scripts.
I guess wait on Linux waits for everything. I wonder if nohup helps to move the script away. It might be an interesting man page read, to find out whether it is waiting for all processes on the tty or in the process group. But yes, collecting pids and waiting for them one at a time is best, as you get the exit status $? of each child from "wait $child_pid".
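A minimal sketch of that pid-collecting approach (the fake jobs here just exit with distinct statuses so there is something to collect):

```shell
pids=""
for n in 1 2 3
do
    ( sleep 1; exit $n ) &     # stand-in for the real background work
    pids="$pids $!"            # remember each child's pid
done

statuses=""
for pid in $pids
do
    wait $pid                  # waits for this child only, not for any grep/sort
    statuses="$statuses $?"    # $? is that child's exit status
done
echo "exit statuses:$statuses"
```

With a specific pid, wait cannot be fooled by an unrelated zombie or pipeline process, which is exactly the failure described above.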
If the exit status is not a biggie, or you check it through log files, you can skip the wait and monitor the children through shared stdout and stderr, like this:
This monitors not only the children but their children and so on, as long as they do not redirect both stdout and stderr. Even when "wait $child_pid" returns, the child may have descendants still running, background or up-pipeline processes that close stdout but do not immediately exit, or someone down-pipeline may exit and cut them off! $! is just the pid of the parent or of the last process in the pipeline.
sleep 99 | sleep 5 & wait $! # wait waits for sleep 5 but sleep 99 is still running.
(sleep 99 & sleep 5 ) & wait $! # wait waits for sleep 5 but sleep 99 is still running.
The ability of processes other than $! to hit errors not reported in $? is one reason to rely on logs, or to write a very attentive wrapper script that keeps an eye on the children and reports $? for all of them. Sometimes I get really formal, for money and my job security and all that. This is fine interactively, but not so wise unattended:
cmd1|cmd2|cmd3
>$fail_log
(
    cmd1
    zret=$?
    if [ $zret != 0 ]
    then
        echo cmd1 returned $zret >>$fail_log
    fi
) | (
    cmd2
    zret=$?
    if [ $zret != 0 ]
    then
        echo cmd2 returned $zret >>$fail_log
    fi
) | (
    cmd3 . . . .
)
if [ -s $fail_log ]
then
    exit 1
fi
This might be a bug in the version of ksh that you have which is likely fixed in the current release.
I'm running 'Version JM 93t+ 2009-02-02' on some boxes, and 'Version JM 93t+ 2010-06-21' on most of my linux boxes. Testing on the older of the two it handled your script without any problems:
>>>start: Sun Oct 31 23:18:46 EDT 2010
jobs output
[2] + Running <command unknown>
[1] - Running <command unknown>
jobs -p
Before entering parallel process 18154
>>>finish: Sun Oct 31 23:18:46 EDT 2010
I added start/finish messages to show the delay, if there was any. There is a more recent release than the 6/21/2010 version; it can be pulled directly from AT&T Labs-Research; AST software download
As a further test, I put this little script together that reads lines with one or more sleep times and sets that many async sleep processes going. It is similar to the script you are running and it seems to have no issues with a more recent version of ksh.
while read list
do
    for x in $list
    do
        echo "$(date) sleeping $x"
        sleep $x &
    done
    echo "$(date) waiting"
    wait
    echo "$(date) looping"
done <xx
Sometimes I script parallel processing logging so it looks like it was done sequentially, both to keep from mixing line fragments and so as not to confuse the onlookers in their less sophisticated moments (-:
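One way to get that sequential-looking log (file names here are made up) is to give each parallel child its own log file, then concatenate the logs in order once wait returns, so line fragments from different children never interleave:

```shell
i=0
for item in alpha beta gamma
do
    i=$((i + 1))
    # each child writes whole lines to its own private log file
    ( echo "start $item"; echo "end $item" ) > /tmp/seq_log.$$.$i &
done
wait                           # all children done; every log is complete
merged=$(cat /tmp/seq_log.$$.1 /tmp/seq_log.$$.2 /tmp/seq_log.$$.3)
rm -f /tmp/seq_log.$$.1 /tmp/seq_log.$$.2 /tmp/seq_log.$$.3
echo "$merged"
```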