Fail parent script if any child process fails

gvkumar25 · June 10, 2017, 1:41am

I have a requirement to make the parent script fail if any one of its child processes fails. Here is the code snippet:


for i in 1 2 3 4 5 6 7 8 9 10
do
child_script $i &
done

wait

I need my main script to fail if any one of my child processes fails.

RudiC · June 10, 2017, 1:47am

wait lets you specify a process ID or a job specification to wait for; see man bash.
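As a rough illustration of waiting on specific process IDs, here is a minimal sketch. The two subshells are made-up stand-ins for real child scripts, and the variable names are only for this example.

```shell
#!/bin/sh
# Start two background jobs; the second one fails with exit status 3.
( sleep 1; exit 0 ) &
pid_ok=$!
( sleep 1; exit 3 ) &
pid_bad=$!

# "wait PID" returns the exit status of that specific child.
status_ok=0
wait "$pid_ok" || status_ok=$?
status_bad=0
wait "$pid_bad" || status_bad=$?

echo "job $pid_ok exited with status $status_ok"
echo "job $pid_bad exited with status $status_bad"
```

Saving $! right after each & is what makes it possible to ask for each child's status individually later.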

gvkumar25 · June 10, 2017, 2:28am

One idea that came to mind:

pass the parent's process ID to each child, and have the child kill the parent if the child fails.
Correct me if I am going about this the wrong way.

#!/bin/ksh
P_PID=$$
for i in 1 2 3 4 5 6 7 8 9 10
do
child_script $i $P_PID &
done
wait

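#!/bin/ksh
# child_script: $1 = job number, $2 = parent PID
# ... the child's actual work goes here ...
ret_stat=$?
if [ "$ret_stat" -ne 0 ]
then
    # signal the parent that this child failed
    kill "$2"
fi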
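One thing to watch with the kill-the-parent idea: if the parent does not trap the signal, it simply dies from TERM and the other children keep running. Below is a minimal sketch of a parent that traps TERM, stops the remaining children and exits non-zero. The inline subshell is a hypothetical stand-in for child_script, and sh -c is used only so that the demo can read the parent's exit status afterwards.

```shell
#!/bin/sh
# Run the hypothetical parent as its own process so we can read its exit status.
parent_status=0
sh -c '
    on_child_failure() {
        echo "parent: a child reported failure, stopping the others"
        kill $pids 2>/dev/null
        exit 1
    }
    trap on_child_failure TERM

    pids=""
    for i in 1 2 3; do
        # Stand-in for "child_script $i $P_PID": child 2 fails and signals the parent.
        ( sleep "$i"; [ "$i" -ne 2 ] || kill -TERM $$ ) &
        pids="$pids $!"
    done
    wait
' || parent_status=$?
echo "parent exited with status $parent_status"
```

With the trap in place the parent decides its own exit status, instead of being terminated with whatever status the signal produces.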


Thanks, RudiC, for the quick reply.

Could you please explain with an example?

RudiC · June 10, 2017, 3:51am

With file1 .. file4 existing in PWD, I ran

for i in 1 2 3 4 5; do { sleep $i; ls file$i; } & done; jobs; for i in 1 2 3 4 5; do wait %$i; echo $?; done
[1] 7865
[2] 7866
[3] 7867
[4] 7869
[5] 7870
[1]   Running                 { sleep $i; ls file$i; } &
[2]   Running                 { sleep $i; ls file$i; } &
[3]   Running                 { sleep $i; ls file$i; } &
[4]-  Running                 { sleep $i; ls file$i; } &
[5]+  Running                 { sleep $i; ls file$i; } &
file1
[1]   Done                    { sleep $i; ls file$i; }
0
file2
[2]   Done                    { sleep $i; ls file$i; }
0
file3
[3]   Done                    { sleep $i; ls file$i; }
0
file4
[4]-  Done                    { sleep $i; ls file$i; }
0
ls: cannot access 'file5': No such file or directory
[5]+  Exit 2                  { sleep $i; ls file$i; }
2

gvkumar25 · June 10, 2017, 4:11am

Thanks for your reply.
I don't want the script to exit with an error if none of the child processes fails; it should do so only if a child process fails.
I tried the code snippet below, which is much like your solution.

files="test test2 "
pids=""
exit_counter=0
for i in $files; do
    ./test2 $i &
    p_pid="$!"
    echo "$i  process id is : $p_pid"
    pids+="$p_pid "
done
jobs -p
for pid in $pids; do
    wait $pid
    if [ $? -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of $?"
    else
        echo "FAILED - Job $pid exited with a status of $?"
        exit_counter=`expr "$exit_counter" + 1 `
        echo "EXIT COUNTER : $exit_counter"
    fi
done
if [ $exit_counter -ne 0 ]
then
  exit "$exit_counter"
fi

jgt · June 10, 2017, 8:39am

I see a weakness if a child process fails without terminating.
Examples:
user input required
runaway process
I would consider replacing the wait with a loop that runs every 30 seconds or so to check whether all processes have finished and, if they have not within a reasonable time frame, sends some notification.

RudiC · June 10, 2017, 1:23pm

@jgt: How then would you retrieve the children's exit statuses?

Don_Cragun · June 10, 2017, 2:49pm

Note also that your FAILED notice in:

for pid in $pids; do
    wait $pid
    if [ $? -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of $?"
    else
        echo "FAILED - Job $pid exited with a status of $?"
        exit_counter=`expr "$exit_counter" + 1 `
        echo "EXIT COUNTER : $exit_counter"
    fi
done

is attributing the exit status of the test command to the job on which you're reporting instead of the exit status of the job. You probably want something more like:

for pid in $pids; do
    wait $pid
    status=$?
    if [ $status -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of 0"
    else
        echo "FAILED - Job $pid exited with a status of $status"
        exit_counter=`expr "$exit_counter" + 1 `
        echo "EXIT COUNTER : $exit_counter"
    fi
done

but, of course, that doesn't get around the problem of wait $pid hanging forever if the job it is waiting for is hung waiting for input or has entered an infinite loop.
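One possible way to combine both concerns (a time limit and the exit status) is sketched below. wait_with_timeout is a hypothetical helper, not a shell builtin: it polls with kill -0, kills the job once the time limit has passed, and then still calls wait to collect the status. The demo jobs and time limits are made up.

```shell
#!/bin/sh
# wait_with_timeout PID SECONDS   (hypothetical helper)
# Polls once per second; if PID is still around after SECONDS, sends it TERM.
# Returns whatever status "wait PID" reports for the child.
wait_with_timeout() {
    wt_pid=$1
    wt_left=$2
    while kill -0 "$wt_pid" 2>/dev/null; do
        if [ "$wt_left" -le 0 ]; then
            echo "job $wt_pid timed out, killing it" >&2
            kill "$wt_pid" 2>/dev/null
            break
        fi
        sleep 1
        wt_left=$((wt_left - 1))
    done
    # wait still reports the status of a child that has already exited
    wait "$wt_pid"
}

# Demo: one job that fails quickly, one that "hangs".
( sleep 1; exit 4 ) &
rc_fast=0
wait_with_timeout $! 5 || rc_fast=$?

sleep 30 &
rc_slow=0
wait_with_timeout $! 2 || rc_slow=$?

echo "fast job status: $rc_fast, hung job status: $rc_slow"
```

A job that had to be killed reports a non-zero status as well, so the same exit_counter logic would count it as a failure.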

jgt · June 11, 2017, 7:47pm

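# Assumes pid1 and pid2 were saved from $! when the children were started.
echo "begin" >test1
elapsed_time=0
while [ -s test1 ]
do
        cat /dev/null >test1
        # ps -p PID -o pid= prints nothing once the process is gone
        # (ps -ef | grep PID can also match the grep command itself)
        ps -p "$pid1" -o pid= >test1
        ps -p "$pid2" -o pid= >>test1
        #repeat for each process
        if [ -s test1 ]
        then
                sleep 30
                let elapsed_time=elapsed_time+30
                if [ "$elapsed_time" -gt 300 ]
                then
                        echo "all processes should have finished by now"
                        cat test1
                        exit 1
                fi
        fi
done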