Hello All,
I am using Linux. I have two scripts:
- inner_script.ksh
- main_wrapper_calling_inner.ksh
Below is the code for main_wrapper_calling_inner.ksh:
#!/bin/ksh
ppids=()    # main array of process IDs
fppids=()   # array to capture failed process IDs
pcnt=0      # count of completed processes (successes and failures)
fpcnt=0     # count of failed processes
echo ""
start_time=`date '+%Y/%m/%d:%H:%M:%S'`
for file in `cat ${CONFIG_DIR}/abc.txt`
do
    table_name=${file}
    # Launch one inner_script.ksh per table in the background, each with its own log file.
    nohup ksh "${BIN_DIR}/inner_script.ksh" "${table_name}" > "${LOG_DIR}/${table_name}_inner_script_${curr_date}.log" &
    ppids+=($!)
    echo "Process ID: $!."
    echo ""
    echo "Log File for ${table_name} is: ${LOG_DIR}/${table_name}_inner_script_${curr_date}.log"
done
echo ""
echo ""
echo "Starting Checking the process completion of all tables:"
export tot_table_cnt=`wc -l ${CONFIG_DIR}/abc.txt|awk '{ print $1 }'`
echo ""
echo "Total Number of tables: ${tot_table_cnt}."
while [ ${pcnt} -lt ${tot_table_cnt} ]; do
    ptmp=()
    for p in ${ppids[@]}
    do
        if [[ ! -d /proc/${p} ]]; then
            # No /proc entry: the process is gone, so collect its exit status.
            wait ${p}
            sts=$?
            if [[ $sts -eq 0 || $sts -eq 127 ]]; then
                echo "Process completed with Process ID ${p}; exit code: $sts; at `date '+%Y/%m/%d:%H:%M:%S'`"
                pcnt=`expr $pcnt + 1`
            else
                echo "Process failed for Process ID: ${p}"
                index=`echo ${ppids[@]/$p//}|cut -d/ -f1 |wc -w |tr -d ' '`
                unset ppids[$index]
                pcnt=`expr $pcnt + 1`
                fpcnt=`expr $fpcnt + 1`
                fppids+=(${p})
            fi
        else
            # /proc entry still exists: check whether the process is a zombie.
            zombie_lst=$(ps axo pid=,stat= | awk '$2~/^Z/ { print $1 }'|grep "$p")
            if [[ -z ${zombie_lst} ]]; then
                # Still running: keep it for the next pass.
                ptmp+=(${p})
            else
                wait ${p}
                sts=$?
                if [[ $sts -eq 0 || $sts -eq 127 ]]; then
                    echo "Process completed with Process ID ${p}; exit code: $sts; at `date '+%Y/%m/%d:%H:%M:%S'`"
                    pcnt=`expr $pcnt + 1`
                elif [[ $sts -ne 0 && $sts -ne 127 ]]; then
                    echo "Process failed for Process ID: ${p}"
                    index=`echo ${ppids[@]/$p//}|cut -d/ -f1 |wc -w |tr -d ' '`
                    unset ppids[$index]
                    pcnt=`expr $pcnt + 1`
                    fpcnt=`expr $fpcnt + 1`
                    fppids+=(${p})
                else
                    kill -TERM ${p}
                    index=`echo ${ppids[@]/$p//}|cut -d/ -f1 |wc -w |tr -d ' '`
                    unset ppids[$index]
                    pcnt=`expr $pcnt + 1`
                fi
            fi
        fi
    done
    ppids=(${ptmp[@]})
done
if [[ $pcnt -eq ${tot_table_cnt} ]]; then
    echo "process for all tables is complete for ${curr_date}."
    if [[ $fpcnt -eq 0 ]]; then
        echo ""
        echo "process is successfully completed for all Tables."
        echo "DONE file is touched."
        touch ${TEMP_DIR}/inner_script_completion.done
        echo ""
    else
        echo ""
        echo "process failed for ${fpcnt} tables."
        echo "Failed Process IDs are ${fppids[@]}."
        echo "DONE File is not touched in ${TEMP_DIR} path. Need to verify or re-run the process manually."
    fi
fi
The config file abc.txt has newline-separated values like:
a
b
c
d
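One thing that might be worth checking about this file: the wrapper takes tot_table_cnt from wc -l, but launches one job per word returned by cat. wc -l counts newline characters, so the two numbers can disagree if the file has blank lines or no newline after the last entry. The sketch below is only an illustration under that assumption; it uses a temporary file as a hypothetical stand-in for abc.txt and is written in portable sh syntax so it also runs under ksh.

```shell
# Hypothetical stand-in for ${CONFIG_DIR}/abc.txt: four entries,
# but no newline after the last one.
cfg=$(mktemp)
printf 'a\nb\nc\nd' > "$cfg"

# Count as the wrapper computes it: number of newline characters.
wc_count=$(wc -l < "$cfg" | tr -d ' ')

# Count of entries actually iterated by a for-in-cat loop (word splitting).
loop_count=0
for table_name in $(cat "$cfg"); do
    loop_count=$((loop_count + 1))
done

echo "wc -l says: ${wc_count}, loop launched: ${loop_count}"
rm -f "$cfg"
```

With this input the sketch prints "wc -l says: 3, loop launched: 4". Taking the total from the number of PIDs actually collected (${#ppids[@]} in ksh93) would not depend on how the file is formatted.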
Below is the code snippet for inner_script.ksh:
#!/bin/ksh
nohup hive -S -e "do something;" &
pid=$!
wait $pid
status=$?
if [[ $status -eq 0 ]]; then
    echo "Success"
    exit 0
else
    echo "failure"
    exit 1
fi
Scenario:
I am trying to execute inner_script.ksh in parallel for each value in the config file abc.txt and to track the completion of each child process. I want the total execution time to equal the longest execution time of any single child process.
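The pattern described in this scenario can be sketched in isolation as follows. This is a minimal illustration, not the actual scripts: fake_job is a hypothetical stand-in for inner_script.ksh that sleeps and then exits with a chosen status, and the syntax is kept to portable sh so it runs under ksh as well.

```shell
# Hypothetical stand-in for inner_script.ksh:
# sleeps for $1 seconds, then finishes with exit status $2.
fake_job() {
    sleep "$1"
    return "$2"
}

# Start all jobs in the background and remember each PID.
pids=""
fake_job 1 0 & pids="$pids $!"
fake_job 2 3 & pids="$pids $!"
fake_job 1 0 & pids="$pids $!"

ok=0
failed=0
for p in $pids; do
    # wait blocks until this child has exited and returns its exit status;
    # if the child exited earlier, the saved status is returned at once.
    if wait "$p"; then
        ok=$((ok + 1))
    else
        failed=$((failed + 1))
    fi
done

echo "succeeded: ${ok}, failed: ${failed}"
```

Because the jobs run concurrently and each is waited on once, the total run time is roughly that of the slowest job (about 2 seconds here), and the sketch prints "succeeded: 2, failed: 1".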
Problem:
- The parent is unable to track the successful completion of some of the child processes. At least one child process becomes a zombie (defunct).
- I am using
zombie_lst=$(ps axo pid=,stat= | awk '$2~/^Z/ { print $1 }'|grep "$p")
to identify whether a child has become a zombie, and then trying to wait on it. Does wait work on a zombie process?
- At the end, I am doing a
kill -TERM ${p}
if the child has become a zombie. Is this kill -TERM ${p} actually killing the entire process?
Kindly suggest.