I have a bash script that has been running (on SUSE 9.3) dozens of times over the past couple of years without error. Recently it has been hitting intermittent "cp: cannot stat FILE: No such file or directory" errors.
The script has nested loops that continuously process files in a directory until the end condition is met. There are usually between 50 and 2000 files processed per execution. About 60% of the files processed hit a condition that requires the file to be copied to a new name so the new file can be processed in a future iteration. After the file is processed it is moved to a processed directory.
The error has never occurred more than once per execution. Most times I can just re-run the script with the same data set and it works fine.
The strange thing is, according to the output log, the file it is complaining about did exist in the source directory and was moved to the processed directory. Another strange thing is the location of the "cannot stat" error in the output log seems to be random. Sometimes it appears in the middle of the output for the next file or even four or five files later.
Below is a distilled version of the code.
Function_A() { local ID=$1
local SUB=$2
if [ $CURRENT_COUNT -lt $MAX_COUNT ]
then nohup ChildProcess $ID $SUB > "$LOG" 2>&1 &
let CURRENT_COUNT++
RC=0
else echo "Cannot start another process right now"
RC=1
fi
return $RC
} # End Function_A
##########################
# Main
#
# some unrelated detail here...
#
cd $DIR
WaitingForChildrenToFinish=1
while [ $WaitingForChildrenToFinish -eq 1 ]
do # Process each file
for FILE in `ls -tr ${MGR_ID}_*.msg 2>/dev/null`
do echo -e "\nProcessing file: \"$FILE\""
Function_A $MGR_ID $SUBJECT
RC=$?
if [ $RC -ne 0 ]
then RETRY="$DIR/${MGR_ID}_${RETRY}_${SUBJECT}.msg"
cp -p $FILE $RETRY
fi
# Finished processing this file so move it out
mv $FILE $PROCESSED_DIR
#
# some unrelated detail here...
#
# End condition
if [ $CO_END_FLAG -eq 1 ]
then WaitingForChildrenToFinish=0
break
fi
done # End for each file
if [ $WaitingForChildrenToFinish -eq 1 ]
then echo "No files to process so sleep..."
sleep 5
fi
done # End WaitingForChildrenToFinish
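One property of the loop above that may or may not be related to the error: the backtick `ls` is expanded once, before the first iteration, so the `for` loop walks a fixed snapshot of file names. A name stays in that list even if the file is moved or removed while the loop is still running. The sketch below only demonstrates that behavior; the temporary directory and the a.msg/b.msg names are made up for the demo and are not part of the real script.

```shell
#!/bin/bash
# Demonstration only: directory and file names are hypothetical.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.msg b.msg

visited=""
# The command substitution runs once; the loop iterates over its saved output.
for f in `ls *.msg 2>/dev/null`
do
    visited="$visited $f"
    # Remove b.msg during the first pass. It is already in the list,
    # so the loop still visits it on the second pass.
    rm -f b.msg
    [ -e "$f" ] || echo "$f is in the list but no longer on disk"
done
echo "visited:$visited"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

If anything outside the loop (for example a child process) can move or rename the .msg files, the same situation could arise in the real script, though nothing in the log confirms that.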
Sample output looks like this:
...
Processing file: "00wm4793_AAA_111.msg"
Cannot start another process right now
Processing file: "00wm4793_AAA_112.msg"
cp: cannot stat `00wm4793_AAA_111.msg': No such file or directory
Cannot start another process right now
...
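Since the position of the "cannot stat" message in the log is hard to match to an iteration, one way to gather more evidence would be to wrap the copy in a small guard that checks for the source file first and reports a timestamped message on the same stream as the "Processing file" lines. This is only a diagnostic sketch: `safe_copy` is a hypothetical helper name, not something in the original script, and it does not fix the underlying cause.

```shell
#!/bin/bash
# Hypothetical diagnostic helper (not in the original script).
# Copies src to dst only if src still exists; otherwise it reports the
# missing file with a timestamp and returns non-zero.
safe_copy() {
    local src=$1
    local dst=$2
    if [ ! -e "$src" ]; then
        # Written to stdout, the same stream as the "Processing file" lines,
        # so the message stays next to the iteration that produced it.
        echo "$(date '+%H:%M:%S') safe_copy: source missing: $src"
        return 1
    fi
    # Quoted so that unusual file names are passed through unchanged.
    cp -p -- "$src" "$dst"
}
```

In the main loop the call would replace the bare cp, for example `safe_copy "$FILE" "$RETRY"`, and the return code could be logged together with the file name.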
Any help will be appreciated. Thanks.