This is a good first cut, but there are a couple of problems here:
- If there are enough files in the directory, the expansion of * may overflow the ARG_MAX limit on your system.
- The list returned by find will not be sorted by timestamp, so there is no guarantee that the last file processed by this script will be the newest file. If it isn't, the next time you run the script some files will be processed again.
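You can check the limit on your own system with getconf (the exact value varies by platform; POSIX only guarantees it is at least 4096 bytes):

```shell
# Print the maximum combined length of command-line arguments
# and environment strings allowed for a new process on this system.
getconf ARG_MAX
```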
I think the following script will get around those problems:
#!/bin/ksh
lastfile="$HOME/lastfilename.txt"
if [ -f "$lastfile" ]
then read -r newest < "$lastfile"
else newest=""
fi
ls -rt | (
if [ -n "$newest" ]
then # lastfile was not empty. Skip over files older than the file
# named in lastfile.
while read -r file
do if [ "$file" = "$newest" ]
then break
fi
done
fi
# Process all files newer than the one previously listed in lastfile
# (or all files in the directory if lastfile didn't exist or was empty).
while read -r file
do # Process newer files in order from oldest to newest...
ETL_PROCESS.sh "$file"
# The script should abort here if ETL_PROCESS.sh failed...
# Record the last file processed.
printf "%s\n" "$file" > "$lastfile"
done
)
But if someone edits the file that was processed last after more files have been added to the directory, its timestamp changes and this script (like the original script) will skip the files that were added between the last run and the edit. If that is a concern, the following may be a safer approach:
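To see why, note that editing a file updates its modification time and moves it to the end of the ls -rt ordering. A quick sketch in a scratch directory, using touch -t to fake timestamps (hypothetical filenames for illustration):

```shell
cd "$(mktemp -d)"               # work in a throwaway directory
touch -t 202401010000 a.txt     # oldest
touch -t 202401020000 b.txt
touch -t 202401030000 c.txt     # newest
ls -rt                          # oldest first: a.txt, b.txt, c.txt
touch a.txt                     # "edit" the oldest file (updates mtime)
ls -rt                          # now: b.txt, c.txt, a.txt
```

If a.txt were the file recorded in lastfile, the skip loop above would now read past b.txt and c.txt before finding it, and they would never be processed.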
#!/bin/ksh
processed="$HOME/processed.txt"
# If the list of already processed files does not exist, create an empty list.
if [ ! -f "$processed" ]
then touch "$processed"
fi
ls -rt | grep -vF -f "$processed" | while read -r file
do # Process files that haven't already been processed, oldest to newest...
ETL_PROCESS.sh "$file"
# This script should skip the next step if ETL_PROCESS.sh failed.
# Add current file to the list of processed files.
printf "%s\n" "$file" >> "$processed"
done
It keeps a list of the files already processed and skips any file on that list when the script is run again later. Timestamps only matter in that unprocessed files are handed to ETL_PROCESS.sh in order from oldest to newest.
Note, however, that this script can fail if one file's name in the directory being processed is a substring of another file's name, since grep -F matches substrings anywhere in a line. You haven't given us any indication of how files are named, so if this is a concern the grep command in the pipeline would have to be adjusted to account for the actual filenames you'll be using (for example, by matching whole lines only). And, of course, the list of processed files should be edited to remove entries for files that have been removed from the directory.
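To make the substring problem concrete (hypothetical filenames): if the processed list contains report1 and the directory later gains report10, the fixed-string match filters out report10 as well. Adding -x so grep only accepts whole-line matches avoids it:

```shell
list=$(mktemp)                  # stand-in for the processed list
dir=$(mktemp -d)                # stand-in for the data directory
printf 'report1\n' > "$list"
touch "$dir/report1" "$dir/report10"
cd "$dir"
ls -rt | grep -vF  -f "$list"   # substring match: report10 wrongly skipped
ls -rt | grep -vxF -f "$list"   # whole-line match: prints report10
```

The -x option is part of the POSIX grep specification, so it is as portable as the rest of the pipeline.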
Assuming that ETL_PROCESS.sh provides some indication that it successfully processed a file, all of these scripts should verify that a file was processed successfully before continuing with later files. The first two scripts should exit and not process any newer files until the problem is fixed, or some files may never be processed. The last script above only needs to avoid adding the failed file to the list of processed files (unless ETL_PROCESS.sh has to process input files in the order in which they were received).
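Assuming ETL_PROCESS.sh exits non-zero on failure (an assumption; adapt to however it actually reports errors), the skip-and-retry behavior of the last script might be sketched like this, with a shell function standing in for ETL_PROCESS.sh and hypothetical filenames:

```shell
# Stub standing in for ETL_PROCESS.sh; fails only on bad.txt.
ETL_PROCESS() { [ "$1" != "bad.txt" ]; }

processed=$(mktemp)
for file in a.txt bad.txt c.txt
do if ETL_PROCESS "$file"
   then printf '%s\n' "$file" >> "$processed"    # record success
   else printf 'failed on %s; not recording\n' "$file" >&2
   fi                                            # bad.txt retried next run
done
cat "$processed"                                 # a.txt and c.txt only
```

For the first two scripts you would instead exit 1 in the failure branch, so that no newer files are processed until the problem is fixed.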
Both of these scripts were written and tested using ksh, but nothing here is ksh-specific; they should work in any shell that recognizes basic POSIX shell syntax (such as bash and ksh).
Hope this helps...