Parsing through list of files

I have a requirement where I need parse through files in a directory which have a specific pattern and then check whether the file has been processed or not. The exit condition is any file that has been processed will have an entry in database. If it is processed i.e., if an entry is present for the file in database then I need to skip it and move towards the next file in the order of ascending dates

Please suggest a script for the same

Could you be a little more cryptic?

What operating system are you using?

What shell are you using?

What database are you using? How do you access your database? What keys are used in your database? What field in your database contains the name of the file? Is it a filename or a pathname?

How is the age of a file to be determined? (Last modification timestamp? A timestamp encoded in the name of the file? Something else?)

What have you tried to solved this?

AIX
ksh
Oracle
Access using sqlplus within script
Database field contains file pattern not file name and in another column it contains absolute path

age of file to be determined by modification timestamp

Currently my script has a logic where it aborts with exit code 1 if it finds more than one file matching the pattern

  #!/bin/ksh
#set -x


# Required arguments
# ==================
# 1 - oracle user account
# 2 - oracle user account password
# 3 - oracle database instance name
# 4 - Source System Name.
# 5 - Datastage Master Sequence Name
# 6 - Time to wait between each iteration.
# 7 - Maximum Wait time.
# 8 - Project Name
# 9 - Batch ID
# 10 - 'IN/OUT'
# 11 - STATUS
# 12 - SCRIPT PATH

IFS=''


runSQLPlus() {

        sqlplus -S -L ${1}/${2}@${3} @${4} > ${5}
        CmdStatus=$?
        if [ $CmdStatus != 0 ]; then
                echo "SQLPLus command was unsuccessful. Please review ${4} and ${5}"
                echo " Remember to issue rm ${4} and rm ${5} to remove"
                echo " these files once the error has been diagnosed"
                exit 1
        fi
}


currDateReg=`date  +"%Y-%m-%d %H:%M:%S"`

if [ -n  "${12}" ]; then
        echo
else
        echo "Incorrect number of Arguments specified"
        exit 1
fi


# Initialize Variables
oracleUser=${1}
oraclePass=${2}
oracleDB=${3}
sourceSystemName=$4
launchSequence=$5
Waittime=$6
MaxWaittime=$7
projName=$8
IN_OUT_BOUND=${10}
status=${11}
SCRIPT_PATH=${12}
Nwaits=`expr $MaxWaittime / $Waittime `
fileSize=''
dateRecd=''
batchId=${9} #`/opt/ibm/working/OCI/appl/sh/restart/genBatchId.ksh $oracleUser $oraclePass $oracleDB`
recdLocn=''
fileOwner=''
fileGroup=''
filePerms=''
registeredFlag=0

# SQL file for retrieving the file name from File Master table . Note that the control file pattern is difficult to use and is wrapped in
# quotes as follows. If this is not done the variables containing the pattern will resolve to actual file names.
#
# SELECT CONCAT(CONCAT('"',FILE_LOCATION),CONCAT(CONTROL_FILE_PATTERN,'"')) FROM FILE_MASTER WHERE SOURCE_SYSTEM ='FIN' and LAUNCH_SEQUENCE ='SRC_FT# P_011_MSeq';
#
# Resulting in a query like this
# "/opt/ibm/working/ACE_ID_4_0_D2/data/SRC_FTP_011_???????????????????.ctl"
# "/opt/ibm/working/ACE_ID_4_0_D2/data/SRC_FTP_022_???????????????????.ctl"
#
# The quotes are removed from the values later in the script


echo "whenever sqlerror exit 1;" > /tmp/$$controlFileToFind.sql
echo "set pagesize 0 feedback off linesize 500;" >> /tmp/$$controlFileToFind.sql
echo "SELECT CONCAT(CONCAT('"'"'"',""FILE_LOCATION),CONCAT(CONTROL_FILE_PATTERN,'"'"'"'"")) || ',' || CASE WHEN CONTROL_FILE_PATTERN = DATA_FILE_PATTERN then 'N' else 'Y' END " >> /tmp/$$controlFileToFind.sql
echo " FROM FILE_MASTER " >> /tmp/$$controlFileToFind.sql
echo "WHERE SOURCE_SYSTEM =""'$sourceSystemName'"" and LAUNCH_SEQUENCE ='$launchSequence' and IN_OUT_BOUND = '$IN_OUT_BOUND' order by 1;" >> /tmp/$$controlFileToFind.sql
echo "exit;" >> /tmp/$$controlFileToFind.sql


runSQLPlus $oracleUser $oraclePass $oracleDB /tmp/$$controlFileToFind.sql /tmp/$$controlFilesFound.txt

#Store the list of files for use later
controlFilesToFind=`cat /tmp/$$controlFilesFound.txt | cut -d',' -f1`

# Count the number of rows in the file. This is used to assist in exiting the loop below
expectedControlFileCount=`cat /tmp/$$controlFilesFound.txt | wc -l | sed -e 's/ //g' -e 's/://g'`

# There is a blank trailing line. Decrement the counter accordingly
#expectedControlFileCount=$((expectedControlFileCount-1))

#Initialise loop counter
i=1
#Initialise Control file counter
actualControlFileCount=0

touch /tmp/$$controlMasksFound.txt

#Begin wait loop, periodically checking if files have arrived.
while [ $i -le $Nwaits ]
do

runSQLPlus $oracleUser $oraclePass $oracleDB /tmp/$$controlFileToFind.sql /tmp/$$controlFilesFound.txt

# Loop through list of patterns returned from the query on FILE_MASTER

        for line in `cat /tmp/$$controlFilesFound.txt`
        do
                lfp=`echo ${line} | cut -d',' -f1`
                is_control_file=`echo ${line} | cut -d',' -f2`

                # Establish the file pattern. Strip out directory and trailing double quote . This variable is used later in a query on FILE_MASTER
                controlFilePattern=`echo "${lfp}" | awk -F"/" '{print $NF }' | awk -F'"' '{print $1 }'`

                # Remove the double quotes from the row read. Variable contains patha and file name
                lfp=`echo ${lfp} | sed -e 's/"//g'`
                numrows=`ls -1 ${lfp} 2> /dev/null | wc -l`

                if [ ${numrows} -eq 0 ] ; then
                        continue
                fi

                if [ ${numrows} -gt 1 ]; then
                        echo "Multiple files received matching pattern ${lfp}";
                        ls -1 ${lfp};
                        echo "Ensure that the correct set of files are in the inbound directory and restart.";
                        exit 1;
                fi

                #if [ ! -f ${lfp} ]; then
                #       continue
                #fi

                # Listing the file will resolve the pattern to the actual file. For some peculiar reason this required by the next command
                lfp=`ls -1 ${lfp}`

                # Get the extended attributes of the file
                controlFileName=`ls -lE "${lfp}"`

                # Populate the Variables used in the insert statement into FILE_REGISTRATION
                filePerms=`echo ${controlFileName} | awk '{ FS=" " } { print $1 }'`
                fileOwner=`echo ${controlFileName} | awk '{ FS=" " } { print $3 }'`
                fileGroup=`echo ${controlFileName} | awk '{ FS=" " } { print $4 }'`
                fileSize=`echo ${controlFileName} | awk '{ FS=" " } { print $5 }'`
                #dateRecd=`echo ${controlFileName} | awk '{ FS=" " } { print $6 , $7 }'`

                                        filerecyear=`istat ${lfp} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c17-20`
                                        abbr_month=`istat ${lfp} | grep 'Last modified' | cut -f3 -d' '`
                                        filerecdate=`istat ${lfp} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c1-2 | sed "s/ /0/"`
                                        filerectime=`istat ${lfp} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c4-11`
                                        filerecmonth=$(echo $abbr_month | sed \
                                                -e "s/Jan/01/" \
                                                        -e "s/Feb/02/" \
                                                        -e "s/Mar/03/" \
                                                        -e "s/Apr/04/" \
                                                        -e "s/May/05/" \
                                                        -e "s/Jun/06/" \
                                                        -e "s/Jul/07/" \
                                                        -e "s/Aug/08/" \
                                                        -e "s/Sep/09/" \
                                                        -e "s/Oct/10/" \
                                                        -e "s/Nov/11/" \
                                                        -e "s/Dec/12/")
                                        dateRecd=`echo "$filerecyear-$filerecmonth-$filerecdate $filerectime"`


                recdLocn=`dirname ${lfp}`
                fileName=`basename ${controlFileName}`
               #rowCount=`cat ${lfp} | wc -l`
                rowCount=`sed -n '2,$p' ${lfp}|wc -l`
                # Build a query on the FILE_REGISTRATION table to see if we have already processed this file
                echo "whenever sqlerror exit 1;" > /tmp/$$controlFileProcessed.sql
                echo "set pagesize 0;" >> /tmp/$$controlFileProcessed.sql
                echo "set linesize 500;" >> /tmp/$$controlFileProcessed.sql
                echo "SELECT STATUS || '^' || CONCAT(RECEIVED_LOCATION,CONCAT('/',FILE_NAME)) FROM FILE_REGISTRATION " >> /tmp/$$controlFileProcessed.sql
                echo "WHERE FILE_NAME='${fileName}' AND BATCH_ID=${batchId}; " >>  /tmp/$$controlFileProcessed.sql
                echo "exit;" >> /tmp/$$controlFileProcessed.sql

                runSQLPlus $oracleUser $oraclePass $oracleDB /tmp/$$controlFileProcessed.sql /tmp/$$controlFileProcessQueryOutput.txt

                controlFileStatus=`cat /tmp/$$controlFileProcessQueryOutput.txt | cut -d'^' -f1`
                controlFileQuery=`cat /tmp/$$controlFileProcessQueryOutput.txt | cut -d'^' -f2`

                # If file exists in the file_registration table, and does not have status of 'RECEIVED'
                # (ie, file has been previously processed) - abort
                if [[ ${controlFileQuery} == ${lfp} && ${controlFileStatus} != ${status} ]]; then
                        echo "Inbound file has already been processed: ${fileName}";
                        exit 1;
                fi

                if [ `echo "${controlFileQuery}"` != "${lfp}" ]; then

                        # Register the file Arrival

                        if [ ${is_control_file} != 'N' ]; then
                          $SCRIPT_PATH/registerFiles.ksh $oracleUser $oraclePass $oracleDB $fileName $fileSize $dateRecd $batchId $recdLocn $status $sourceSystemName $fileOwner $fileGroup $filePerms $launchSequence 'CONTROL' $rowCount
                          CmdStatus=$?

                          if [ $CmdStatus != 0 ]; then
                                echo "File Registration has failed you may need to remove the control file entries for $fileName from "
                                echo "FILE_REGISTRATION"
                                exit 1
                          fi
                        fi

                        # Retrieve the detail file pattern for the this control file pattern
                        echo "whenever sqlerror exit 1;" > /tmp/$$dataFileToFind.sql
                        echo "set pagesize 0 feedback off linesize 500;" >> /tmp/$$dataFileToFind.sql
                        echo "SELECT CONCAT(FILE_LOCATION,DATA_FILE_PATTERN) FROM FILE_MASTER "  >> /tmp/$$dataFileToFind.sql
                        echo "WHERE SOURCE_SYSTEM ='$sourceSystemName' and LAUNCH_SEQUENCE ='$launchSequence' "  >> /tmp/$$dataFileToFind.sql
                        echo "and CONTROL_FILE_PATTERN='"${controlFilePattern}"' ; " >> /tmp/$$dataFileToFind.sql
                        echo "exit;"  >> /tmp/$$dataFileToFind.sql

                        runSQLPlus $oracleUser $oraclePass $oracleDB /tmp/$$dataFileToFind.sql /tmp/$$dataFileToFindOutput.txt

                        dataFilesToFind=`cat /tmp/$$dataFileToFindOutput.txt`

                        if [ "${dataFilesToFind}" = "no rows selected" ]; then
                                echo "Database didnt return list of Files. Please check the arguments"
                                exit 1
                        fi

                        # Loop though the list of files that satisfy the current file pattern
                        # Perfom check to see files exist. They should, because the control file is here,
                        # but perform check anyway to be sure. At least one should exist

                        filesExist=`ls -1 ${dataFilesToFind}`
                        CmdStatus=$?
                        if [ $CmdStatus != 0 ]; then
                                echo "No DATA files exist for the CONTROL file ${lfp}"
                                exit 1
                        fi

                        for dataFileInstance in `ls -1 ${dataFilesToFind}`
                        do

                        # For any files that are found, register these
                        # Register File
                        # Populate the Variables used in the insert statement into FILE_REGISTRATION
                                dataFileInstanceName=`ls -lE ${dataFileInstance}`
                                filePerms=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $1 }'`
                                fileOwner=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $3 }'`
                                fileGroup=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $4 }'`
                                fileSize=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $5 }'`
                                #dateRecd=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $6 , $7 }'`

                                        filerecyear=`istat ${dataFileInstance} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c17-20`
                                        abbr_month=`istat ${dataFileInstance} | grep 'Last modified' | cut -f3 -d' '`
                                        filerecdate=`istat ${dataFileInstance} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c1-2 | sed "s/ /0/"`
                                        filerectime=`istat ${dataFileInstance} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c4-11`
                                        filerecmonth=$(echo $abbr_month | sed \
                                                -e "s/Jan/01/" \
                                                        -e "s/Feb/02/" \
                                                        -e "s/Mar/03/" \
                                                        -e "s/Apr/04/" \
                                                        -e "s/May/05/" \
                                                        -e "s/Jun/06/" \
                                                        -e "s/Jul/07/" \
                                                        -e "s/Aug/08/" \
                                                        -e "s/Sep/09/" \
                                                        -e "s/Oct/10/" \
                                                        -e "s/Nov/11/" \
                                                        -e "s/Dec/12/")
                                        dateRecd=`echo "$filerecyear-$filerecmonth-$filerecdate $filerectime"`


                                recdLocn=`dirname ${dataFileInstance}`
                                fileName=`basename $dataFileInstanceName`
                                # Get the Record Width to calculate Row Count
                                #recWidth=`head -1 ${dataFileInstance} | wc -c | tr -cd '[[:digit:]]'`
                                #if [ ${recWidth} -eq 0 ] ; then
                                        #recWidth=1
                                #fi
                                #rowCount=`expr $fileSize / $recWidth`
                                # Register the file Arrival
                                $SCRIPT_PATH/registerFiles.ksh $oracleUser $oraclePass $oracleDB $fileName $fileSize $dateRecd $batchId $recdLocn $status $sourceSystemName $fileOwner $fileGroup $filePerms $launchSequence 'DATA' $rowCount
                        CmdStatus=$?
                        if [ $CmdStatus != 0 ]; then
                                echo "File Registration has failed you may need to remove the control file entries for $fileName from "
                                echo "FILE_REGISTRATION"
                                exit 1
                        fi

                        echo ${line} >> /tmp/$$controlMasksFound.txt

                        done


                        # Increment the Conrol File counter
                        actualControlFileCount=$((actualControlFileCount+1))
                else
                        #echo "about to continue"
                        echo ${line} >> /tmp/$$controlMasksFound.txt
                        continue
                fi

        done

        sort -u /tmp/$$controlMasksFound.txt | join -v 1 /tmp/$$controlFilesFound.txt - > /tmp/$$tmp_controlFilesFound.txt
        vFilesToGo=`wc -l /tmp/$$tmp_controlFilesFound.txt | awk '{ FS=" " } { print $1 }'`

        if [ ${vFilesToGo} = 0 ]; then
                exit 0;
        else
                mv /tmp/$$tmp_controlFilesFound.txt /tmp/$$controlFilesFound.txt
                echo "Sleeping ${Waittime} seconds"
                sleep ${Waittime}
        fi

        #if [ "${actualControlFileCount}" = "${expectedControlFileCount}" ]; then
        #       exit 0;
        #else
        #       sleep $Waittime
        #fi


        i=$((i+1))
done

if [ $i -gt $Nwaits ]; then

        echo
        echo "*** Files were not received within prescribed time window ***"
        echo
        echo "The following files were not found:"
        cat /tmp/$$controlFilesFound.txt | cut -d',' -f1
        echo
        echo "*** EXITING ***"
        exit 1
fi

The current script has above logic

You say: "Currently my script has a logic where it aborts with exit code 1 if it finds more than one file matching the pattern". Is this what you want your script to do? If not, what do you want this script to do?

Note that if the first two characters of your script are not #! (note the two leading spaces in your script), if you invoke this script by name (as opposed to invoking it by running ksh scriptname ), this script will be run using your system's default shell; not necessarily /bin/ksh .

If you want to know if your script was called with 12 arguments, a much safer test would be:

if [ $# -ne 12 ]; then
        echo "Incorrect number of Arguments specified"
        exit 1
fi

Why do you explicitly complicate your SQL script to add double quotes to the output it produces when the only use for those quotes is to write more code in your script to strip out those quotes?

There are several sequences of similar pipelines of very similar commands repeatedly processing the same input to extract individual fields. It would be MUCH MUCH more efficient to run those commands once, printing the values desired for all of the fields to be set, and pipe it those results through read to set all of the variables at once.

There are no leading spaces in the script. Must have come when I pasted the code into this thread

I want my new script to do the opposite of what the current script is doing

I need it to Not Abort when it finds multiple files.