I have a requirement where I need parse through files in a directory which have a specific pattern and then check whether the file has been processed or not. The exit condition is any file that has been processed will have an entry in database. If it is processed i.e., if an entry is present for the file in database then I need to skip it and move towards the next file in the order of ascending dates
What database are you using? How do you access your database? What keys are used in your database? What field in your database contains the name of the file? Is it a filename or a pathname?
How is the age of a file to be determined? (Last modification timestamp? A timestamp encoded in the name of the file? Something else?)
AIX
ksh
Oracle
Access using sqlplus within script
Database field contains file pattern not file name and in another column it contains absolute path
age of file to be determined by modification timestamp
Currently my script has a logic where it aborts with exit code 1 if it finds more than one file matching the pattern
#!/bin/ksh
#set -x
# Required arguments
# ==================
# 1 - oracle user account
# 2 - oracle user account password
# 3 - oracle database instance name
# 4 - Source System Name.
# 5 - Datastage Master Sequence Name
# 6 - Time to wait between each iteration.
# 7 - Maximum Wait time.
# 8 - Project Name
# 9 - Batch ID
# 10 - 'IN/OUT'
# 11 - STATUS
# 12 - SCRIPT PATH
IFS=''
runSQLPlus() {
sqlplus -S -L ${1}/${2}@${3} @${4} > ${5}
CmdStatus=$?
if [ $CmdStatus != 0 ]; then
echo "SQLPLus command was unsuccessful. Please review ${4} and ${5}"
echo " Remember to issue rm ${4} and rm ${5} to remove"
echo " these files once the error has been diagnosed"
exit 1
fi
}
currDateReg=`date +"%Y-%m-%d %H:%M:%S"`
if [ -n "${12}" ]; then
echo
else
echo "Incorrect number of Arguments specified"
exit 1
fi
# Initialize Variables
oracleUser=${1}
oraclePass=${2}
oracleDB=${3}
sourceSystemName=$4
launchSequence=$5
Waittime=$6
MaxWaittime=$7
projName=$8
IN_OUT_BOUND=${10}
status=${11}
SCRIPT_PATH=${12}
Nwaits=`expr $MaxWaittime / $Waittime `
fileSize=''
dateRecd=''
batchId=${9} #`/opt/ibm/working/OCI/appl/sh/restart/genBatchId.ksh $oracleUser $oraclePass $oracleDB`
recdLocn=''
fileOwner=''
fileGroup=''
filePerms=''
registeredFlag=0
# SQL file for retrieving the file name from File Master table . Note that the control file pattern is difficult to use and is wrapped in
# quotes as follows. If this is not done the variables containing the pattern will resolve to actual file names.
#
# SELECT CONCAT(CONCAT('"',FILE_LOCATION),CONCAT(CONTROL_FILE_PATTERN,'"')) FROM FILE_MASTER WHERE SOURCE_SYSTEM ='FIN' and LAUNCH_SEQUENCE ='SRC_FT# P_011_MSeq';
#
# Resulting in a query like this
# "/opt/ibm/working/ACE_ID_4_0_D2/data/SRC_FTP_011_???????????????????.ctl"
# "/opt/ibm/working/ACE_ID_4_0_D2/data/SRC_FTP_022_???????????????????.ctl"
#
# The quotes are removed from the values later in the script
echo "whenever sqlerror exit 1;" > /tmp/$$controlFileToFind.sql
echo "set pagesize 0 feedback off linesize 500;" >> /tmp/$$controlFileToFind.sql
echo "SELECT CONCAT(CONCAT('"'"'"',""FILE_LOCATION),CONCAT(CONTROL_FILE_PATTERN,'"'"'"'"")) || ',' || CASE WHEN CONTROL_FILE_PATTERN = DATA_FILE_PATTERN then 'N' else 'Y' END " >> /tmp/$$controlFileToFind.sql
echo " FROM FILE_MASTER " >> /tmp/$$controlFileToFind.sql
echo "WHERE SOURCE_SYSTEM =""'$sourceSystemName'"" and LAUNCH_SEQUENCE ='$launchSequence' and IN_OUT_BOUND = '$IN_OUT_BOUND' order by 1;" >> /tmp/$$controlFileToFind.sql
echo "exit;" >> /tmp/$$controlFileToFind.sql
runSQLPlus $oracleUser $oraclePass $oracleDB /tmp/$$controlFileToFind.sql /tmp/$$controlFilesFound.txt
#Store the list of files for use later
controlFilesToFind=`cat /tmp/$$controlFilesFound.txt | cut -d',' -f1`
# Count the number of rows in the file. This is used to assist in exiting the loop below
expectedControlFileCount=`cat /tmp/$$controlFilesFound.txt | wc -l | sed -e 's/ //g' -e 's/://g'`
# There is a blank trailing line. Decrement the counter accordingly
#expectedControlFileCount=$((expectedControlFileCount-1))
#Initialise loop counter
i=1
#Initialise Control file counter
actualControlFileCount=0
touch /tmp/$$controlMasksFound.txt
#Begin wait loop, periodically checking if files have arrived.
while [ $i -le $Nwaits ]
do
runSQLPlus $oracleUser $oraclePass $oracleDB /tmp/$$controlFileToFind.sql /tmp/$$controlFilesFound.txt
# Loop through list of patterns returned from the query on FILE_MASTER
for line in `cat /tmp/$$controlFilesFound.txt`
do
lfp=`echo ${line} | cut -d',' -f1`
is_control_file=`echo ${line} | cut -d',' -f2`
# Establish the file pattern. Strip out directory and trailing double quote . This variable is used later in a query on FILE_MASTER
controlFilePattern=`echo "${lfp}" | awk -F"/" '{print $NF }' | awk -F'"' '{print $1 }'`
# Remove the double quotes from the row read. Variable contains patha and file name
lfp=`echo ${lfp} | sed -e 's/"//g'`
numrows=`ls -1 ${lfp} 2> /dev/null | wc -l`
if [ ${numrows} -eq 0 ] ; then
continue
fi
if [ ${numrows} -gt 1 ]; then
echo "Multiple files received matching pattern ${lfp}";
ls -1 ${lfp};
echo "Ensure that the correct set of files are in the inbound directory and restart.";
exit 1;
fi
#if [ ! -f ${lfp} ]; then
# continue
#fi
# Listing the file will resolve the pattern to the actual file. For some peculiar reason this required by the next command
lfp=`ls -1 ${lfp}`
# Get the extended attributes of the file
controlFileName=`ls -lE "${lfp}"`
# Populate the Variables used in the insert statement into FILE_REGISTRATION
filePerms=`echo ${controlFileName} | awk '{ FS=" " } { print $1 }'`
fileOwner=`echo ${controlFileName} | awk '{ FS=" " } { print $3 }'`
fileGroup=`echo ${controlFileName} | awk '{ FS=" " } { print $4 }'`
fileSize=`echo ${controlFileName} | awk '{ FS=" " } { print $5 }'`
#dateRecd=`echo ${controlFileName} | awk '{ FS=" " } { print $6 , $7 }'`
filerecyear=`istat ${lfp} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c17-20`
abbr_month=`istat ${lfp} | grep 'Last modified' | cut -f3 -d' '`
filerecdate=`istat ${lfp} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c1-2 | sed "s/ /0/"`
filerectime=`istat ${lfp} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c4-11`
filerecmonth=$(echo $abbr_month | sed \
-e "s/Jan/01/" \
-e "s/Feb/02/" \
-e "s/Mar/03/" \
-e "s/Apr/04/" \
-e "s/May/05/" \
-e "s/Jun/06/" \
-e "s/Jul/07/" \
-e "s/Aug/08/" \
-e "s/Sep/09/" \
-e "s/Oct/10/" \
-e "s/Nov/11/" \
-e "s/Dec/12/")
dateRecd=`echo "$filerecyear-$filerecmonth-$filerecdate $filerectime"`
recdLocn=`dirname ${lfp}`
fileName=`basename ${controlFileName}`
#rowCount=`cat ${lfp} | wc -l`
rowCount=`sed -n '2,$p' ${lfp}|wc -l`
# Build a query on the FILE_REGISTRATION table to see if we have already processed this file
echo "whenever sqlerror exit 1;" > /tmp/$$controlFileProcessed.sql
echo "set pagesize 0;" >> /tmp/$$controlFileProcessed.sql
echo "set linesize 500;" >> /tmp/$$controlFileProcessed.sql
echo "SELECT STATUS || '^' || CONCAT(RECEIVED_LOCATION,CONCAT('/',FILE_NAME)) FROM FILE_REGISTRATION " >> /tmp/$$controlFileProcessed.sql
echo "WHERE FILE_NAME='${fileName}' AND BATCH_ID=${batchId}; " >> /tmp/$$controlFileProcessed.sql
echo "exit;" >> /tmp/$$controlFileProcessed.sql
runSQLPlus $oracleUser $oraclePass $oracleDB /tmp/$$controlFileProcessed.sql /tmp/$$controlFileProcessQueryOutput.txt
controlFileStatus=`cat /tmp/$$controlFileProcessQueryOutput.txt | cut -d'^' -f1`
controlFileQuery=`cat /tmp/$$controlFileProcessQueryOutput.txt | cut -d'^' -f2`
# If file exists in the file_registration table, and does not have status of 'RECEIVED'
# (ie, file has been previously processed) - abort
if [[ ${controlFileQuery} == ${lfp} && ${controlFileStatus} != ${status} ]]; then
echo "Inbound file has already been processed: ${fileName}";
exit 1;
fi
if [ `echo "${controlFileQuery}"` != "${lfp}" ]; then
# Register the file Arrival
if [ ${is_control_file} != 'N' ]; then
$SCRIPT_PATH/registerFiles.ksh $oracleUser $oraclePass $oracleDB $fileName $fileSize $dateRecd $batchId $recdLocn $status $sourceSystemName $fileOwner $fileGroup $filePerms $launchSequence 'CONTROL' $rowCount
CmdStatus=$?
if [ $CmdStatus != 0 ]; then
echo "File Registration has failed you may need to remove the control file entries for $fileName from "
echo "FILE_REGISTRATION"
exit 1
fi
fi
# Retrieve the detail file pattern for the this control file pattern
echo "whenever sqlerror exit 1;" > /tmp/$$dataFileToFind.sql
echo "set pagesize 0 feedback off linesize 500;" >> /tmp/$$dataFileToFind.sql
echo "SELECT CONCAT(FILE_LOCATION,DATA_FILE_PATTERN) FROM FILE_MASTER " >> /tmp/$$dataFileToFind.sql
echo "WHERE SOURCE_SYSTEM ='$sourceSystemName' and LAUNCH_SEQUENCE ='$launchSequence' " >> /tmp/$$dataFileToFind.sql
echo "and CONTROL_FILE_PATTERN='"${controlFilePattern}"' ; " >> /tmp/$$dataFileToFind.sql
echo "exit;" >> /tmp/$$dataFileToFind.sql
runSQLPlus $oracleUser $oraclePass $oracleDB /tmp/$$dataFileToFind.sql /tmp/$$dataFileToFindOutput.txt
dataFilesToFind=`cat /tmp/$$dataFileToFindOutput.txt`
if [ "${dataFilesToFind}" = "no rows selected" ]; then
echo "Database didnt return list of Files. Please check the arguments"
exit 1
fi
# Loop though the list of files that satisfy the current file pattern
# Perfom check to see files exist. They should, because the control file is here,
# but perform check anyway to be sure. At least one should exist
filesExist=`ls -1 ${dataFilesToFind}`
CmdStatus=$?
if [ $CmdStatus != 0 ]; then
echo "No DATA files exist for the CONTROL file ${lfp}"
exit 1
fi
for dataFileInstance in `ls -1 ${dataFilesToFind}`
do
# For any files that are found, register these
# Register File
# Populate the Variables used in the insert statement into FILE_REGISTRATION
dataFileInstanceName=`ls -lE ${dataFileInstance}`
filePerms=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $1 }'`
fileOwner=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $3 }'`
fileGroup=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $4 }'`
fileSize=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $5 }'`
#dateRecd=`echo ${dataFileInstanceName} | awk '{ FS=" " } { print $6 , $7 }'`
filerecyear=`istat ${dataFileInstance} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c17-20`
abbr_month=`istat ${dataFileInstance} | grep 'Last modified' | cut -f3 -d' '`
filerecdate=`istat ${dataFileInstance} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c1-2 | sed "s/ /0/"`
filerectime=`istat ${dataFileInstance} | grep 'Last modified' | cut -f4-9 -d' ' | cut -c4-11`
filerecmonth=$(echo $abbr_month | sed \
-e "s/Jan/01/" \
-e "s/Feb/02/" \
-e "s/Mar/03/" \
-e "s/Apr/04/" \
-e "s/May/05/" \
-e "s/Jun/06/" \
-e "s/Jul/07/" \
-e "s/Aug/08/" \
-e "s/Sep/09/" \
-e "s/Oct/10/" \
-e "s/Nov/11/" \
-e "s/Dec/12/")
dateRecd=`echo "$filerecyear-$filerecmonth-$filerecdate $filerectime"`
recdLocn=`dirname ${dataFileInstance}`
fileName=`basename $dataFileInstanceName`
# Get the Record Width to calculate Row Count
#recWidth=`head -1 ${dataFileInstance} | wc -c | tr -cd '[[:digit:]]'`
#if [ ${recWidth} -eq 0 ] ; then
#recWidth=1
#fi
#rowCount=`expr $fileSize / $recWidth`
# Register the file Arrival
$SCRIPT_PATH/registerFiles.ksh $oracleUser $oraclePass $oracleDB $fileName $fileSize $dateRecd $batchId $recdLocn $status $sourceSystemName $fileOwner $fileGroup $filePerms $launchSequence 'DATA' $rowCount
CmdStatus=$?
if [ $CmdStatus != 0 ]; then
echo "File Registration has failed you may need to remove the control file entries for $fileName from "
echo "FILE_REGISTRATION"
exit 1
fi
echo ${line} >> /tmp/$$controlMasksFound.txt
done
# Increment the Conrol File counter
actualControlFileCount=$((actualControlFileCount+1))
else
#echo "about to continue"
echo ${line} >> /tmp/$$controlMasksFound.txt
continue
fi
done
sort -u /tmp/$$controlMasksFound.txt | join -v 1 /tmp/$$controlFilesFound.txt - > /tmp/$$tmp_controlFilesFound.txt
vFilesToGo=`wc -l /tmp/$$tmp_controlFilesFound.txt | awk '{ FS=" " } { print $1 }'`
if [ ${vFilesToGo} = 0 ]; then
exit 0;
else
mv /tmp/$$tmp_controlFilesFound.txt /tmp/$$controlFilesFound.txt
echo "Sleeping ${Waittime} seconds"
sleep ${Waittime}
fi
#if [ "${actualControlFileCount}" = "${expectedControlFileCount}" ]; then
# exit 0;
#else
# sleep $Waittime
#fi
i=$((i+1))
done
if [ $i -gt $Nwaits ]; then
echo
echo "*** Files were not received within prescribed time window ***"
echo
echo "The following files were not found:"
cat /tmp/$$controlFilesFound.txt | cut -d',' -f1
echo
echo "*** EXITING ***"
exit 1
fi
You say: "Currently my script has a logic where it aborts with exit code 1 if it finds more than one file matching the pattern". Is this what you want your script to do? If not, what do you want this script to do?
Note that if the first two characters of your script are not #! (note the two leading spaces in your script), if you invoke this script by name (as opposed to invoking it by running ksh scriptname ), this script will be run using your system's default shell; not necessarily /bin/ksh .
If you want to know if your script was called with 12 arguments, a much safer test would be:
if [ $# -ne 12 ]; then
echo "Incorrect number of Arguments specified"
exit 1
fi
Why do you explicitly complicate your SQL script to add double quotes to the output it produces when the only use for those quotes is to write more code in your script to strip out those quotes?
There are several sequences of similar pipelines of very similar commands repeatedly processing the same input to extract individual fields. It would be MUCH MUCH more efficient to run those commands once, printing the values desired for all of the fields to be set, and pipe it those results through read to set all of the variables at once.