I wrote a shell script to validate the number of fields on each line of an input file, but it takes forever to validate a file: on average about 9 minutes per MB.
Can anyone tell me how to improve the performance of a shell script?
I don't have the script with me right now, but I can outline what it looks like:
# starts with a couple of constants for the file
function1 ...
function2 ...
function3
{
    function4
}
function4 ...
while time < 00:00:00
do
    function1
    if [ $? -eq 0 ]
    then
        for loop
        do
            function2 ...
            function4 ...
            ./call_another_script
        done
    fi
done
There could be something in those functions that is taking too much time, for example a cut, a grep, or a call to some other external tool. The script call_another_script could also be the culprit.
Unless you can show what those functions do, it is hard to pinpoint the exact cause.
I would start by turning on the shell's debug (trace) flag with set -x. That alone may make it obvious which operation is taking the time. Without knowing exactly what you are doing in the functions, it is not really possible for anyone to answer.
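A minimal sketch of that tracing idea, assuming you can isolate one suspect step: it runs the step under set -x, sends the trace to a temporary file instead of the terminal, and sets PS4 so every traced command shows the function name and line number. slow_step is a hypothetical stand-in for one of the script's functions; FUNCNAME is a bash variable, so in other shells the prefix simply falls back to "main".

```shell
#!/usr/bin/env bash
# Sketch: trace one suspect step with "set -x" and keep the trace in a file.
# slow_step is a hypothetical stand-in for one of the script's functions.

slow_step() {
    echo "00|a|b|c|d" | awk -F'|' '{print NF}'
}

# Prefix every traced command with the current function name and line number.
# Single quotes delay expansion until each command is traced.
PS4='+ ${FUNCNAME:-main}:${LINENO}: '

trace_file=$(mktemp)
trap 'rm -f "$trace_file"' EXIT     # clean up when the script exits

{
    set -x
    result=$(slow_step)             # commands inside $( ) are traced too
    set +x
} 2>"$trace_file"                   # trace output goes to stderr, captured here

echo "result: $result"
cat "$trace_file"
```

On the real script, the same PS4 setting plus set -x near the top (or bash -x script.sh) shows which external commands run for every input line; a command that repeats once per line is usually the cost to remove.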
The part that takes most of the time is the following code:
function line_count
{
    COUNT=`echo "$1" | awk -F\| '{print NF}'`
    if [ "$COUNT" != "$2" ]
    then
        error_log "File $FN: Validation failed at line $LINENUM. Expected $2, getting $COUNT"
        return 5
    fi
}
function validate_line
{
    if [ "$1" = "$FIRST_LEVEL_HEAD" ]
    then
        line_count "$2" $FIRST_LEVEL_COUNT
        return $?
    elif [ "$1" = "$SECOND_LEVEL_HEAD" ]
    then
        line_count "$2" $SECOND_LEVEL_COUNT
        return $?
    else
        error_log "File $FN: Line $LINENUM head is not recognised"
        return 5
    fi
}
function validate_file
{
    trace_log "Start to validate $FN..."
    LINENUM=0
    ERROR=0
    while IFS= read -r LINE
    do
        LINENUM=`expr $LINENUM + 1`
        LINE_HEAD=`echo "$LINE" | awk -F\| '{print $1}'`
        validate_line "$LINE_HEAD" "$LINE"
        if [ ! $? -eq 0 ]
        then
            ERROR=1
        fi
    done < "$1"
    if [ ! $ERROR -eq 0 ]
    then
        return 7
    fi
}
validate_file "$FILE"
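Each backtick in that loop runs echo | awk or expr in a new process, so every input line starts several processes. A minimal sketch of the same per-line steps done only with bash builtins (parameter expansion and $(( )) arithmetic), assuming bash is available: show_field_counts is a hypothetical helper that just prints what it computes, and the ${var//...} expansion it uses is bash-specific, not POSIX sh.

```shell
#!/usr/bin/env bash
# Sketch (bash): the question's per-line steps without starting any process.
# show_field_counts is a hypothetical helper that prints
# "<line number> <head> <field count>" for each line of a file.
show_field_counts() {
    local file=$1 line head pipes count linenum=0
    # the -n test also handles a last line that has no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        linenum=$((linenum + 1))        # replaces `expr $LINENUM + 1`
        head=${line%%|*}                # text before the first "|", like awk's $1
        pipes=${line//[!|]/}            # delete every character except "|"
        count=$(( ${#pipes} + 1 ))      # fields = separators + 1, like awk's NF
        printf '%s %s %s\n' "$linenum" "$head" "$count"
    done < "$file"
}
```

For a non-empty line this count matches awk's NF, including an empty last field after a trailing "|". Calling a function through $( ) would fork again, which is why the values are computed inline in the loop body.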
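A faster version lets the shell split each line on "|" itself, so no echo, awk or expr process is started per line:
function validate_file
{
    INPUT="$1"                      # keep the file name; set -- below replaces $1
    ERROR=0
    OIFS="$IFS"
    set -f                          # no glob expansion of the unquoted $LINE below
    while IFS= read -r LINE
    do
        IFS="|"
        set -- $LINE                # $1 is the head, $# is the number of fields
        IFS="$OIFS"                 # (a trailing "|" adds no empty field here, unlike awk's NF)
        case "$1" in
        "$FIRST_LEVEL_HEAD")  EXPECTED=$FIRST_LEVEL_COUNT ;;
        "$SECOND_LEVEL_HEAD") EXPECTED=$SECOND_LEVEL_COUNT ;;
        *)                    ERROR=1; break ;;
        esac
        if [ $# -ne "$EXPECTED" ]
        then
            ERROR=1
            break
        fi
    done < "$INPUT"
    set +f
    if [ $ERROR -ne 0 ]
    then
        return 7
    fi
}
validate_file "$FILE"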
This can be read as:
IF the first field is "00" and the number of fields is not 5
OR the first field is "01" and the number of fields is not 4
OR the first field is anything else
THEN stop scanning the file and return with status 7.
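Another option, a swapped-in technique rather than a fix of the code above, is to hand the whole file to a single awk process. That also keeps the per-line error messages of the original version. A minimal sketch, assuming the "00"/5 and "01"/4 values from the example; validate_file_awk is a hypothetical name, and the real script's constants should be used instead.

```shell
#!/usr/bin/env bash
# Sketch: ONE awk process validates the whole file instead of several
# processes per line. Constant values are assumptions from the example.
FIRST_LEVEL_HEAD="00";  FIRST_LEVEL_COUNT=5
SECOND_LEVEL_HEAD="01"; SECOND_LEVEL_COUNT=4

# Hypothetical helper: returns 0 if every line is valid, 7 otherwise,
# and prints one message per bad line on stderr.
validate_file_awk() {
    awk -F'|' \
        -v fn="$1" \
        -v h1="$FIRST_LEVEL_HEAD"  -v c1="$FIRST_LEVEL_COUNT" \
        -v h2="$SECOND_LEVEL_HEAD" -v c2="$SECOND_LEVEL_COUNT" '
    {
        head = $1 ""                 # force a string comparison ("00" must not equal "0")
        if (head == h1)      want = c1
        else if (head == h2) want = c2
        else {
            printf "File %s: Line %d head is not recognised\n", fn, NR > "/dev/stderr"
            bad = 1
            next
        }
        if (NF != want) {
            printf "File %s: Validation failed at line %d. Expected %d, getting %d\n", fn, NR, want, NF > "/dev/stderr"
            bad = 1
        }
    }
    END { exit (bad ? 7 : 0) }
    ' "$1"
}
```

Because it returns 0 or 7 like validate_file, a call such as validate_file_awk "$FILE" can take its place in the calling code. Unlike the shell loop above, it reports every bad line instead of stopping at the first one.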