I am checking whether each line has "n" commas. If not, I need to exit the process.
I tried
cat "$TEMP_FILE" | while read -r LINE
do
processing_line=`expr $processing_line + 1`
no_of_delimiters=`echo "$LINE" | awk -F ',' '{ print NF }'`
if [ "$no_of_delimiters" -ne "$no_of_expected_fields" ]
then
echo "Error at line $processing_line"
exit
fi
done
It's working fine. However, the file has around 0.5 million records, so it takes far too long to process. Is there any way I can improve the performance?
cat-ing small files is the biggest problem, really. For a huge file, the overhead of running cat doesn't matter all that much. But running cat 10,000 times to process 10,000 tiny files will slow things down a lot, the same way it takes longer to say a sentence if you must make a separate phone call for each word.
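To illustrate the point, here is a minimal sketch (the file name and contents are made up for the demo): redirecting the file into the loop avoids spawning a cat process entirely.

```shell
# Hypothetical demo file; in the real script this would be "$TEMP_FILE".
demo_file=$(mktemp)
printf 'a,b\nc,d\n' > "$demo_file"

count=0
while IFS= read -r line
do
    count=$((count + 1))
done < "$demo_file"   # redirection instead of: cat "$demo_file" | while ...

echo "$count"
rm -f "$demo_file"
```

As a side benefit, with redirection the loop runs in the current shell in most shells, so variables set inside it (like $count here) are still visible after the loop; with the cat pipeline the loop runs in a subshell and those updates are lost.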
Note that the variable names $no_of_expected_fields and $no_of_delimiters do not reflect what awk actually reports. When you aren't using awk's default field separator (whitespace), every occurrence of the field separator separates two fields; it does not terminate a field. So for every non-empty line awk reads with FS set to a comma (for example, via -F, on the command line), the value of NF (Number of Fields) is the number of delimiters plus one, not the number of delimiters.
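To see the off-by-one concretely (the sample line is made up for the demo), a line with two commas splits into three fields:

```shell
# A line with 2 commas...
line='a,b,c'
# ...is split by -F, into 3 fields, so NF is commas + 1.
nf=$(printf '%s\n' "$line" | awk -F, '{ print NF }')
echo "$nf"   # prints 3
```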
If you want to print an error for any file that does not have $n commas on every line, you need something like:
awk -F, -v n="$n" 'NF != (n + 1) { print "Error at line " NR; exit 1 }' "$TEMP_FILE"
if [ $? -ne 0 ]
then exit
fi
# Continue processing $TEMP_FILE...
in your script.
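As a quick sanity check, here is a self-contained run against a made-up three-line file with an expected count of 2 commas per line; line 2 is short one comma, so awk flags it and exits non-zero:

```shell
n=2                                   # expected commas per line (made up for the demo)
TEMP_FILE=$(mktemp)
printf 'a,b,c\nd,e\nf,g,h\n' > "$TEMP_FILE"

# NF is the field count, i.e. commas + 1, so a good line has NF == n + 1.
awk -F, -v n="$n" 'NF != (n + 1) { print "Error at line " NR; exit 1 }' "$TEMP_FILE"
status=$?
echo "$status"   # prints 1 (awk already printed "Error at line 2")

rm -f "$TEMP_FILE"
```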
As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.