We are facing a performance issue in UNIX. If someone has faced this kind of issue in the past, please share your suggestions.
Problem Definition:
A few of the load processes of our Finance Application face an issue in UNIX when they use a shell script containing the portion of code below. The code reads an input file and writes the records into a .dat file. The performance issue arises when there is a huge volume of data in the input file.
For example: an input of 200,000 records takes 38 minutes to append/write into the .dat file, which increases the complete load process time. We need to improve the performance of this process by reducing the time it takes to append/write the records.
/*****************************************
Portion of Code from Shell Script:
/***********************************************************************************************************************************************
m_arr_ctr=1
cat ${m_recv_dir}/${m_glb_d92_nm}${m_glb_file_seq} |while read d92_line
do
m_brch_cd=`echo "${d92_line}" |cut -c166-168`
# This is the case when we reach the last line '*/', we just skip that line
if [ "${m_brch_cd}" = "" ]
then
continue
fi
if [ "${m_brch_cd}" = "400" ]
then
m_jv_cd=`echo "${d92_line}" |cut -c190-192`
else
m_jv_cd=${m_brch_cd}
fi
if [ ! -s tmp_d92${m_brch_cd}z${m_jv_cd} ]
then
echo "TMP" > tmp_d92${m_brch_cd}z${m_jv_cd}
m_a_d92_list[$m_arr_ctr]=tmp_d92${m_brch_cd}z${m_jv_cd}
m_a_d92_files[$m_arr_ctr]=${m_recv_dir}/gd${m_brch_cd}x${m_jv_cd}${m_glb_rate_cd}.dat
m_arr_ctr=`expr $m_arr_ctr + 1`
m_touched="N"
else
m_touched="Y"
fi
if [ "${m_touched}" = "N" ]
then
echo "${d92_line}" > ${m_recv_dir}/gd${m_brch_cd}${m_jv_cd}${m_glb_rate_cd}.dat
else
echo "${d92_line}" >> ${m_recv_dir}/gd${m_brch_cd}${m_jv_cd}${m_glb_rate_cd}.dat
fi
done
for m_file_name in `echo ${m_a_d92_files[*]}`
do
if [[ `grep "/" ${m_file_name} | wc -l` = 0 ]]
then
echo "/" >> ${m_file_name}
fi
done
for m_file_name in `echo ${m_a_d92_list[*]}`
do
rm -f $m_file_name
done
/************************************
Please provide your valuable suggestions. Also, is there any way to use the sed command to append the output in a faster way?
That cat is an unnecessary external command, but since it is only run once, eliminating it will make very little difference.
Part of the slowness is due to calling multiple external commands for every line, many of which are unnecessary: there's no need for expr, for instance, as the shell can do its own arithmetic.
What shell are you using? If it's bash or ksh93, you can replace the call to cut:
m_brch_cd=${d92_line:165:3}
An unnecessary subshell (here and later) can add a significant amount of time. Use the shell's built-in substring and arithmetic expansion instead.
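For illustration, here is how both backtick subshells can be replaced with builtins (ksh93/bash syntax; the sample record below is fabricated just to show the column positions):

```shell
# Build a sample fixed-width record with "ABC" in columns 166-168
# (the real records come from the input feed).
d92_line=$(printf '%-165sABC' '')

# Instead of: m_brch_cd=`echo "${d92_line}" | cut -c166-168`
m_brch_cd=${d92_line:165:3}       # ksh93/bash substring: offset 165, length 3

# Instead of: m_arr_ctr=`expr $m_arr_ctr + 1`
m_arr_ctr=1
m_arr_ctr=$((m_arr_ctr + 1))      # POSIX shell arithmetic, no fork
```

Each eliminated backtick saves a fork/exec per input line, which adds up quickly over 200,000 lines.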
The ksh version is 88f. I implemented the commands you gave, but the one removing cut (i.e. m_brch_cd=${d92_line:165:3}) did not work; as you said, it works for ksh93. The rest of the changes did not improve the performance much (only by 1-2 minutes). Can you please help me with a suggestion using awk? I am very new to awk.
Please describe exactly what the script needs to do.
This script splits the data from detail files (the input files for the shell script, in .txt format). In this case the detail file is located at ${m_recv_dir}/${m_glb_d92_nm}${m_glb_file_seq}, which is the starting portion of the code I posted.
The script reads the data line by line from the text file and prepares the output .DAT file.
Once the .DAT file is created, it puts a '*/' end-of-file marker at the bottom of the generated output file. Once the .DAT output file is generated, another shell script loads the data from these .DAT files into work tables of the database using SQL*Loader.
What files does it use for input? What is the format of those files?
The format of the input file is .txt
What is the format of the output?
The output format is .DAT
Please let me know what other information you need so you can help me with this.
The name of the input file looks like this : glbd92_1000112008_0402110932
What has to be done to it to prepare it for the .DAT file?
I am just redirecting the output to a .DAT extension; there is no conversion of the data from .txt to .DAT. It simply creates a new .DAT file or appends to the .DAT file based on the loop conditions.
If you need more details of script then please do let me know.
The script was not written by me; it was written by someone else, and I have to enhance it to improve the performance. The looping conditions are important because there is one branch/site, '400', for which they want to cut characters from the input record and keep that as the branch.
Can you please let me know whether it is possible to avoid m_arr_ctr and to generate/append the output in batches instead of line by line? Or how to apply awk there?
It will be easier for us to suggest solutions if you lay out the input file structure (which you already did), tell us the logic of what you want to achieve, and then give a sample output for those inputs.
As far as I can see this section of code reads all the output data files to find out if they contain a '/' and then appends a '/' if there isn't one present.
Earlier in the script we apparently ignored the last line '*/' in the input stream (it is not proven that that bit of code works).
Provided that '/' was properly ignored in the input stream (an area of the script which could be improved by using grep -v ^'/' instead of the very first cat), it is impossible for a '/' to appear in any of the output files. We can therefore halve the run time by not re-reading the output data before appending the '/'.
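A minimal sketch of that grep suggestion, feeding the loop directly (the file name and data are made up for illustration; since the thread's input terminator is '*/', the pattern here anchors on that):

```shell
# Sample input with a "*/" terminator line in the middle, for illustration.
printf 'record one\n*/\nrecord two\n' > sample_input.txt

# Filter the terminator out up front instead of testing every line in the loop.
grep -v '^\*/' sample_input.txt | while read -r d92_line
do
    echo "${d92_line}"
done > filtered.txt
```

This replaces the per-line empty-branch-code test with a single filter pass, at the cost of one extra process for the whole file rather than per line.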
for m_file_name in `echo ${m_a_d92_files[*]}`
do
echo "*/" >> ${m_file_name}
done
cafjohnson.
Agreed. The script has many areas which could be improved. Apparently the script works with the files provided, but takes too long.
I looked at whether the "card dealing" method for splitting the data could be improved without using a high-level language, but there is insufficient information about the data type distribution, and no rules are stated about the processing order of the data. As far as I can see, the core script is slow because it appends to multiple output files line by line.
The script is slow because you are using the shell on a very large file; that is exacerbated by a number of inefficient constructs and poorly written code.
If I knew exactly what you are trying to do, I could suggest an awk script.
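Based on the description given earlier in the thread, such an awk replacement might look like the sketch below. The column positions 166-168 (branch) and 190-192 (JV code) are taken from the original script; the directory, rate code, and the generated sample input are illustrative assumptions, not the real feed:

```shell
# Assumed example values; in the real script these come from the environment.
m_recv_dir=.
m_glb_rate_cd=01
infile=glbd92_sample.txt

# Build a tiny fixed-width sample input (192-character records) plus the
# "*/" terminator, purely for illustration.
printf '%-165s123%-21s777\n' '' ''  >  "$infile"
printf '%-165s400%-21s555\n' '' ''  >> "$infile"
printf '*/\n'                       >> "$infile"

# One awk pass replaces the entire per-line shell loop: it splits the
# records into per-branch .dat files and appends the "*/" marker at the end.
awk -v dir="$m_recv_dir" -v rate="$m_glb_rate_cd" '
{
    brch = substr($0, 166, 3)
    if (brch == "") next                       # skip the trailing "*/" line
    jv = (brch == "400") ? substr($0, 190, 3) : brch
    out = dir "/gd" brch jv rate ".dat"
    print > out                                # first print truncates; later prints append
    seen[out] = 1
}
END {
    for (f in seen) print "*/" > f             # end-of-file marker for SQL*Loader
}' "$infile"
```

A single awk process keeps every output file open and writes in one pass, avoiding the per-line fork/exec cost that dominates the 38-minute run. One caveat: very old awk implementations limit the number of simultaneously open files; if you hit that limit with many branch/JV combinations, append with >> and close(out) after each write instead.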