Hope you are doing fine. Let me describe the problem: I have a script that calls another script, K2Test.sh (created by another team). K2Test.sh takes a date as an argument and generates approximately 1365 files in the localcurves directory for that date.
Out of these 1365 files I am only interested in 133, so I have created a list of the file names we need to process (ZEROCURVEFILES below).
I loop through the 1365 files (`ls $localcurves` below) and check whether each file name is in the 133-file list (ZEROCURVEFILES); if it is, I process the file by reading it line by line.
It seems to take too long just to process 133 files. Am I using some inefficient code below? Is there a way to make it faster? Is it slow because I open and read the 133 files line by line?
I need to run this script for 400 days, which means I would be looping 400 * 1365 times, i.e. once per day, and for each day processing 133 files.
I would really appreciate any help in making this faster. Here is the code; I know it is a lot, but please let me know if anything in the script stands out.
#!/bin/sh
#e.g. 20110627 (june 27 2011)
currdate=$1
#e.g. 20100310 (march 10 2010)
enddate=$2
#directory where 1365 files get generated
localcurves="/home/sratta/feds/localCurves/curves"
outputdir="/home/sratta/curves"
#output file to be generated
OUTFILE="/home/sratta/ZeroCurves/BulkLoad.csv"
touch $OUTFILE
# List of 133 curve file names
ZEROCURVEFILES="saud1-monthlinmid \
saud6-monthlinmid \
.....
suvruvr_usdlinmid \
szarzar_usdlinmid "
#Loop until currdate is not equal to enddate (reverse loop)
while [ $currdate -ne $enddate ]
do
    #Call K2test.sh, which generates the 1365 files for the given date in the $localcurves directory
    ./K2test.sh $currdate
    filesfound=0
    #Loop through the 1365 files generated by K2test.sh in the $localcurves directory
    for FILE in `ls $localcurves`
    do
        filesfound=1
        #Is the file name one of the 133 files we want? Only then process it, otherwise ignore it
        zerocurvefile=`echo $ZEROCURVEFILES | grep $FILE`
        #If the file is in the list then process it
        if [ "$zerocurvefile" != "" ]
        then
            echo "Processing $LOWERCASEFILE.$currdate file"
            #THIS LINE-BY-LINE PROCESSING IS SLOW
            exec 3<&0
            #Open the file
            exec 0<"$localcurves/$FILE"
            cnt=0
            rowstoprocess=0
            #Read the file line by line
            while read line
            do
                cnt=`expr $cnt + 1`
                #The first line in the file contains the number of records to process
                if [ "$cnt" -eq "1" ]
                then
                    numheadrecords=`echo $line | awk '{FS=""}{print $1}'`
                    rowstoprocess=`expr $numheadrecords + 2`
                    echo "Total number of rows in header for $LOWERCASEFILE.$currdate is: $numheadrecords"
                fi
                if [ "$cnt" -gt "1" ] && [ "$cnt" -lt "$rowstoprocess" ]
                then
                    julianmdate=`echo $line | awk '{FS=" "}{print $1}'`
                    rate=`echo $line | awk '{FS=" "}{print $2}'`
                    mdate=`echo $line | awk '{FS=" "}{print $4}'`
                    #Extract certain columns and append the data to the output file
                    echo "$LOWERCASEFILE,$currdate,$julianmdate,$rate,$mdate" >> $OUTFILE
                fi
                #Stop once we have processed the number of records given on the first line
                if [ "$cnt" -eq "$rowstoprocess" ]
                then
                    break
                fi
            done
            exec 0<&3
        fi
    done
    #Subtract 1 day from currdate (reverse loop)
    currdate=`./shift_date $currdate -1`
done
What Operating System and version are you running?
What Shell is /bin/sh on your computer?
How many lines are processed from the 133 files? Is it definitely not the whole of each file?
Does the script work?
What are the `exec` redirection lines for? Is there a local reason for these complex redirects?
There is great scope for efficiency in this script but let's get a feel for the environment and the size of the data files first.
Thanks for looking at my post, I really appreciate it. I am new to Unix scripting so I definitely need guidance. Please see my answers:
What Operating System and version are you running? It is sun solaris
What Shell is /bin/sh on your computer? How do I tell? I just know I am using sh.
How many lines are processed from the 133 files? Is it definitely not the whole of each file? Each file has the number of records on its very first line; I read that and process that many rows. It can be anywhere from 10 to 200.
Does the script work? Yes, the script works, but each file takes approx 4 seconds to process, and 133 files take 523 seconds, which is almost 9 minutes for 133 files for one day. I have to process 400 days, which would take about 58 hours.
What are these lines for? Is there a local reason for these complex redirects? I copied them from a colleague, so if you think there is no reason for these redirections I would appreciate your guidance.
michaelrozar17, I did put in double square brackets and I get a syntax error. What is this for? Do you want to know which shell it is?
Thanks Jean-Pierre, I will try it out and let you know.
---------- Post updated at 09:44 AM ---------- Previous update was at 09:38 AM ----------
Jean-Pierre, I am encountering a problem.
The 1365 files generated in the $localcurves directory have mixed-case names, e.g. sCADTierTwolinMid, but I need them in lower case.
If you look at the list, $ZEROCURVEFILES is all lower case, so when we do `ls $ZEROCURVEFILES` it will not find any of them. Is there a way to make ls case-insensitive?
(Late post - lost connection, may be out of context)
The version is in the output of the "uname -a" command. It should then be possible to look up whether your Solaris is an old one with the old Bourne Shell as /bin/sh, or a newer one with the more modern POSIX Shell.
1) The big inefficiency is using the Shell "read" to read records line by line from a data file, then using multiple "awk" runs to separate the fields of each line.
I see now why you reassigned the channels: you are already using the Shell input channel to read a list of files.
I agree with the ideas behind "agiles" modifications.
2) As you have a list of required files, use that list.
I'd add a test to the script to check whether the file exists.
I see that "agiles" modification is ingenious because it allows for this by sending errors to /dev/null.
3) Invoke awk only once and use it to read the data from the files.
A lot of the inefficiency comes from the number of times the original script starts "awk" to process the same $line.
4) Hold the list of 133 files in a real file not an environment variable and use "while" rather than "for". Some Bourne shells will not let you have an environment variable that big.
5) Consider making a version of K2test.sh which only generates the relevant 133 files in /home/sratta/feds/localCurves/curves .
6) I noticed that the variable $LOWERCASEFILE is not set anywhere.
7) If you have a journalling filesystem, it is inefficient to repeatedly create a batch of files and then overwrite them with the Shell. It depends on whether K2test.sh removes old files before generating new ones.
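Putting points 1) and 3) together, here is a minimal sketch of what replacing the inner read-loop with a single awk pass per file could look like. The paths, names, and sample data below are all invented for illustration; on an old Solaris /bin/sh you may need "nawk" or /usr/xpg4/bin/awk instead of "awk" for the -v option.

```shell
#!/bin/sh
# Sketch only: one awk pass per file replaces the shell read-loop
# and the three awk calls per line.
currdate=20110627                 # example date
OUTFILE=/tmp/BulkLoad.csv         # example output path
LOWERCASEFILE=saud1-monthlinmid   # example curve name
: > "$OUTFILE"

# Fabricated sample curve file: the header says 2 data rows follow
cat > /tmp/curve.$$ <<'EOF'
2 header
40721 0.25 x 20110627
40752 0.27 x 20110728
99999 trailing line that must be ignored
EOF

# NR==1 grabs the record count; the next n lines are printed as CSV;
# awk exits as soon as the counted rows are done
awk -v name="$LOWERCASEFILE" -v d="$currdate" '
    NR == 1   { n = $1; next }
    NR <= n+1 { print name "," d "," $1 "," $2 "," $4; next }
              { exit }
' /tmp/curve.$$ >> "$OUTFILE"

cat "$OUTFILE"
rm -f /tmp/curve.$$
```

Starting awk once per file instead of three times per line is where most of the 4 seconds per file should disappear.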
Jean Pierre, I see that when you use awk you use printf. I want the data elements sent to a file, not printed to the screen. How do I make the data elements go to $OUTFILE, where OUTFILE is a variable holding the name of the file?
Check "agiles" next post, but I think this is enough for the redirect:
' LowFile=$LOWERCASEFILE Date=$currdate $FILE >> ${OUTFILE}
However there are other problems:
e.g. There is no value in $LOWERCASEFILE.
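For reference, a tiny self-contained sketch of the command-line assignment style in the fragment above: arguments of the form name=value after the awk program text become awk variables, and the shell redirect sends awk's output to the file. All names and data here are invented.

```shell
#!/bin/sh
# Invented names/data; shows awk command-line assignments plus a redirect
OUTFILE=/tmp/out.csv
LOWERCASEFILE=saud1-monthlinmid
currdate=20110627
: > "$OUTFILE"
echo "40721 0.25 x 20110627" > /tmp/in.$$

# LowFile and Date are assigned as awk variables before /tmp/in.$$ is read
awk '{ print LowFile "," Date "," $1 "," $2 }' \
    LowFile=$LOWERCASEFILE Date=$currdate /tmp/in.$$ >> ${OUTFILE}

cat "$OUTFILE"
rm -f /tmp/in.$$
```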
It would be so much easier if $ZEROCURVEFILES was the name of a file containing a list of the required files with their correct names. This could be created from an "ls -1" listing, deleting the ones you don't want. It could equally be created using a "here document" within the script.
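A minimal sketch of the here-document approach, driving the loop with "while read" instead of "for" (the two names are placeholders standing in for the real 133):

```shell
#!/bin/sh
# Placeholder list; the real one would hold the 133 correctly-cased names
LISTFILE=/tmp/zerocurvefiles.txt
cat > "$LISTFILE" <<'EOF'
sAUD1-monthLinMid
sAUD6-monthLinMid
EOF

# "while read" from a file scales to any list length,
# unlike a huge environment variable fed to "for"
while read FILE
do
    echo "would process $FILE"
done < "$LISTFILE"
```

This also sidesteps scanning all 1365 files with `ls $localcurves`: the loop visits only the files you actually want.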
Translating the mixed upper-and-lower-case filename to lower case is a trivial task for the unix "tr" command. Working from a lower-case list is proving not to be trivial.
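For example, a one-liner along these lines (the file name is just the example quoted earlier in the thread):

```shell
#!/bin/sh
# tr translates each upper-case letter to its lower-case counterpart
FILE=sCADTierTwolinMid
LOWERCASEFILE=`echo "$FILE" | tr 'A-Z' 'a-z'`
echo "$LOWERCASEFILE"
```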
Any chance you can let us know your version of Solaris?
The awk command you sent will only run for the number of records in the header, correct? For the lines following the first, does it use space as the delimiter to extract fields within the line? I don't see the separator being set to space.
I will modify $ZEROCURVEFILES so that it has the exact names. After doing that, how do I loop through only those files? Basically I have to somehow use the ls command with this list, and then inside the loop change the file name to lower case using tr like you mentioned.
I will let you know the version of Solaris when I reach work. I appreciate your help.