Any trick to speed up script?

Hi Guys,

I have a script that I'm using to convert some text files to .xls files. I create multiple temp files in the process of conversion. Other than reducing the number of temp files, are there any general tricks to help speed up the script?

I am running it in the bash shell.

Thanks.

Speeding up a script is not usually very easy to do without major rewrites. If you post some of your code here, we can offer better advice.

In general:

  • Call as few external programs as possible. If you have a loop, watch the calls you make inside it. A Useless Use Of Cat that happens 10,000 times isn't a trivial waste anymore. People often call sed, awk, etc. on single strings, which can be a waste -- you might be able to refactor that, running sed once on a big chunk of data and then reading the result into the shell piecewise instead of running sed 10,000 times. Or better yet, instead of making an external call at all:
  • Rely more on shell builtins. You can often use the shell builtin read and the IFS variable that controls it to replace things that call cut thousands of times, use shell pattern matching or parameter expansion instead of a regex in sed, and so forth.
  • Limit the number of pipes. Each pipe represents another process that's either competing with your script for CPU time, making your shell wait while it reads input, or making your shell wait before you can read its output. I once wrote a homemade line wrapper in BASH that used at least six subshells feeding data into each other; it ended up processing text files at about 10 kilobytes per second.
  • Are there existing tools that can do what you want? A while ago there was a thread where someone needed to translate a giant single-line flat file. I wrote a solution in BASH that was horribly slow, then realized the file was so consistent I could just use dd with a few unusual options to convert it in no time flat.
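The builtins-vs-external-calls point above can be sketched like this. The file name and field layout are made up for illustration; the idea is that the slow loop forks one cut process per line, while the fast loop does the same split with nothing but the read builtin and IFS:

```shell
#!/bin/bash
# Hypothetical input: a colon-delimited file; we want the second field
# of every line.
printf 'alpha:1\nbeta:2\ngamma:3\n' > "/tmp/demo.$$"

# Slow: one external cut process per line of input.
while read -r line; do
    cut -d: -f2 <<< "$line"
done < "/tmp/demo.$$"

# Fast: pure builtins -- IFS tells read where to split, so no fork at all.
while IFS=: read -r name value; do
    echo "$value"
done < "/tmp/demo.$$"

rm -f "/tmp/demo.$$"
```

Both loops print the same fields; the difference only becomes visible on files with thousands of lines, where the builtin version avoids thousands of fork/exec cycles.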

This may not help your situation but one additional thing I've found useful is backgrounding. If you have one section that you know will take much longer than other things, background some other things before running the long section, or vice versa. I recently had a 20+ second script that I was able to cut down to 7 seconds by backgrounding parts of it.
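A minimal sketch of that backgrounding pattern, with slow_conversion standing in for whatever long-running section the script has:

```shell
#!/bin/bash
# slow_conversion is a hypothetical stand-in for the expensive step.
slow_conversion() {
    sleep 2                       # pretend this is the slow part
    echo "conversion finished"
}

slow_conversion > "/tmp/conv.$$" &   # start it in the background
bgpid=$!

echo "doing other work while the conversion runs"

wait "$bgpid"                        # block only when the result is needed
cat "/tmp/conv.$$"
rm -f "/tmp/conv.$$"
```

The key is that wait is deferred until the point where the result is actually consumed, so the foreground work and the slow step overlap instead of running back to back.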

Another thing (not verified yet) is to use a ramdisk (/dev/shm/) for your temp files.
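If you want to try that, a safe way is to let mktemp create the work directory under /dev/shm when it exists, and fall back to the normal temp location otherwise (the directory name pattern here is just an example):

```shell
#!/bin/bash
# Prefer the tmpfs at /dev/shm for intermediate files, if present.
if [ -d /dev/shm ]; then
    tmpdir=$(mktemp -d /dev/shm/convert.XXXXXX)
else
    tmpdir=$(mktemp -d)
fi

# ... create and use intermediate files under "$tmpdir" ...

rm -rf "$tmpdir"
```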

That won't give any significant gain on a system with disk caching.

Since Excel will open .csv files as spreadsheets, assuming you have no data issues (like commas in your data), maybe just convert the text to a .csv file?
I know that at one place, I routinely created csv files that were emailed to people, and they would easily open them in Excel.
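As a sketch of that approach (input.txt and output.csv are placeholder names, and the input is assumed to be tab-delimited): quoting every field sidesteps the embedded-comma problem, since Excel treats a double-quoted field as a single cell.

```shell
#!/bin/bash
# Convert a tab-delimited text file to CSV, double-quoting each field
# and doubling any embedded quotes so commas in the data are harmless.
awk -F'\t' '{
    out = ""
    for (i = 1; i <= NF; i++) {
        f = $i
        gsub(/"/, "\"\"", f)                  # escape embedded quotes
        out = out (i > 1 ? "," : "") "\"" f "\""
    }
    print out
}' input.txt > output.csv
```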

Anyway, as someone else said, without seeing your script it is hard to advise.