script to add numbers is slow

Hi,

I am running the following script in bash. It works and gives correct output, but it is very slow with large inputs: the more rows and columns (HEIGHT and WIDTH), the slower it gets, as you can probably see.

How can I do what I want more efficiently? Any ideas welcome. It has been ages since I have gone down the scripting road and - wow - it is slow to come back.

Thanks in advance.

i=$(($WIDTH * $HEIGHT - $WIDTH - 1))
k=$(($WIDTH - 1))
for ((j=0;j<$i;j++))
do
    if [[ $j -eq $k ]]
    then
        echo $((k=$(($k + $WIDTH))))
    else
        second=$(( j + 1 ))
        third=$(( j + WIDTH + 1))
        fourth=$(( j + WIDTH))
        echo "4 $j $second $third $fourth" >> file.out
    fi
done

Can you provide sample input and output?
And is it OK if it's a ksh script?

If we make WIDTH = 5 and HEIGHT = 5,
then the output is:
4 0 1 6 5
4 1 2 7 6
4 2 3 8 7
4 3 4 9 8
4 5 6 11 10
4 6 7 12 11
4 7 8 13 12
4 8 9 14 13
4 10 11 16 15
4 11 12 17 16
4 12 13 18 17
4 13 14 19 18
4 15 16 21 20
4 16 17 22 21
4 17 18 23 22
4 18 19 24 23

That is correct, but with WIDTH = 5000 and HEIGHT = 5000 it can take hours to create the output file (and yes, it is a large output file).

Have you considered just writing that up as a C program? Multimegabyte output, character by character, seems to be better suited to a more powerful language.

Of course finding ways to do it more efficiently in a shell script would be interesting, and may be your point, but if all you want is a way to generate those files quickly...
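
For what it's worth, here is a minimal bash sketch of a pure-shell variant. It assumes that a good part of the cost in the original is reopening file.out with >> for every single line. gen_quads is a made-up name, not something from the original post. It walks rows and columns directly, so no end-of-row test is needed, and the redirect happens once, outside the loops. I have not timed it, and it should still be much slower than a compiled program.

```shell
#!/bin/bash
# gen_quads WIDTH HEIGHT (hypothetical helper name)
# Prints the same "4 a b c d" lines as the original script, to stdout.
gen_quads() {
    local width=$1 height=$2
    local row col j
    # Quads start in every row except the last and every column except the last.
    for ((row = 0; row < height - 1; row++)); do
        for ((col = 0; col < width - 1; col++)); do
            j=$((row * width + col))
            echo "4 $j $((j + 1)) $((j + width + 1)) $((j + width))"
        done
    done
}

# Example use (the file is opened once for the whole run):
#   gen_quads 5 5 > file.out
```

With WIDTH = 5 and HEIGHT = 5 this gives the 16 lines shown above, from "4 0 1 6 5" to "4 18 19 24 23".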

You can try out the following one-liner.
Pass WIDTH and HEIGHT to it as var1 and var2. With this much arithmetic per line it is tough to make it run really fast, but try it out anyway :)

awk -v var1="50" -v var2="50" 'BEGIN{
    # var1 = WIDTH, var2 = HEIGHT
    i=(var1*var2-var1-1)
    k=(var1-1)
    for(j=0;j<i;j++)
    {
        if(j==k)
        {
            # last column of a row: print nothing, move k to the next row end
            k+=var1
        }else
        {
            print "4 " j" "(j+1)" "(j+var1+1)" "(j+var1)
        }
    }
}'

That is much faster!!

Thank you vidyadhar85

I was trying to figure out an awk solution and then got distracted trying to see if I could add columns of numbers quicker.

Your solution will work for now - again Thank you very much!!

macsurveyr

That is an improvement. I timed it at 2m13.027s doing 5000x5000. But there's a tenfold improvement with a direct translation to C (11.885s):

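#include <stdio.h>
#include <stdlib.h>   /* atoi */

int main(int argc, char *argv[])
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s HEIGHT WIDTH\n", argv[0]);
        return 1;
    }
    int height = atoi(argv[1]);
    int width = atoi(argv[2]);
    int i = width * height - width - 1;
    int k = width - 1;
    int j;
    for (j = 0; j < i; j++) {
        if (j == k) {
            /* last column of a row: no quad starts here, move k to the next row end */
            k += width;
        } else {
            int tmp = j + width;
            printf("4 %i %i %i %i\n", j, j + 1, tmp + 1, tmp);
        }
    }
    return 0;
}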

An optimized version might be even better.

CRGreathouse,

Thank you for C

I had been using shell scripts because I could show results quickly and then - gee, guess what - file sizes got bigger. May have to implement with C. Tenfold, at least, improvement is significant.

Big problem - that is very hard to describe - I have no compiler at work - not approved - really, really long story, but simple shell scripts can be implemented so, I have gone that route.

Thanks again tho, speed improvements can make a case for more tools, but bureaucracy can be stifling.

macsurveyr

No problem. If the biggest you're going to do is 5000x5000, then vidyadhar85's script should be fine. If you needed to do 50,000x50,000 then it would take ~4 hours, at which point it may be worthwhile to ask for an exception or find a workaround. (If you have chmod, you could compile elsewhere and transfer in some form like base64...)
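
The compile-elsewhere idea in the parenthesis could look roughly like this. pack_binary and unpack_binary are made-up names, and the sketch assumes a base64 utility that reads stdin and accepts -d to decode (GNU coreutils does), plus a build machine with the same OS and CPU architecture as the target.

```shell
#!/bin/bash
# pack_binary FILE (hypothetical helper): run on the machine that has a compiler.
# Writes FILE.b64, a plain-text copy that survives text-only transfer.
pack_binary() {
    base64 < "$1" > "$1.b64"
}

# unpack_binary FILE.b64 OUT (hypothetical helper): run on the target machine.
# Decodes the text back to the original bytes and marks the result executable.
unpack_binary() {
    base64 -d < "$1" > "$2" && chmod +x "$2"
}

# Example use:
#   pack_binary quads                # on the build machine
#   unpack_binary quads.b64 quads    # on the target machine
```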

Speedups (@ 5000x5000):
Original: estimated 30 minutes
vidyadhar85: 2.2 minutes
my executable: 12 seconds
optimized* executable: 2-3 seconds?
multithreaded** optimized executable: < 1 second?

* The primary optimization here would be avoiding temporaries.
** 4-way on a quad core machine producing four files that are joined at the end.
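
The split-and-join idea in the second footnote does not strictly need a compiled program. Below is a rough sketch of the same scheme using background awk jobs from bash instead of threads in the executable, so it is a swapped-in technique and the timing estimates above do not apply to it. quads_rows and parallel_quads are made-up names, and the part-file naming is an assumption.

```shell
#!/bin/bash
# quads_rows WIDTH FIRST LAST (hypothetical helper)
# Prints the quads that start in rows FIRST .. LAST-1.
quads_rows() {
    awk -v w="$1" -v first="$2" -v last="$3" 'BEGIN {
        for (r = first; r < last; r++)
            for (c = 0; c < w - 1; c++) {
                j = r * w + c
                print "4 " j " " (j + 1) " " (j + w + 1) " " (j + w)
            }
    }'
}

# parallel_quads WIDTH HEIGHT JOBS OUTFILE (hypothetical helper)
# Splits the rows into JOBS contiguous ranges, runs one background job per
# range, then joins the part files in order.
parallel_quads() {
    local width=$1 height=$2 jobs=$3 out=$4
    local rows=$((height - 1))   # quads start in every row except the last
    local n first last
    local parts=()
    for ((n = 0; n < jobs; n++)); do
        first=$((rows * n / jobs))
        last=$((rows * (n + 1) / jobs))
        quads_rows "$width" "$first" "$last" > "$out.part$n" &
        parts+=("$out.part$n")
    done
    wait                         # let every background job finish
    cat "${parts[@]}" > "$out"
    rm -f "${parts[@]}"
}

# Example use: parallel_quads 5000 5000 4 file.out
```

Because each job gets a contiguous block of rows and the parts are concatenated in job order, the joined file has the lines in the same order as a single run.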

Whoa,

Time savings are impressive.

I do have a dual processor quad core machine available - 8 virtual - two software packages I use actually take advantage so I know it actually works - and your numbers show that it would be almost instantaneous compared to 30 minutes.

hmmm thank you!!!

The multicore savings are only for programs (unlike mine) designed to take advantage of it. But I think my numbers are reasonable here, depending on how much a really smart programmer could beat gcc's optimizer. :)