Bash - array loop performance

math4 · October 1, 2014, 3:24pm

Hi,

Another little question...

"sn" is an array whose length varies from about 55,000 to about 150,000 elements. Each element is an integer in the range 0-255, e.g. ${sn[1]} contains the value 103. For a decryption procedure I need to scan all the elements 4 or 5 times. Here is an example of two of the loops, where n is the length of the sn array:

 for(( i=0; i<n-2; i++ ))
 do sn[$i]=$((${sn[$i]} ^ ${sn[$i+2]} ^ (${sn[$i+1]} * k4) % 256))
 done

 for(( i=(n-1); i>1; i--))
 do sn[$i]=$(( ${sn[$i]} ^ ${sn[$i-2]} ^ (${sn[$i-1]} * k3) % 256))
 done

It all works, but the loops take 3 minutes, whereas the equivalent VB6 code takes no more than 10 seconds overall.

For i = 1 To n - 2: sn(i) = sn(i) Xor sn(i + 2) Xor (k4 * sn(i + 1)) Mod 256: Next
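A minimal sketch of the same two passes, written so that the arithmetic stays entirely inside $(( )) with no ${...} expansions. The function names forward_pass and backward_pass and the sample values are hypothetical, chosen only for illustration. Whether this form is measurably faster on a given bash version has not been measured here.

```shell
#!/usr/bin/env bash
# Both functions operate on the global indexed array "sn".
# $1 is the multiplier (k4 or k3 in the loops above).

# sn[i] = sn[i] ^ sn[i+2] ^ ((sn[i+1] * k) % 256), for i = 0 .. n-3
forward_pass() {
  local k=$1 n=${#sn[@]} i
  for (( i = 0; i < n - 2; i++ )); do
    # Inside $(( )), variables and indexed-array elements need no $ or ${}.
    sn[i]=$(( sn[i] ^ sn[i+2] ^ (sn[i+1] * k) % 256 ))
  done
}

# sn[i] = sn[i] ^ sn[i-2] ^ ((sn[i-1] * k) % 256), for i = n-1 .. 2
backward_pass() {
  local k=$1 n=${#sn[@]} i
  for (( i = n - 1; i > 1; i-- )); do
    sn[i]=$(( sn[i] ^ sn[i-2] ^ (sn[i-1] * k) % 256 ))
  done
}

# Tiny demo with made-up data and keys.
sn=(1 2 3 4 5)
forward_pass 3
backward_pass 7
echo "${sn[*]}"   # prints: 4 15 103 77 19
```

In bash arithmetic, % binds tighter than ^, so only the product is reduced modulo 256, the same as in the original loops.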

Am I doing something wrong? The script runs without any errors.
I have tried a few things, for example working on a small sub-array inside the "do" section, like this:

a=${sn[@]:i:3};
IFS=' ' read -a an <<< "$a"
x=${an[0]};
y=${an[1]};
z=${an[2]};
sn[i]=$(( x ^ z ^ (y*4) % 256 ))

but it doesn't seem to change much. Any idea how I can improve the loop performance?

thank you
math

blackrageous · October 1, 2014, 3:42pm

Maybe it's not the language construct so much as the hardware. Do you mean Visual Basic? What are the physical machines?

RudiC · October 1, 2014, 3:45pm

I don't think bash is too good at maths, even though you are dealing with integer math only...

math4 · October 1, 2014, 4:08pm

Hi, thank you for the reply.
It's the same machine (P4): it used to run XP SP3 (32-bit) and now runs only Ubuntu Server (without X, and I don't want to install Wine).

      product: Intel(R) Pentium(R) 4 CPU 2.66GHz
      vendor: Intel Corp.
      size: 2700MHz
      width: 32 bits
      ram: 1.5gb

The VB6 program was a simple .bas service and was very fast.
The bash conversion I made is fast everywhere except in the loops mentioned, which are painfully slow. Same machine.

---------- Post updated at 03:08 PM ---------- Previous update was at 02:50 PM ----------

Hmm, I'm a newbie on Linux. Do you suggest rewriting the entire script in something else?

What could I use that is already installed on Ubuntu Server, and preferably lets me reuse some of the bash code already written?

NOTE: I have now tried removing all the math expressions/logic inside each of the 5 loops in the script. Execution time is now about 5 seconds!

Akshay_Hegde · October 1, 2014, 4:35pm

Use awk or perl; they are good for maths. Please show a representative sample of the input and the desired output, with a brief description.

Chubler_XL · October 1, 2014, 4:44pm

Bash isn't very efficient at looking up large arrays. I found that bash 4 associative arrays gave me a significant speed boost; if you have bash 4, consider using them:

declare -A sn
n=80000
k4=2
k3=6
for((i=0;i<n;i++))
do  sn[i]=$((RANDOM%256))
done

for(( i=0; i<n-2; i++ ))
do sn[i]=$(( (sn[i] ^ sn[i+1] ^ sn[i+2] * k4) % 256 ))
done

for(( i=(n-1); i>1; i--))
do sn[i]=$(( (sn[i] ^ sn[i-2] ^ sn[i-1] * k3) % 256))
done

math4 · October 1, 2014, 4:49pm

Brief:
I read an encrypted binary file, process it byte by byte using a key, and then write the result to a new, readable file.
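A minimal sketch of one way to get the bytes of a file into a bash array and back out again, assuming od is available (it is part of coreutils on Ubuntu). The function names load_bytes and save_bytes are hypothetical, and this is not necessarily how the original script loads its data.

```shell
#!/usr/bin/env bash
# load_bytes FILE: fill the global indexed array "sn" with the byte values (0-255) of FILE.
load_bytes() {
  # od prints every byte as an unsigned decimal number;
  # -A n drops the offset column, -v disables the "*" shorthand for repeated lines.
  sn=( $(od -A n -v -t u1 < "$1") )
}

# save_bytes FILE: write the values held in "sn" to FILE as raw bytes.
save_bytes() {
  local b hex
  for b in "${sn[@]}"; do
    printf -v hex '%02x' "$b"   # decimal value -> two hex digits
    printf "\\x$hex"            # emit the byte with that hex code
  done > "$1"
}

# Demo: round-trip a 4-byte file that includes a NUL and a 0xFF byte.
tmp_in=$(mktemp); tmp_out=$(mktemp)
printf 'A\0\377z' > "$tmp_in"
load_bytes "$tmp_in"
echo "${sn[*]}"                 # prints: 65 0 255 122
save_bytes "$tmp_out"
cmp -s "$tmp_in" "$tmp_out" && echo "round trip OK"
rm -f "$tmp_in" "$tmp_out"
```

Going through od avoids the problem that a bash string variable cannot hold NUL bytes, which matters for binary input.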

blackrageous · October 1, 2014, 4:54pm

Interesting... that is a major performance difference. I was actually hoping it was Cygwin, because that is dog slow and an obvious culprit. I would expect any Linux to perform better than that. I wonder whether it is the looping or the frequent math operations done in the assignment. What did you use to profile performance, if anything?

Chubler_XL · October 1, 2014, 4:55pm

Firstly, there are already a lot of utilities that encrypt and decrypt files on Unix; I'd suggest using one of these if you can.

If you must roll your own, by all means develop a prototype in shell script, but once it's working I'd suggest converting it to C for the final, performance version.

math4 · October 1, 2014, 4:55pm

chubler_xl:

Bash isn't very efficient at looking up large arrays. I found that bash 4 associative arrays gave me a significant speed boost; if you have bash 4, consider using them:
declare -A sn

Yes, I read somewhere that declaring the variable as an array speeds up the process, but when I create the array I use:

IFS=' ' read   -a  sn <<< "$s"

...so I thought it was the same. Am I wrong?

Chubler_XL · October 1, 2014, 4:56pm

I used time, and yes, associative arrays look to be around 11x faster than normal arrays: 5 secs vs 55 secs.
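A rough sketch of how such a comparison could be set up with the time keyword. The function names bench_indexed and bench_assoc, the variable last_count, and the default size of 2000 elements are assumptions made for illustration; the figures quoted above came from a run with n=80000. It needs bash 4 or later for local -A.

```shell
#!/usr/bin/env bash
n=${N:-2000}   # element count; raise it (e.g. N=80000) for a realistic run
k4=2

# One forward pass over an indexed array filled with random bytes.
bench_indexed() {
  local -a sn
  local i
  for (( i = 0; i < n; i++ )); do sn[i]=$(( RANDOM % 256 )); done
  for (( i = 0; i < n - 2; i++ )); do
    sn[i]=$(( (sn[i] ^ sn[i+1] ^ sn[i+2] * k4) % 256 ))
  done
  last_count=${#sn[@]}
}

# The same pass over an associative array; keys are computed explicitly
# because associative subscripts are not evaluated as arithmetic.
bench_assoc() {
  local -A sn
  local i
  for (( i = 0; i < n; i++ )); do sn[$i]=$(( RANDOM % 256 )); done
  for (( i = 0; i < n - 2; i++ )); do
    sn[$i]=$(( (sn[$i] ^ sn[$((i+1))] ^ sn[$((i+2))] * k4) % 256 ))
  done
  last_count=${#sn[@]}
}

TIMEFORMAT='%R s real, %U s user'   # format used by the time keyword
echo "indexed array, n=$n:";     time bench_indexed
echo "associative array, n=$n:"; time bench_assoc
```

The timings are written to standard error, and results will differ between bash versions and machines.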

math4 · October 1, 2014, 5:04pm

Nothing special... I used "lshw" for the specs, "ps aux" to see CPU usage, and "free" to see RAM. Maybe I didn't understand your question.

Chubler_XL · October 1, 2014, 5:37pm

If you have a string $s and you want to load the ASCII value of each character into an array, one char at a time, you could do this:

declare -A sn
s="This is a test string"
for((i=0;i<${#s};i++))
do
  printf -v sn[$i] '%d' "'${s:i:1}"
done

echo sn[0]=${sn[0]}
echo sn[1]=${sn[1]}
echo sn[2]=${sn[2]}

sn[0]=84
sn[1]=104
sn[2]=105

---------- Post updated at 07:37 AM ---------- Previous update was at 07:34 AM ----------

BTW the code I posted before was inaccurate, as expressions within the [ and ] of an associative array are not evaluated; you need something like this:

declare -A sn
n=80000
k4=2
k3=6

for((i=0;i<n;i++))
do  sn[$i]=$((RANDOM%256))
done

for(( i=0; i<n-2; i++ ))
do sn[$i]=$(( (sn[$i] ^ sn[$((i+1))] ^ sn[$((i+2))] * k4) % 256 ))
done

for(( i=(n-1); i>1; i--))
do sn[$i]=$(( (sn[$i] ^ sn[$((i-2))] ^ sn[$((i-1))] * k3) % 256))
done

This is still faster than without using associative arrays.
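A small sketch of the difference described above: an indexed array evaluates its subscript as arithmetic, while an associative array uses the subscript text as the key. The array names idx and assoc and the result variables are hypothetical. It needs bash 4 or later.

```shell
#!/usr/bin/env bash
declare -a idx=(10 20 30)
declare -A assoc=([0]=10 [1]=20 [2]=30)
i=0

# Indexed array: "i+1" is evaluated, so this reads element 1.
from_indexed=${idx[i+1]}

# Associative array: the key is the literal string "i+1", which was never set.
from_assoc_literal=${assoc[i+1]:-unset}

# Associative array with the key computed first: reads key "1".
from_assoc_computed=${assoc[$((i+1))]}

echo "indexed:                $from_indexed"          # 20
echo "associative, literal:   $from_assoc_literal"    # unset
echo "associative, computed:  $from_assoc_computed"   # 20
```

This is why the corrected loops spell out sn[$((i+1))] instead of sn[i+1].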

math4 · October 1, 2014, 7:55pm

The elements are groups of integers, not single chars.
I studied associative arrays and gave them a try: it's better, but not by much.
Tomorrow I'll try a VB6 to Perl or awk conversion. I hope that makes a difference.

Thank you all in any case!

Scrutinizer · October 2, 2014, 2:35am

chubler_xl:

Bash isn't very efficient at looking up large arrays. I found that bash 4 associative arrays gave me a significant speed boost; if you have bash 4, consider using them:
declare -A sn
n=80000
k4=2
k3=6
for((i=0;i<n;i++))
do  sn[i]=$((RANDOM%256))
done

for(( i=0; i<n-2; i++ ))
do sn[i]=$(( (sn[i] ^ sn[i+1] ^ sn[i+2] * k4) % 256 ))
done

for(( i=(n-1); i>1; i--))
do sn[i]=$(( (sn[i] ^ sn[i-2] ^ sn[i-1] * k3) % 256))
done

Interesting... I did some further testing with bash 3, bash 4 and ksh93 (for the associative arrays I used the synonym typeset -A instead of declare -A, which works in both bash 4 and ksh93):

Indexed arrays                  bash 3          bash 4          ksh93
                        real    2m4.457s        1m6.420s        0m1.855s
                        user    1m53.857s       0m57.251s       0m1.764s
                        sys     0m0.501s        0m0.399s        0m0.007s

Associative arrays              bash 3          bash 4          ksh93   
                        real    -               0m4.553s        0m1.871s
                        user    -               0m3.867s        0m1.567s
                        sys     -               0m0.171s        0m0.003s

There seems to be an issue with regular (indexed) arrays in bash that is absent in ksh93, which processes both types of arrays at the same speed...

---
Testing on OSX 10.9.5, bash 3.2.51(1), bash 4.2.0(1), ksh 93u 2011-02-08

math4 · October 2, 2014, 6:14am

I have rewritten the function in Perl: wow, very fast!
I like Perl; it seems easy to understand.
In case someone is interested...

sub decrypt {
    my ($f, $key) = @_;

    # Read the whole encrypted file as raw bytes.
    my $s = '';
    open my $fh, '<', $f or die "Cannot open $f: $!";
    binmode $fh;
    -f $fh and sysread $fh, $s, -s $fh;
    close $fh;

    my $n = length($s);

    # Derive the four multipliers from the key.
    my $k1 = 11 + ($key % 233); my $k2 = 7 + ($key % 239);
    my $k3 =  5 + ($key % 241); my $k4 = 3 + ($key % 251);

    # One array element per byte, each in the range 0-255.
    my @sn = unpack("C*", $s);
    # debug: print "@sn\n";

    # Four passes over the array, alternating direction.
    my $i;
    for ($i=0; $i<($n-2); $i++) { $sn[$i]=(($sn[$i] ^ $sn[$i+2] ^ ($k4 * $sn[$i+1])) % 256); }
    for ($i=($n-1); $i>1; $i--) { $sn[$i]=(($sn[$i] ^ $sn[$i-2] ^ ($k3 * $sn[$i-1])) % 256); }
    for ($i=0; $i<($n-1); $i++) { $sn[$i]=(($sn[$i] ^ $sn[$i+1] ^ ($k2 * $sn[$i+1])) % 256); }
    for ($i=($n-1); $i>0; $i--) { $sn[$i]=(($sn[$i] ^ $sn[$i-1] ^ ($k1 * $sn[$i-1])) % 256); }

    # Turn the byte values back into a string and return it
    # (the original snippet rebuilt $s but had no explicit return).
    return pack("C*", @sn);
}