Strange array handling in ksh93

bakunin · October 20, 2005, 6:28am

I wrote a script in ksh93 (the OS is AIX 5.2, ML7), which failed to run. After some testing I found out why, but the answer is a bit unsatisfying. Look for yourself:

#!/bin/ksh93

# --------- Step Names
typeset achStepName[1]="foo"
typeset achStepName[2]="bar"
typeset achStepName[3]="fubar"
typeset achStepName[4]="test"

(( iCnt = 1 ))
while [ $iCnt -le ${#achStepName[@]} ] ; do
     print - $iCnt ${achStepName[$iCnt]}
     (( iCnt += 1 ))
done

exit 0

One would expect this to give a little table with the entries 1, 2, 3 and 4, just like the array is defined. Instead the output looks like:

1
2 bar
3 fubar
4 test

The reason is that "foo" somehow becomes the array element with the index "0"! After modifying the code to start the loop with iCnt=0, the output table looked like:

0 foo
1
2 bar
3 fubar
4 test

Has anybody a good explanation of this and a way to avoid that? Have I done something wrong?

To be honest, I can't believe a bug that big made it beyond alpha-testing, let alone into production.

bakunin

Perderabo · October 22, 2005, 3:31pm

Wow! The problem all lies here....

typeset achStepName[1]="foo"
typeset achStepName[2]="bar"
typeset achStepName[3]="fubar"
typeset achStepName[4]="test"

This is actually the screwiest ksh code that I have ever seen. I have to say that I didn't know what behavior to expect. I have just reread "The New KornShell" by Bolsky and Korn. I don't believe that Dave Korn ever anticipated code like that. When I run your code in my ksh, I get the results that you want, not the results that you get. That may or may not be a good thing. If I change your "typeset" to "typeset -R9", I suddenly do get the results you are getting.

A variable in ksh starts life out as a scalar but it later may be promoted to an array. If a variable is promoted to an indexed array and it had a value, that value is retained as element zero. So
var="first"
var[1]="second"
echo ${var[0]}
will result in "first" being echoed.
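A minimal sketch of that promotion, which also prints the element count and the subscripts in use. It assumes ksh93 or bash; the ${!var[@]} subscript listing is not available in ksh88.

```shell
unset var
var="first"          # var starts life as a scalar
var[1]="second"      # promoted to an indexed array; "first" is kept as element 0
echo "${var[0]}"     # prints: first
echo "${#var[@]}"    # prints: 2
echo "${!var[@]}"    # prints: 0 1
```

The count is 2 even though only index 1 was assigned explicitly, which is the same effect that makes ${#achStepName[@]} come out as 4 while element 1 is empty.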

Now a typeset statement can do 2 things to a variable: assign it a value; and assign it a type. The documentation implies that these operations will be done in that order. This is to support the read-only attribute with a statement like:
typeset -r var="locked value"
Part of what a typeset statement can do is to promote a scalar to an indexed array. The syntax for that is:
typeset variable[5]
This statement turns variable into an array and also makes the claim that it has 5 elements. Since an array starts with 0, the last possible element is 4. Apparently, ksh does not enforce that limit, so it is for documentation only.

Your first statement:
typeset achStepName[1]="foo"
almost actually makes sense. First you assigned:
achStepName="foo"
then you declared that achStepName is an array with a single element. The only valid index for an array with a single element is 0.

Now your next statement:
typeset achStepName[2]="bar"
Well, I really can't tell you what this should do. The statement seems to be self-inconsistent. We seem to be claiming that achStepName now has two elements. If so, they would be 0 and 1. We are also assigning a value... but to what? The third element of our two element array? That seems to be how the interpreters are behaving.

Here is something interesting, the statement:
typeset -Z9 zeros[7]=123
affects zeros[0] the first time it is executed and zeros[7] the second time it is executed. I feel that it should do the same thing both times. Since that is not the case, I must agree that ksh has a bug. I have been programming in ksh for quite a while without encountering this before. So I still do not agree that this bug should have been caught in alpha-testing.
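One way to sidestep the ambiguity is to leave typeset out and use plain subscript assignments, then loop over the subscripts that actually exist instead of counting from 1. This is only a sketch: it assumes ksh93 or bash (the ${!name[@]} expansion is not in ksh88), it uses printf instead of print so it runs in both, and it has not been tried on the AIX levels mentioned in this thread.

```shell
unset achStepName

# Plain assignments: each one sets exactly the element that is named.
achStepName[1]="foo"
achStepName[2]="bar"
achStepName[3]="fubar"
achStepName[4]="test"

# Iterate over the subscripts in use rather than assuming a start index.
for iCnt in "${!achStepName[@]}" ; do
     printf '%s %s\n' "$iCnt" "${achStepName[$iCnt]}"
done
```

Because the loop takes its indices from the array itself, it keeps working whether the first element ends up at 0 or at 1.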

bakunin · October 24, 2005, 10:35am

Hmm....

Actually I didn't know all that about variable handling in ksh, so many thanks for explaining this.

On the other hand, in ksh(88) the code works as expected: the four statements generate a table with four elements and indices 1,2,3,4. The expression ${#table[@]} evaluates to 4 and the loop works fine.

I have now tried this on another machine (AIX 5.1, ML06), and there it worked as I would have expected, too.

As I understand my ksh manual, "arr[n]" not only sizes an array but also denotes a specific array element. Otherwise the expression "print - ${arr[3]}" wouldn't make sense, right?

Further, the ksh93 man page states that ksh93 now has "associative arrays", and I don't see the difference between "arr[tommy]", "arr[willy]" and "arr[1]". Since the first two will not (for obvious reasons) be zero-based, why does the latter one have to be?

bakunin

Perderabo · October 24, 2005, 12:49pm

Retry all of those versions of ksh using something like "typeset -Z9 zeros[7]=123". You will probably find that the 2 statements:
typeset -Z9 zeros[7]=123
typeset zeros[7]=123
work differently. And remember, "zeros" must be undefined to see the difference. On your ksh93 the two statements will both set zeros[0]=123 on the first execution. And they both will set zeros[7]=123 on the second execution.

What arr[n] does depends on where it is used. There is no language on the ksh93 man page stating that an array reference is allowed at all in a typeset statement. Only in the ksh book is there some fleeting mention of using typeset vname[n] to declare an indexed array.

Why do indexed arrays need to start with zero? Because that is the way the language works. Dave Korn could have decreed that they start with 1 or even 719. He picked 0, probably because C does. Without a standard, stuff like:
array=(aaa bbb ccc)
set -A array aaa bbb ccc
would not make sense. ksh93 has blurred the distinction between associative arrays and indexed arrays as much as possible. I am disappointed with that particular design choice, and it is adding to the confusion here. You seem to have lost the concept entirely. I hope this isn't because your ksh93 is broken. Try this:
x[2]=7
x[1+1]=9
echo ${x[2]}
typeset -A x
x[2]=7
x[1+1]=9
echo ${x[2]}
When I run this, I get 9 the first time and 7 the second.
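The same comparison as a sketch that keeps the two kinds of array in separate variables. The names idx and assoc are made up for illustration; they are used because bash, unlike ksh93, refuses to turn an existing indexed array into an associative one. It assumes ksh93 or bash 4 and later.

```shell
unset idx assoc

# Indexed array: the subscript is an arithmetic expression, so 1+1 means 2.
idx[2]=7
idx[1+1]=9
echo "${idx[2]}"        # prints: 9

# Associative array: the subscript is a plain string, so "1+1" is its own key.
typeset -A assoc
assoc[2]=7
assoc[1+1]=9
echo "${assoc[2]}"      # prints: 7
echo "${assoc[1+1]}"    # prints: 9
```

This is the difference between "arr[1]" and "arr[tommy]": in an indexed array the subscript is evaluated as a number, in an associative array it is only ever compared as text.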