Possible ksh93 Bug Expanding Variables?

bakunin · July 3, 2017, 5:25am

My OSes are Linux (kernel 4.08.something) and AIX (7100-04-01-1543); the ksh versions used are:

ksh88: Version M-11/16/88f (AIX)
ksh93: Version M 93t+ 2009-05-01 (AIX), Version M 93u (Linux)

When writing a parser for stanza files in ksh I encountered a rather strange behavior. Here is a stripped-down version of the parsing loop (I left out the actual parsing for clarity; if anyone is interested I can post it). The two critical lines are the assignments to chChar and chLine inside the inner loop:

#! /bin/ksh

typeset chLine=""
typeset chChar=""

while read chLine ; do
     print - "-- Begin Line: $chLine"
     while [ -n "$chLine" ] ; do
          chChar="${chLine%${chLine#?}}"
          chLine="${chLine#?}"

          print - "\n  Line: \"$chLine\""
          print - "  Char: \"$chChar\""

     done
done < /iput/file

exit 0

Here is the output with a one-line sample file containing item=value, which works as expected:

# ./parsetest.sh  
-- Begin Line: item=value

  Line: "tem=value"
  Char: "i"

  Line: "em=value"
  Char: "t"

  Line: "m=value"
  Char: "e"

  Line: "=value"
  Char: "m"

  Line: "value"
  Char: "="

  Line: "alue"
  Char: "v"

  Line: "lue"
  Char: "a"

  Line: "ue"
  Char: "l"

  Line: "e"
  Char: "u"

  Line: ""
  Char: "e"

Now, because I wanted to have comments in my stanza files (which the parser should filter out) but needed to allow escaped comment characters, I tried the line item=val\\#ue. Here is the output of running the above script with ksh88, which is as expected:

-- Begin Line: item=val\#ue

  Line: "tem=val\#ue"
  Char: "i"

  Line: "em=val\#ue"
  Char: "t"

  Line: "m=val\#ue"
  Char: "e"

  Line: "=val\#ue"
  Char: "m"

  Line: "val\#ue"
  Char: "="

  Line: "al\#ue"
  Char: "v"

  Line: "l\#ue"
  Char: "a"

  Line: "\#ue"
  Char: "l"

  Line: "#ue"
  Char: "\"

  Line: "ue"
  Char: "#"

  Line: "e"
  Char: "u"

  Line: ""
  Char: "e"

But - and this is where it gets weird - if the script is run with ksh93, the character-wise chopping off from the main string stops working correctly:

# ./parsetest.sh     
-- Begin Line: item=val\#ue

  Line: "tem=val\#ue"
  Char: "item=val\#ue"

  Line: "em=val\#ue"
  Char: "tem=val\#ue"

  Line: "m=val\#ue"
  Char: "em=val\#ue"

  Line: "=val\#ue"
  Char: "m=val\#ue"

  Line: "val\#ue"
  Char: "=val\#ue"

  Line: "al\#ue"
  Char: "val\#ue"

  Line: "l\#ue"
  Char: "al\#ue"

  Line: "\#ue"
  Char: "l\"

  Line: "#ue"
  Char: "\"

  Line: "ue"
  Char: "#"

  Line: "e"
  Char: "u"

  Line: ""
  Char: "e"

Notice that as long as the escape character is present in the string, the variable expansion in the two critical lines does not seem to work correctly.

Does anyone have an explanation for this, or have I just encountered a bug?

bakunin

disedorgue · July 3, 2017, 5:31pm

I can't explain it, but I get the same result in bash and dash...

Regards.

Don_Cragun · July 3, 2017, 6:36pm

You are hoping the backslash in the expansion of ${chLine#?} will be treated as a literal backslash character. And in the expansion of ${chLine} it is. But in word in ${chLine%word}, the backslash is an escape character. Since \# is treated as an escaped # in the pattern instead of the two-character literal \#, there is no match, so the smallest-matching-suffix removal takes nothing away from the expansion.

This is one of the changes that was made to ksh88 behavior (that is handled differently in ksh93 ) while the POSIX shell standard was being developed.

I think this explains the difference you're seeing, but unless your script tests which version of ksh you're using and uses different code for the two cases, you may have trouble finding a common variable expansion that will get you what you want in both versions of the shell. Unfortunately, you can't use substring expansions such as ${chLine:0:1} (the first character) or ${chLine:1} (the remainder) in ksh88, although they give you what you want in ksh93.
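
The effect described above can be reproduced in isolation. The following is a minimal sketch, not part of the original posts: the variable names line, rest, unquoted and quoted are made up for illustration. It also shows one possible alternative, quoting the inner expansion so that every character of the pattern is taken literally, which is what POSIX specifies for quoted pattern text. Whether ksh88 honors that quoting has not been verified here, so treat it as an assumption for that shell.

```shell
#!/bin/sh
# Illustrative names only; none of these variables appear in the original script.
line='al\#ue'
rest=${line#?}              # drop the first character: l\#ue

# Unquoted: the backslash that comes out of $rest acts as a pattern escape,
# so the pattern means the literal text "l#ue". In the shells reported in
# this thread (ksh93, bash, dash) that suffix is not found and nothing is removed.
unquoted=${line%${rest}}

# Quoted: every character of $rest is matched literally, so the suffix
# l\#ue is removed and only the first character remains.
quoted=${line%"${rest}"}

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

With the quoted form, the value left in quoted is the single leading character of line, which is what the original chChar assignment was meant to produce.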

Don_Cragun · July 4, 2017, 12:25am

Maybe you'd like to try the following which should work with ksh88 and any POSIX conforming shell (including ksh93 ):

#! /bin/ksh

typeset chLine=""
typeset chChar=""
typeset pattern='??????????'

while read -r chLine
do	print - "-- Begin Line: $chLine"
	while [ ${#chLine} -gt ${#pattern} ]
	do	pattern="$pattern$pattern"
	done
	while [ -n "$chLine" ]
	do	chChar="${chLine%$(printf '%*.*s' $((${#chLine} - 1)) \
		    $((${#chLine} - 1)) "$pattern")}"
		chLine="${chLine#?}"

		print - "\n  Line: \"$chLine\""
		print - "  Char: \"$chChar\""
	done
done < /iput/file

exit 0
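
The key step in the script above is building a pattern that contains no data from the line at all, only ? wildcards, so no backslash can ever end up in the pattern. A minimal sketch of just that step follows; the names n, tail_pattern and first are hypothetical and used only for this illustration, and it assumes the shell's printf accepts * for width and precision, as the script above already does.

```shell
#!/bin/sh
# Hypothetical variable names; only "pattern" is taken from the script above.
pattern='??????????'        # a run of wildcards at least as long as the line
line='l\#ue'

n=$((${#line} - 1))         # number of characters to strip from the end: 4

# %*.*s pads to width n and truncates to precision n, so exactly the
# first n characters of $pattern are printed: ????
tail_pattern=$(printf '%*.*s' "$n" "$n" "$pattern")

# Remove the shortest suffix matching n arbitrary characters,
# which leaves only the first character of the line.
first=${line%$tail_pattern}

printf 'pattern: %s\n' "$tail_pattern"
printf 'first:   %s\n' "$first"
```

When the line is down to one character, n is 0, the generated pattern is empty, and the expansion returns that last character unchanged.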

If /iput/file contains:

item=value
item=val\#ue
[{(This is line 3*)}]\.

the above code should produce the output:

-- Begin Line: item=value

  Line: "tem=value"
  Char: "i"

  Line: "em=value"
  Char: "t"

  Line: "m=value"
  Char: "e"

  Line: "=value"
  Char: "m"

  Line: "value"
  Char: "="

  Line: "alue"
  Char: "v"

  Line: "lue"
  Char: "a"

  Line: "ue"
  Char: "l"

  Line: "e"
  Char: "u"

  Line: ""
  Char: "e"
-- Begin Line: item=val\#ue

  Line: "tem=val\#ue"
  Char: "i"

  Line: "em=val\#ue"
  Char: "t"

  Line: "m=val\#ue"
  Char: "e"

  Line: "=val\#ue"
  Char: "m"

  Line: "val\#ue"
  Char: "="

  Line: "al\#ue"
  Char: "v"

  Line: "l\#ue"
  Char: "a"

  Line: "\#ue"
  Char: "l"

  Line: "#ue"
  Char: "\"

  Line: "ue"
  Char: "#"

  Line: "e"
  Char: "u"

  Line: ""
  Char: "e"
-- Begin Line: [{(This is line 3*)}]\.

  Line: "{(This is line 3*)}]\."
  Char: "["

  Line: "(This is line 3*)}]\."
  Char: "{"

  Line: "This is line 3*)}]\."
  Char: "("

  Line: "his is line 3*)}]\."
  Char: "T"

  Line: "is is line 3*)}]\."
  Char: "h"

  Line: "s is line 3*)}]\."
  Char: "i"

  Line: " is line 3*)}]\."
  Char: "s"

  Line: "is line 3*)}]\."
  Char: " "

  Line: "s line 3*)}]\."
  Char: "i"

  Line: " line 3*)}]\."
  Char: "s"

  Line: "line 3*)}]\."
  Char: " "

  Line: "ine 3*)}]\."
  Char: "l"

  Line: "ne 3*)}]\."
  Char: "i"

  Line: "e 3*)}]\."
  Char: "n"

  Line: " 3*)}]\."
  Char: "e"

  Line: "3*)}]\."
  Char: " "

  Line: "*)}]\."
  Char: "3"

  Line: ")}]\."
  Char: "*"

  Line: "}]\."
  Char: ")"

  Line: "]\."
  Char: "}"

  Line: "\."
  Char: "]"

  Line: "."
  Char: "\"

  Line: ""
  Char: "."

bakunin · July 4, 2017, 1:54am

Hats off to your debugging skills, Don!

After this I finally understood what the problem was in the first place. Thank you for enlightening me.

Thank you again for providing even a solution along with the explanation. I was already (reluctantly) dusting off my trusted old C compiler to write the parser there. I am indebted.

bakunin