[SOLVED] String length in kornshell

In the Korn shell you can get the length of a string with:

$ x=abc
$ print ${#x}
3

If the current locale is a multibyte locale, like de_AT.UTF-8, you get the length of the string in bytes, not characters:

$ x=für
$ print ${#x}
4

Is there an easy way to get the length of a string in characters instead of bytes?

Which OS and which version of the Korn Shell are you using?

% print $KSH_VERSION 
@(#)MIRBSD KSH R40 2011/07/16
% print "$x" ${#x}
für 3

I use ksh88 and several ksh93 versions on Solaris and OpenSolaris.

The problem in my case seems to be that ${#varname} counts bytes by design, not characters. In a single-byte locale such as ISO-8859 everything is fine, because every character occupies exactly one byte.
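To make the byte count visible independently of what ${#x} does in a given shell, here is a minimal sketch. The function name byte_len is made up for illustration; it relies only on wc -c, which counts bytes regardless of locale.

```shell
# byte_len is a hypothetical helper, not a ksh builtin.
byte_len() {
    # wc -c always counts bytes, whatever the current locale is.
    n=$(printf '%s' "$1" | wc -c)
    # Unquoted on purpose: drops the leading blanks some wc versions print.
    echo $n
}

# "für" written with octal escapes so the example does not depend on
# the encoding of this file: f, ü (2 bytes in UTF-8), r.
x=$(printf 'f\303\274r')
byte_len "$x"
```

In a UTF-8 locale the three characters of "für" occupy four bytes, which matches the 4 reported above.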

Don't get me wrong, I don't think this is a bug. I am just asking whether anyone knows of a construct in ksh88, or at least ksh93, that counts characters in every scenario.

I believe it's impossible in ksh88, and in ksh93 it may depend on the version:

$ ksh --version
  version         sh (AT&T Research) 93t+ 2010-06-21
$ a=''
$ echo ${#a}
4

If it has to work across different systems, I think it's better to use something other than the shell. Perl 5.8 or later is a good choice.
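If Perl is not an option, one workaround that avoids ${#x} altogether is to count every byte that is not a UTF-8 continuation byte. This is only a sketch: the name utf8_len is invented here, it assumes the string is valid UTF-8, and it counts code points (not what a user might see as one "character" when combining marks are involved).

```shell
# utf8_len is a hypothetical helper; it assumes valid UTF-8 input.
utf8_len() {
    # In UTF-8 every character has exactly one lead byte; continuation
    # bytes are in the range 0x80-0xBF (octal 200-277). Delete those and
    # count what is left. LC_ALL=C makes tr work on single bytes.
    n=$(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c)
    # Unquoted on purpose: drops the leading blanks some wc versions print.
    echo $n
}

x=$(printf 'f\303\274r')   # "für" as UTF-8 bytes
utf8_len "$x"
```

Because tr runs in the C locale, the result should not depend on which locales are installed or currently active.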

I understand, I just cannot reproduce it:

$ locale
LANG=
LC_CTYPE="it_IT.UTF-8@euro"
LC_NUMERIC="it_IT.UTF-8@euro"
LC_TIME="it_IT.UTF-8@euro"
LC_COLLATE="it_IT.UTF-8@euro"
LC_MONETARY="it_IT.UTF-8@euro"
LC_MESSAGES="it_IT.UTF-8@euro"
LC_ALL=it_IT.UTF-8@euro
$ uname -sr
SunOS 5.8
$ Version M-11/16/88i
$ print "$x" ${#x}
für 3

P.S. Hm ...

$ locale
LANG=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=en_US.UTF-8
$ print ${.sh.version} "$x" ${#x}
Version M-12/28/93d für 2
$ locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=de_AT.UTF-8
$ uname -sr
SunOS 5.11
$ echo $KSH_VERSION
Version JMP 93t+ 2009-10-12
$ print "$x" ${#x}
für 4

Terminal program (putty) configured to use UTF-8 too, of course.

$ a='' 
$ echo ${#a}  
4
$ LANG=
$ echo ${#a}
12

Hmm, I think I found the solution. The de_AT.UTF-8 locale is not installed on my system, only de_DE.UTF-8 (I wonder why the shell accepted setting LC_ALL to a locale that is not installed).

$ export LC_ALL=de_DE.UTF-8
$ locale
LANG=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_ALL=de_DE.UTF-8
$ x=für
$ print $x ${#x}
für 3

... it seems it was my mistake: I had set the wrong locale without noticing.
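Since the shell does not complain about an unknown locale name, it can help to check the name against the output of locale -a before exporting LC_ALL. A rough sketch follows; the function name locale_installed is made up, and the normalisation step is an assumption to cope with systems that list "utf8" instead of "UTF-8".

```shell
# locale_installed is a hypothetical helper built on "locale -a".
locale_installed() {
    # Lower-case the name and drop hyphens (octal 055) so that
    # de_DE.UTF-8 and de_DE.utf8 compare equal.
    want=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '\055')
    # -F: fixed string, -x: whole line must match.
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '\055' |
        grep -Fx -- "$want" >/dev/null
}

if locale_installed de_AT.UTF-8; then
    echo "de_AT.UTF-8 is installed"
else
    echo "de_AT.UTF-8 is NOT installed"
fi
```

The function returns success only if the name appears in the list of installed locales, so it can guard an export LC_ALL=... line.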

Thanks for your time and support anyway.

OK, glad you've found a solution.