hergp
August 4, 2011, 4:03am
1
In the Korn shell you can get the length of a string with:
$ x=abc
$ print ${#x}
3
If the current locale is a multibyte locale, like de_AT.UTF-8, you get the length of the string in bytes, not characters:
$ x=für
$ print ${#x}
4
Is there an easy way to get the length of a string in characters instead of bytes?
Which OS and which version of the Korn Shell are you using?
% print $KSH_VERSION
@(#)MIRBSD KSH R40 2011/07/16
% print "$x" ${#x}
für 3
hergp
August 4, 2011, 4:34am
3
I use ksh88 and several ksh93 versions on Solaris and OpenSolaris.
The problem in my case seems to be that ${#varname} counts bytes by design, not characters. In a single-byte locale such as ISO-8859 everything is fine, because every character occupies exactly one byte.
Don't get me wrong, I don't think this is a bug. I am just asking whether anyone knows of a construct in ksh88, or at least ksh93, that counts characters in every scenario.
yazu
August 4, 2011, 4:44am
4
I believe it's impossible in ksh88, and in ksh93 it may depend on the version:
$ ksh --version
version sh (AT&T Research) 93t+ 2010-06-21
$ a=''
$ echo ${#a}
4
For different systems I think it's better to use something else. Perl 5.8 or later is a good choice.
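Where Perl is not an option either, the count can be done without relying on the shell's locale handling at all. A minimal sketch, assuming the string is valid UTF-8 and that tr accepts octal byte ranges under LC_ALL=C; utf8_len is a made-up helper name, not a ksh builtin, and printf stands in for print so the snippet also runs in other POSIX shells:

```shell
# Hypothetical helper: count the characters of a UTF-8 string,
# independent of the current locale.
utf8_len() {
    # Every UTF-8 character has exactly one byte outside 0x80-0xBF
    # (octal 200-277); the bytes inside that range are continuation bytes.
    # Deleting them and counting what is left gives the character count.
    n=$(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c)
    # Unquoted on purpose: some wc implementations pad the number with blanks.
    echo $n
}

# "f", U+00FC (two bytes: octal 303 274), "r": 4 bytes, 3 characters
utf8_len "$(printf 'f\303\274r')"    # prints 3
```

This only makes sense for UTF-8 data; for other multibyte encodings it would give wrong numbers.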
hergp:
I use ksh88 and several ksh93 versions on Solaris and OpenSolaris.
The problem in my case seems to be that ${#varname} counts bytes by design, not characters. In a single-byte locale such as ISO-8859 everything is fine, because every character occupies exactly one byte.
Don't get me wrong, I don't think this is a bug. I am just asking whether anyone knows of a construct in ksh88, or at least ksh93, that counts characters in every scenario.
I understand, I just cannot reproduce it:
$ locale
LANG=
LC_CTYPE="it_IT.UTF-8@euro"
LC_NUMERIC="it_IT.UTF-8@euro"
LC_TIME="it_IT.UTF-8@euro"
LC_COLLATE="it_IT.UTF-8@euro"
LC_MONETARY="it_IT.UTF-8@euro"
LC_MESSAGES="it_IT.UTF-8@euro"
LC_ALL=it_IT.UTF-8@euro
$ uname -sr
SunOS 5.8
$ Version M-11/16/88i
$ print "$x" ${#x}
für 3
P.S. Hm ...
$ locale
LANG=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=en_US.UTF-8
$ print ${.sh.version} "$x" ${#x}
Version M-12/28/93d für 2
hergp
August 4, 2011, 5:04am
6
$ locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=de_AT.UTF-8
$ uname -sr
SunOS 5.11
$ echo $KSH_VERSION
Version JMP 93t+ 2009-10-12
$ print "$x" ${#x}
für 4
The terminal program (PuTTY) is configured to use UTF-8 too, of course.
yazu
August 4, 2011, 5:12am
7
$ a=''
$ echo ${#a}
4
$ LANG=
$ echo ${#a}
12
hergp
August 4, 2011, 5:14am
8
Hmm, I think I found the solution. The de_AT.UTF-8 locale is not installed on my system, only de_DE.UTF-8 (I wonder why the shell accepted setting LC_ALL to a locale that is not installed).
$ export LC_ALL=de_DE.UTF-8
$ locale
LANG=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_ALL=de_DE.UTF-8
$ x=für
$ print $x ${#x}
für 3
... it seems the mistake was mine: I had set the wrong locale without noticing.
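For anyone running into the same thing: the shell does not validate LC_ALL when you assign it, so it can help to check the name against the output of locale -a first. A minimal sketch, assuming the locale utility is available and grep understands -F and -x (on Solaris that may mean the /usr/xpg4/bin version); locale_installed is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the named locale is listed by `locale -a`.
locale_installed() {
    # Lower-case the name and drop hyphens, so that "de_DE.UTF-8" also
    # matches a listing spelled "de_DE.utf8".
    want=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' |
        grep -Fx -- "$want" >/dev/null
}

# Only export the locale if it is really there.
if locale_installed de_AT.UTF-8; then
    export LC_ALL=de_AT.UTF-8
else
    echo "de_AT.UTF-8 is not installed, LC_ALL left unchanged" >&2
fi
```

The normalization is deliberately crude; it is only meant to cope with the common UTF-8 versus utf8 spelling difference between systems.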
Thanks for your time and support anyway.
OK, glad you've found a solution.