ksh - Get last character from string - Bad Substitution error

I want to get the last character of my machine name using the following code. The default shell is bash, but the script runs in ksh.
I get a 'bad substitution' error when I run the script directly, but it works fine when I source it (dot and space).

Why?

[ysrini@linuxapp01a bin]$ echo $0
bash

[ysrini@linuxapp01a bin]$ cat -n myenv.sh 
     1  #!/usr/bin/ksh
     2
     3  export SERVER_NAME=`hostname`
     4  echo "SERVER_NAME : ${SERVER_NAME}"
     5
     6  export SERVER_NODE=`echo ${SERVER_NAME:${#SERVER_NAME} - 1}`
     7  echo "SERVER_NODE : ${SERVER_NODE}"
     8
     9  export SERVER_ROOT_NAME=`echo "${SERVER_NAME%?}"`
    10  echo "SERVER_ROOT_NAME : ${SERVER_ROOT_NAME}"


[ysrini@linuxapp01a bin]$ myenv.sh 
SERVER_NAME : linuxapp01a
./myenv.sh[6]: : bad substitution
SERVER_NODE : 
SERVER_ROOT_NAME : linuxapp01

[ysrini@linuxapp01a bin]$ . myenv.sh 
SERVER_NAME : linuxapp01a
SERVER_NODE : a
SERVER_ROOT_NAME : linuxapp01

You need to understand how "variable expansion" works. Let's assume we have a variable assigned (I suggest you try the examples yourself at the shell prompt and play around a bit with them to get familiar):

var="abc/def/ghi"

The point of shell variables is that you cannot use them directly, as you would in other programming languages:

x="acd"
y="def"
z=x+y
print z

This (or similar constructs) would work in other languages, but in the shell you use the variable "indirectly", once you have assigned it, through a set of quasi-functions. This is "variable expansion". The most basic expansion is:

${var}

which will expand to the content of the variable "var". This content is replaced at the command line and then the command line is executed. For instance:

# print - "${var}"         # your command
# print - "abc/def/ghi"    # the shell first expands the expression
abc/def/ghi                # executing the print command

Keep this mechanism in mind when we discuss more complicated expansions. The next in the list are these:

${var#<regex>}    ${var##<regex>}
${var%<regex>}    ${var%%<regex>}

The first one ("#") takes the content of the variable, then takes the regexp, expands that and if it matches the beginning of the content, the matching part is cut off. Sounds complicated? OK, here is an example with our variable from above:

# print - "${var#?}"
bc/def/ghi

The complete content would be "abc/def/ghi". The regexp ("?") means "any single character", so one character is deleted from the beginning, leaving the first character out. Notice that this DOES NOT CHANGE the variable at all:

# print - "${var#?}"
bc/def/ghi
# print - "${var}"
abc/def/ghi

The opposite of "#" is "%", which works the same, but takes away from the content at the end instead of the beginning. Also notice the "*", which means "any number of any characters". In case you wonder: yes, these are the same characters you can use as filemasks when issuing a "ls -l <mask>". Whatever you can use there you can use here:

# print - "${var%?}"
abc/def/gh
# print - "${var%/*}"
abc/def

You might wonder what the difference between "#" and "##" and "%" and "%%" respectively is. Try out the following and notice the difference:

# print - "${var%/*}"
# print - "${var%%/*}"
# print - "${var#*/}"
# print - "${var##*/}"

The single form ("#", "%") always removes the shortest possible match, the doubled form ("##", "%%") the longest possible match. For patterns that can match in only one way there is no difference.
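
For anyone who cannot try it at a prompt right now, here is a small sketch of what the four expansions above should produce for var="abc/def/ghi". The variable names (short_tail and so on) are only illustrative and do not come from the thread, and printf is used instead of print so the lines also run in shells other than ksh:

```shell
var="abc/def/ghi"

# "%" removes the shortest match from the end, "%%" the longest.
short_tail=${var%/*}      # abc/def
long_tail=${var%%/*}      # abc

# "#" removes the shortest match from the beginning, "##" the longest.
short_head=${var#*/}      # def/ghi
long_head=${var##*/}      # ghi

printf '%s\n' "$short_tail" "$long_tail" "$short_head" "$long_head"
```

The last form, "${var##*/}", is a common way to get the file name part of a path without calling an external command.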

There are a lot of other interesting and powerful expansions: you can replace one substring with another:

${var/<search>/<replace>}

and a lot of other things. Check out "man ksh" for reference.

I'd like to show you another trick: nested expansions. You can use an expansion inside another expansion, even if it uses the same variable (because - you know already - the variable's content itself is not changed!). This, finally, will do what you are looking for:

Remember what this gives:

${var%?}

Correct: everything save for the last character. Now, let us use this as the regexp we want to take away from the beginning of the content. Obviously this will match most of the content and only leave the last character, yes?

# print - "${var#${var%?}}"
i

And because this will always be true, you can use this regexp every time, regardless of what the content of "var" is, for the last character - or for more characters, if you modify it a bit:

# print - "${var#${var%??}}"       # the last 2 characters
# print - "${var%${var#??}}"       # the first 2 characters
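
As a rough check of the two lines above, this sketch stores the results in variables so they can be reused. The names last_two and first_two are made up for illustration, and the sample value contains no pattern characters such as "*" or "?", which is an assumption this unquoted form relies on:

```shell
var="abc/def/ghi"

# Inner expansion: everything except the last 2 characters ("abc/def/g").
# Outer expansion: remove that from the beginning, leaving the last 2.
last_two=${var#${var%??}}

# Inner expansion: everything except the first 2 characters ("c/def/ghi").
# Outer expansion: remove that from the end, leaving the first 2.
first_two=${var%${var#??}}

printf '%s\n' "$last_two" "$first_two"
```

With this value the two lines printed should be "hi" and "ab".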

I hope this helps.

bakunin

Thanks Bakunin, not only did you provide the answer but also a good explanation of how variable expansion works. A 'solution' is always more useful than a direct 'answer'!
Thanks again
-srinivas y.

You still should be careful about what shell you are using. The shebang and the thread title say ksh, but your echo $0 says bash. Though it doesn't matter here, there might be differences in other places that lead to surprising results.

That code is vulnerable to pattern matching metacharacters. For this approach to work with arbitrary text, it is necessary to double-quote the nested parameter expansion.

$ s=*****a
$ echo "$s"
*****a
$ echo "${s#${s%?}}"
*****a
$ echo "${s##${s%?}}"

$ echo "${s#"${s%?}"}"
a
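
To tie this back to the original script, here is a minimal sketch of the quoted form wrapped in a function. The function name last_char is hypothetical, and the host name is just the sample value shown earlier in the thread rather than the output of hostname:

```shell
# last_char is a hypothetical helper name, not part of the original script.
last_char() {
    # Quoting the inner expansion makes its result match literally, so
    # characters such as "*" or "?" in the value are not treated as patterns.
    printf '%s\n' "${1#"${1%?}"}"
}

SERVER_NAME="linuxapp01a"                  # sample value from the thread
SERVER_NODE=$(last_char "$SERVER_NAME")    # last character only
SERVER_ROOT_NAME=${SERVER_NAME%?}          # everything except the last character

echo "SERVER_NODE : ${SERVER_NODE}"
echo "SERVER_ROOT_NAME : ${SERVER_ROOT_NAME}"
last_char '*****a'
```

This uses only the "#" and "%" expansions, so it should not depend on the substring syntax that failed in line 6 of the script.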

A minor nit: You refer to shell pattern matching as regular expressions. I'm sure you know that those are two distinct grammars, but a novice may become confused.

Regards,
Alister

You are right and your example is legitimate. I left that part out purposefully to avoid complicating matters. I should have probably mentioned it.

Yes - and no. "regular expressions" is (in a very theoretical sense) any type-3 language in the Chomsky hierarchy: a device where some characters and some metacharacters describe a text pattern. This is the case for shell regexps (aka "file globs") as well as for "Unix Basic Regular Expressions" (what awk, grep and sed use) or "Extended Regular Expressions" (i.e. perl and some GNU variants of grep, sed, ...). These are all different flavours of Regexps (and I should have mentioned that too, probably), but still Regexps nevertheless.

You are right, though, that in UNIX environments, the term "regexp" particularly describes BREs as used in sed, awk and grep. Every other use of the term, even if technically correct, might be confusing.

bakunin

If we are going to be precise with regard to formal language theory, then you are mistaken. Neither POSIX Basic Regular Expressions, nor the "extended" dialects implemented in perl, python, php, java, et al, are regular expressions. Any grammar that supports backreferences cannot be implemented with a [non-]deterministic finite automaton (a defining characteristic of a regular language). sh pattern matching and POSIX Extended Regular Expressions, however, are formally regular languages.

As you noted, I was simply using conventional, informal nomenclature. The ksh/bash man pages make a concerted effort to not use the term 'regular expression' when discussing pattern matching notation.

Minor nit: AWK uses POSIX Extended Regular Expressions, not Basic.

Regards,
Alister

A note to add: POSIX grep can use ERE and so can BSD sed (both through the -E switch) and so can ksh93 and bash. And Perl uses its own form of Regular Expression, neither Extended nor Basic. GNU utilities use extensions to both BRE and ERE. In a UNIX context the term regexp does not just refer to BRE but to ERE as well (but not to pattern matching).

I agree with Alister (and with yourself) - even when formally right in theoretical informatics lingo - it is confusing to call the pattern matching used in parameter expansion a "regular expression", since the POSIX standard consistently uses the terms "regular expression" and "pattern matching" to distinguish between the two.

Shell Command Language

Interesting nonetheless :)

@Alister: interesting point about BRE not being a formal regular language while ERE is. So then GNU ERE is not, since it supports back references...

ksh has the typeset -R option, which can also do what you want:

$ STR="Unix and Linux Forums"
$ typeset -R1 right=$STR
$ echo $right
r