I'm in the middle of writing a script and I'm doing some checks with regular expressions (i.e. using '[[' with the '=~' operator).
I'm wondering if this example is correct or if it's just a coincidence. I thought that if I did not use "shopt -s nocasematch",
at least the first test should print "FALSE", but it prints "TRUE"..?
For example:
#!/bin/bash
MY_VAR="HELLO"

### This prints "TRUE"
PATTERN_1="^[a-z]*"
if [[ $MY_VAR =~ $PATTERN_1 ]]
then
    echo "TRUE"
else
    echo "FALSE"
fi
echo "-------------------------"

### This prints "FALSE"
PATTERN_2="^[A-z]*"
if [[ $MY_VAR =~ $PATTERN_2 ]]
then
    echo "TRUE"
else
    echo "FALSE"
fi
echo "-------------------------"

### This prints "TRUE"
PATTERN_3="[a-Z]*"
if [[ $MY_VAR =~ $PATTERN_3 ]]
then
    echo "TRUE"
else
    echo "FALSE"
fi
I remember being told before that the pattern "[A-z]" is NOT the same as doing "[A-Za-z]" like it would be in Perl...
So I'm wondering why the pattern "[a-Z]", which is the last if statement in the code above, returns "TRUE", when
the 2nd if statement above "[A-z]" returns "FALSE"...?
I tried changing the Variable "$MY_VAR" from all upper case to all lowercase, but I still get the same output...
And lastly, if I include the "shopt -s nocasematch" they all return "TRUE"...
If anyone has any thoughts/suggestions that would be great!
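For anyone who wants to see the effect of nocasematch in isolation, here is a minimal sketch. It is not the original script. The pattern is anchored at both ends (my assumption) so that every character of the string has to match, LC_ALL=C is set so that [a-z] means only the 26 ASCII lowercase letters, and check is a hypothetical helper name.

```shell
#!/bin/bash
# Assumption: use the C locale so [a-z] covers only ASCII lowercase letters.
LC_ALL=C

# Anchored at both ends: the whole string must consist of lowercase letters.
PATTERN='^[a-z]*$'

# check is a hypothetical helper that prints TRUE or FALSE for one string.
check() {
    if [[ $1 =~ $PATTERN ]]
    then
        echo "TRUE"
    else
        echo "FALSE"
    fi
}

check "HELLO"            # FALSE: uppercase letters are not in [a-z]
shopt -s nocasematch     # from here on, =~ ignores letter case
check "HELLO"            # TRUE
shopt -u nocasematch     # restore the default
```

With the option enabled, the same pattern accepts "HELLO", which is consistent with every test returning "TRUE" once nocasematch is set.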
I tested your code in bash version 4.1.10(4), and with the shell option nocasematch either set or unset (checked with shopt -p) it prints 'TRUE'. The reason, at least the way I understand it, is that '*' means zero or more matches, so the pattern also matches when the string starts with no lowercase letters at all.
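The zero-length match can be made visible with BASH_REMATCH, which holds the text that the regex actually matched. This is only a sketch: show_match is a hypothetical helper, and LC_ALL=C is my assumption so that [a-z] means exactly the ASCII lowercase letters.

```shell
#!/bin/bash
# Assumption: use the C locale so [a-z] covers only ASCII lowercase letters.
LC_ALL=C

# show_match is a hypothetical helper: it prints the text a regex matched,
# or "no match" when the regex does not match at all.
show_match() {
    local text=$1 pattern=$2
    if [[ $text =~ $pattern ]]
    then
        echo "matched '${BASH_REMATCH[0]}'"
    else
        echo "no match"
    fi
}

show_match "HELLO" '^[a-z]*'    # matched ''      (zero letters is still a match)
show_match "HELLO" '^[a-z]+'    # no match        (+ needs at least one letter)
show_match "hello" '^[a-z]+'    # matched 'hello'
```

Replacing '*' with '+' is one way to make the test fail on "HELLO", since '+' requires at least one lowercase letter at the start.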
Anyway, I would recommend using one of the POSIX Character Classes:
[[:alpha:]] matches alphabetic characters. In the C/POSIX locale this is equivalent to [A-Za-z].
[[:lower:]] matches lowercase alphabetic characters. In the C/POSIX locale this is equivalent to [a-z].
[[:upper:]] matches uppercase alphabetic characters. In the C/POSIX locale this is equivalent to [A-Z].
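As a rough illustration of the three classes listed above, the sketch below sorts a word into one of four buckets. The function name classify and the bucket labels are made up for this example. Each pattern uses '+' and is anchored at both ends, so the whole word has to belong to the class.

```shell
#!/bin/bash
# classify is a hypothetical helper that reports which class a whole word fits.
classify() {
    local word=$1
    if [[ $word =~ ^[[:lower:]]+$ ]]
    then
        echo "lower"
    elif [[ $word =~ ^[[:upper:]]+$ ]]
    then
        echo "upper"
    elif [[ $word =~ ^[[:alpha:]]+$ ]]
    then
        echo "mixed"
    else
        echo "other"
    fi
}

classify "hello"     # lower
classify "HELLO"     # upper
classify "Hello"     # mixed
classify "hello1"    # other
```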
Assuming you're running on a system with a code set based on ASCII (i.e., not an IBM or Amdahl [if you remember them] mainframe): [a-z] is a range expression that matches the 26 lowercase alphabetic characters. [A-z] is a range expression that matches the 52 uppercase and lowercase alphabetic characters plus the six characters that sit between Z and a in ASCII: [ , \ , ] , ^ , _ , and ` . And [a-Z] is a range expression that is either treated as an error or as a request to match the empty set (depending on your implementation), because a follows Z in ASCII.
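The extra characters picked up by [A-z] are easy to check in the C locale. The sketch below is an assumption-laden illustration: LC_ALL=C is forced so that ranges follow ASCII order, and in_A_to_z is a hypothetical helper. The last command only prints the exit status of a test against [a-Z]; no expected value is shown because the result depends on the regex implementation.

```shell
#!/bin/bash
# Assumption: use the C locale so ranges follow ASCII code order.
LC_ALL=C

# in_A_to_z is a hypothetical helper: it reports whether a single
# character falls inside the range [A-z].
in_A_to_z() {
    local pattern='^[A-z]$'
    if [[ $1 =~ $pattern ]]
    then
        echo "yes"
    else
        echo "no"
    fi
}

in_A_to_z 'H'    # yes
in_A_to_z 'h'    # yes
in_A_to_z '_'    # yes: code 95 lies between Z (90) and a (97)
in_A_to_z '^'    # yes: code 94
in_A_to_z '1'    # no:  code 49 is below A (65)

# [a-Z] is a reversed range in ASCII. Bash documents an exit status of 2
# for a regex it cannot compile, but what happens here is up to the
# regex library in use.
backwards='[a-Z]'
[[ "HELLO" =~ $backwards ]]
echo "exit status for [a-Z]: $?"
```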
Sorry, I probably should have mentioned what I'm trying to do. Duhh, sorry about that...
Basically, I'm trying to "verify" some user input in the script. The user should enter some text. Then I check that text in the script to
make sure that the user's input "BEGINS" with an ALL lowercase string. I'll give the "[:lower:]" Character Class a try.
Maybe that will work...
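One possible way to express that check, as a sketch only: require one or more lowercase letters at the start, followed by either whitespace or the end of the input. The helper name first_word_is_lower and the exact pattern are assumptions for illustration, not something from the original script.

```shell
#!/bin/bash
# first_word_is_lower is a hypothetical helper: it succeeds only when the
# input starts with one or more lowercase letters that are followed by
# whitespace or by the end of the string.
first_word_is_lower() {
    local pattern='^[[:lower:]]+([[:space:]]|$)'
    [[ $1 =~ $pattern ]]
}

for input in "hello world" "hello" "HELLO" "helloWorld" ""
do
    if first_word_is_lower "$input"
    then
        echo "'$input': ok"
    else
        echo "'$input': rejected"
    fi
done
```

The '+' rejects empty input and input that starts with an uppercase letter, and the trailing group rejects a first word such as "helloWorld" that switches case partway through.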
Hey Don Cragun, thanks for your reply.
Is this the info you're talking about, for what character encoding I'm using..? Also, for the second one below, I ran the "file" command
on one of my 'test' scripts to see what its encoding was...
Also, you're saying the "[A-z]" range should work? I thought that every time I tried using it, it would always return the same result
no matter the input, either "true" or "false"; I forget exactly which. But I do remember that the result was
always the same...
Basically, I just want to make sure that the entire "first" string that the user enters is in all lowercase...
And I'm just VERY confused why, if the input string is "HELLO" (all uppercase), the following test (below) returns TRUE...??
#!/bin/bash
MY_VAR="HELLO"

### This pattern SHOULD match a string that begins with ONLY "lowercase letters", zero or more times...
PATTERN_1="^[a-z]*"

### This prints "TRUE"
if [[ $MY_VAR =~ $PATTERN_1 ]]
then
    echo "TRUE"
else
    echo "FALSE"
fi
Any idea why I'm getting "TRUE" when the input is ALL uppercase letters..?
I think the reason I couldn't get that "[:lower:]" character class to work was because I didn't enclose it in another set of square
brackets... Seems to work to a degree..
I'm just still baffled why the pattern "[a-z]*" matches the string "HELLO" when they are ALL uppercase....
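On the bracket point: as far as I understand POSIX bracket expressions, "[:lower:]" with a single pair of brackets is just a set containing the characters ':', 'l', 'o', 'w', 'e' and 'r', which would explain why it only seemed to work for some inputs. The sketch below compares the two forms. The helper name matches is hypothetical, and the single-bracket results assume a regex library that follows that reading (some tools reject the single-bracket form outright).

```shell
#!/bin/bash
# matches is a hypothetical helper: it prints "yes" when the whole
# string matches the given regex, and "no" otherwise.
matches() {
    local text=$1 pattern=$2
    if [[ $text =~ $pattern ]]
    then
        echo "yes"
    else
        echo "no"
    fi
}

# Single brackets: an ordinary set of the characters : l o w e r
matches "lower" '^[:lower:]+$'      # yes: every letter is in the set
matches "hello" '^[:lower:]+$'      # no:  'h' is not in the set

# Double brackets: the lowercase character class
matches "hello" '^[[:lower:]]+$'    # yes
matches "HELLO" '^[[:lower:]]+$'    # no
```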