Hi,
I want to validate strings in Perl. A string may contain characters from a-zA-Z0-9 and the symbols +-_.:/\
To validate such a string I wrote a regex:
if ($string =~ m/^[a-zA-Z0-9+-_.:\/\\]/) {
print "valid";
} else {
print "invalid";
}
But this regex also validates strings that contain other characters like @%^&*?.
I want to reject strings that contain any character outside the valid set above.
Please help me.
thanks,
zing_foru
if ($string =~ m{\A[-+.:/\\a-zA-Z0-9_]+\z}) {
print "valid";
} else {
print "invalid";
}
and if you have Perl 5.14+, you could replace the first line with:
if ($string =~ m{\A[-+.:/\\\w]+\z}a) {
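To try a pattern like this against sample input, it can be wrapped in a small sub. This is only a sketch: the name is_valid and the sample strings are assumptions made for illustration. The samples are single-quoted so that a sequence such as \t stays a literal backslash followed by "t" instead of becoming a tab.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $string consists only of
# a-z A-Z 0-9 and the symbols + - _ . : / \ (and is not empty).
sub is_valid {
    my ($string) = @_;
    return $string =~ m{\A[-+.:/\\a-zA-Z0-9_]+\z} ? 1 : 0;
}

# Single quotes keep the backslashes literal.
for my $sample ('e:\test1\test2', '/var/log-test', 'goak&<>', 'a,b') {
    printf "%-16s %s\n", $sample, is_valid($sample) ? 'valid' : 'invalid';
}
```

Because the hyphen is the first character of the class, it is taken literally and no accidental range is created.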
alister
September 16, 2012, 7:05am
3
If an empty string is allowed, you could also negate the bracketed expression, rendering anchors unnecessary.
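A minimal sketch of that negated form, assuming the same allowed set as above. The sub name has_only_allowed is hypothetical. Instead of requiring every character to be in the set, it looks for a single character outside the set and reports the string as valid when none is found.

```perl
use strict;
use warnings;

# Hypothetical helper: valid when no character outside the allowed set occurs.
# An empty string passes, because there is nothing to reject.
sub has_only_allowed {
    my ($string) = @_;
    return $string !~ m{[^-+.:/\\a-zA-Z0-9_]} ? 1 : 0;
}

print has_only_allowed('/usr/bin/test') ? "valid\n" : "invalid\n";
print has_only_allowed('test?newdata')  ? "valid\n" : "invalid\n";
print has_only_allowed('')              ? "valid\n" : "invalid\n";
```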
Regards,
Alister
Thanks elixir_sinari, but this regex does not validate even correct strings,
e.g. "e:\test1\test2"
"/usr/bin/test"
"/var/log-test"
"/usr/test bin/goak"
Invalid strings, e.g.
"e:\test\test?newdata" #contains ?, an invalid symbol
"goak&<>" #contains &, < and >, invalid symbols
msabhi
September 16, 2012, 8:04am
5
Try this Reg Exp
if($string =~ m/^[\w\s+-.:\/\\]*$/)
{
print "Valid";
}else {
print "Invalid"
}
alister
September 16, 2012, 9:27am
7
zing_foru:
thanks msabhi.. worked
No, it did not. You just did not notice the error. A comma is still allowed even though it's not on your list of allowed characters.
The portions highlighted in red (+-_ in zing_foru's regex and +-. in msabhi's) represent ranges of characters, not the three characters which appear literally in the regular expression.
zing_foru:
I want to validate strings in Perl. A string may contain characters from a-zA-Z0-9 and the symbols +-_.:/\
/^[a-zA-Z0-9+-_.:\/\\]/
but this regex also validates strings that contain other characters like @%^&*?
msabhi:
Try this Reg Exp
/^[\w\s+-.:\/\\]*$/
Neither of those is correct. msabhi's example is simply less wrong. zing_foru's unintended range expression (+-_) spans the entirety of the following list. msabhi's attempt (+-.) spans only the first four entries, from <plus-sign> through <period>.
From POSIX - Locale - collation sequence:
<plus-sign>
<comma>
<hyphen>
<period>
<slash>
<zero>
<one>
<two>
<three>
<four>
<five>
<six>
<seven>
<eight>
<nine>
<colon>
<semicolon>
<less-than-sign>
<equals-sign>
<greater-than-sign>
<question-mark>
<commercial-at>
<A>
<B>
<C>
<D>
<E>
<F>
<G>
<H>
<I>
<J>
<K>
<L>
<M>
<N>
<O>
<P>
<Q>
<R>
<S>
<T>
<U>
<V>
<W>
<X>
<Y>
<Z>
<left-square-bracket>
<backslash>
<right-square-bracket>
<circumflex>
<underscore>
To fix your errors, refer to the perlre man page at perldoc.perl.org:
Within a list, the "-" character specifies a range, so that a-z represents all characters between "a" and "z", inclusive. If you want either "-" or "]" itself to be a member of a class, put it at the start of the list (possibly after a "^"), or escape it with a backslash. "-" is also taken literally when it is at the end of the list, just before the closing "]".
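The effect of the hyphen's position can be checked directly. The following is a small sketch, with an illustrative table of classes that is not taken from the thread, testing whether a comma and an at-sign match each variant. Neither character is in the intended set, so any "yes" marks an accidental range.

```perl
use strict;
use warnings;

# Each entry: a label and a character class to probe.
my @classes = (
    [ '[+-_]'   => qr/[+-_]/   ],   # range from "+" to "_"
    [ '[+-.]'   => qr/[+-.]/   ],   # range from "+" to "."
    [ '[-+_.]'  => qr/[-+_.]/  ],   # hyphen first: literal
    [ '[+\-_.]' => qr/[+\-_.]/ ],   # hyphen escaped: literal
    [ '[+_.-]'  => qr/[+_.-]/  ],   # hyphen last: literal
);

for my $entry (@classes) {
    my ($label, $re) = @$entry;
    printf "%-10s comma:%-3s at-sign:%s\n",
        $label,
        (',' =~ $re ? 'yes' : 'no'),
        ('@' =~ $re ? 'yes' : 'no');
}
```

Only the first two classes accept the comma, and only the first accepts the at-sign. The three variants with a literal hyphen accept neither.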
Regards,
Alister
msabhi
September 16, 2012, 11:25am
8
Rightly pointed out, Alister. I overlooked the "-" when giving my solution. "-" has a special meaning inside a character class, just like "\", "]" and "^" (the last one only when used at the beginning). I believe escaping the "-" should work:
if ($string =~ m/^[\w\s+\-.:\/\\]*$/) {
print "Valid";
} else {
print "Invalid";
}
Correct me if I am wrong.
zing_foru:
Thanks elixir_sinari, but this regex does not validate even correct strings,
e.g. "e:\test1\test2"
"/usr/bin/test"
"/var/log-test"
"/usr/test bin/goak"
Invalid strings, e.g.
"e:\test\test?newdata" #contains ?, an invalid symbol
"goak&<>" #contains &, < and >, invalid symbols
Have you checked my post again? I've edited some things :).
---------- Post updated at 10:34 AM ---------- Previous update was at 10:28 AM ----------
msabhi:
Rightly pointed out, Alister. I overlooked the "-" when giving my solution. "-" has a special meaning inside a character class, just like "\", "]" and "^" (the last one only when used at the beginning). I believe escaping the "-" should work:
if ($string =~ m/^[\w\s+\-.:\/\\]*$/) {
print "Valid";
} else {
print "Invalid";
}
Correct me if I am wrong.
In a Unicode context, \w and \s would match many unintended characters if they occur in the input.
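A short sketch of the difference, assuming Perl 5.14 or later for the /a modifier. The sub names and the sample strings are made up for illustration. Both subs use the escaped-hyphen pattern quoted above. The second adds /a, which restricts \w and \s to their ASCII meanings.

```perl
use 5.014;
use strict;
use warnings;

# Hypothetical helper: \w and \s follow Unicode rules on strings
# that contain wide characters.
sub is_valid_unicode {
    my ($string) = @_;
    return $string =~ m{\A[\w\s+\-.:/\\]*\z} ? 1 : 0;
}

# Hypothetical helper: same pattern, but /a limits \w to [A-Za-z0-9_]
# and \s to ASCII whitespace.
sub is_valid_ascii {
    my ($string) = @_;
    return $string =~ m{\A[\w\s+\-.:/\\]*\z}a ? 1 : 0;
}

my $greek    = "/usr/\x{3b1}\x{3b2}";   # Greek alpha and beta
my $em_space = "test\x{2003}bin";       # U+2003 EM SPACE

printf "greek letters: unicode=%d ascii=%d\n",
    is_valid_unicode($greek), is_valid_ascii($greek);
printf "em space:      unicode=%d ascii=%d\n",
    is_valid_unicode($em_space), is_valid_ascii($em_space);
```

If only ASCII input should pass, listing a-zA-Z0-9_ explicitly in the class avoids the question altogether.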