Perl regex help needed

zing_foru · September 16, 2012, 4:55am

Hi,

I want to validate strings in Perl. A valid string may contain only the characters a-zA-Z0-9 and the symbols +-_.:/\

To validate such a string I wrote this regex:

if ($string =~ m/^[a-zA-Z0-9+-_.:\/\\]/) {
   print "valid";
} else {
   print "invalid";
}

But this regex also accepts strings that contain other characters, such as @%^&*?.

I want to reject any string that contains a character outside the valid set above.

Please help.

thanks,
zing_foru
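A quick check shows that the question's regex has two separate problems. It has no quantifier and no end anchor, so only the first character is ever tested, which is why % & * slip through after a valid first character. In addition, +-_ inside the class is a range, so characters such as @ and , pass even on their own. This is a minimal sketch; the subroutine name original_accepts and the sample strings are hypothetical:

```perl
use strict;
use warnings;

# The regex from the question: anchored at the start only, no quantifier.
my $original = qr/^[a-zA-Z0-9+-_.:\/\\]/;

sub original_accepts {
    my ($s) = @_;
    return $s =~ $original ? 1 : 0;
}

# 'goak&<>' passes because only the leading 'g' is examined.
# '@' and ',' pass alone because "+-_" is a range that includes them.
# '%' fails because it lies outside every part of the class.
for my $s ('goak&<>', '@', ',', '%') {
    printf "%-8s %s\n", $s, original_accepts($s) ? 'valid' : 'invalid';
}
```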

elixir_sinari · September 16, 2012, 5:03am

if ($string =~ m{\A[-+.:/\\a-zA-Z0-9_]+\z}) {
    print "valid";
} else {
    print "invalid";
}

And if you have Perl 5.14 or later, you could replace the first line with:

if ($string =~ m{\A[-+.:/\\\w]+\z}a) {
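Both of elixir_sinari's patterns can be wrapped in subroutines and run on the sample strings from this thread. This is a sketch; the names is_valid and is_valid_514 are hypothetical. The strings are written in single quotes so that the backslashes stay literal:

```perl
use 5.014;      # the /a modifier needs Perl 5.14+ (this also enables strict)
use warnings;

# elixir_sinari's pattern: \A and \z anchor the whole string, + demands at
# least one character, and the '-' at the start of the class is literal.
sub is_valid {
    my ($string) = @_;
    return $string =~ m{\A[-+.:/\\a-zA-Z0-9_]+\z} ? 1 : 0;
}

# The 5.14+ variant: with /a, \w matches only [A-Za-z0-9_].
sub is_valid_514 {
    my ($string) = @_;
    return $string =~ m{\A[-+.:/\\\w]+\z}a ? 1 : 0;
}

# Single quotes keep the backslashes literal.
for my $s ('e:\test1\test2', '/usr/bin/test', '/var/log-test',
           'e:\test\test?newdata', 'goak&<>') {
    printf "%-22s %-7s %s\n", $s,
        is_valid($s)     ? 'valid' : 'invalid',
        is_valid_514($s) ? 'valid' : 'invalid';
}
```

The first three strings come out valid and the last two invalid under both versions.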

alister · September 16, 2012, 7:05am

If an empty string is allowed, you could also negate the bracketed expression, rendering anchors unnecessary.

Regards,
Alister
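alister's suggestion can be sketched by searching for any forbidden character instead of matching the whole string. The subroutine name is_valid is hypothetical. Because nothing needs to be anchored, the empty string counts as valid, which is only appropriate if empty input is allowed:

```perl
use strict;
use warnings;

# Reject the string if it contains any character outside the allowed set.
# A '-' placed right after '^' is literal. No anchors are needed, and the
# empty string passes because it contains no forbidden character.
sub is_valid {
    my ($string) = @_;
    return $string =~ m{[^-+.:/\\a-zA-Z0-9_]} ? 0 : 1;
}

for my $s ('', '/usr/bin/test', 'e:\test1\test2', 'goak&<>') {
    printf "%-16s %s\n", "'$s'", is_valid($s) ? 'valid' : 'invalid';
}
```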

zing_foru · September 16, 2012, 7:44am

Thanks elixir_sinari, but this regex does not even accept correct strings,

e.g. "e:\test1\test2"
      "/usr/bin/test"
      "/var/log-test"
      "/usr/test bin/goak"

Invalid strings, e.g.:

"e:\test\test?newdata"  #contains ? invalid symbol
"goak&<>"  #contains & <> invalid symbol

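Two things may explain this, although this is a guess about how the strings were tested. In a double-quoted Perl string, "e:\test1\test2" contains TAB characters (\t), not backslashes, so any pattern without \s rejects it. "/usr/test bin/goak" contains a space, which is not in the allowed set given in the question. The sketch below shows both. Adding a space to the class is an assumption about what should be allowed:

```perl
use strict;
use warnings;

my $allowed = qr{\A[-+.:/\\a-zA-Z0-9_]+\z};

# In double quotes "\t" is a TAB, so this string holds no backslashes.
my $double = "e:\test1\test2";
# In single quotes the backslashes stay literal.
my $single = 'e:\test1\test2';

print 'double-quoted: ', ($double =~ $allowed ? 'valid' : 'invalid'), "\n";   # invalid
print 'single-quoted: ', ($single =~ $allowed ? 'valid' : 'invalid'), "\n";   # valid

# Assumption: if a space is meant to be allowed, add a literal space to the class.
my $allowed_with_space = qr{\A[-+.:/\\a-zA-Z0-9_ ]+\z};
my $path = '/usr/test bin/goak';
print 'with space:    ', ($path =~ $allowed_with_space ? 'valid' : 'invalid'), "\n";   # valid
```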
msabhi · September 16, 2012, 8:04am

Try this regex:

if ($string =~ m/^[\w\s+-.:\/\\]*$/) {
    print "Valid";
} else {
    print "Invalid";
}

zing_foru · September 16, 2012, 8:25am

Thanks msabhi, that worked.

alister · September 16, 2012, 9:27am

No, it did not. You just did not notice the error. A comma is still allowed even though it's not on your list of allowed characters.

The sequences +-_ in your regex and +-. in msabhi's regex represent ranges of characters, not the three characters that appear literally in the regular expression.
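This sketch checks the comma case directly. It runs msabhi's first regex unchanged and lists every printable ASCII character matched by the range +-. alone:

```perl
use strict;
use warnings;

# msabhi's first attempt, copied unchanged.
my $msabhi = qr/^[\w\s+-.:\/\\]*$/;

# All printable ASCII characters matched by the range "+-." alone.
my @range = grep { /\A[+-.]\z/ } map { chr($_) } 0x20 .. 0x7E;
print "+-. covers: @range\n";    # + , - .

print "'a,b' is ", ('a,b' =~ $msabhi ? 'valid' : 'invalid'), "\n";   # valid
```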

zing_foru:

I want to validate strings in perl, the string may contains characters from a-zA-Z0-9 and symbols +-_.:/\
/^[a-zA-Z0-9+-_.:\/\\]/
but this regex also validates strings that contain other characters like @%^&*?

Neither of those is correct; msabhi's example is simply less wrong. zing_foru's unintended range +-_ spans the entirety of the following list. msabhi's range +-. spans only its first four entries (plus sign, comma, hyphen, period).

From the POSIX locale collation sequence (the same as ASCII order for these characters):

<plus-sign>
<comma>
<hyphen>
<period>
<slash>
<zero>
<one>
<two>
<three>
<four>
<five>
<six>
<seven>
<eight>
<nine>
<colon>
<semicolon>
<less-than-sign>
<equals-sign>
<greater-than-sign>
<question-mark>
<commercial-at>
<A>
<B>
<C>
<D>
<E>
<F>
<G>
<H>
<I>
<J>
<K>
<L>
<M>
<N>
<O>
<P>
<Q>
<R>
<S>
<T>
<U>
<V>
<W>
<X>
<Y>
<Z>
<left-square-bracket>
<backslash>
<right-square-bracket>
<circumflex>
<underscore>
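The list can be checked in Perl itself, because ranges in a bracketed character class follow code-point order, which matches the ASCII order above. This is a minimal sketch that enumerates the printable ASCII characters matched by [+-_]:

```perl
use strict;
use warnings;

# Ranges in a character class follow code-point order, so "+-_"
# covers everything from U+002B '+' to U+005F '_'.
my @members = grep { /\A[+-_]\z/ } map { chr($_) } 0x20 .. 0x7E;

print scalar(@members), " characters: @members\n";   # 53 characters, '+' through '_'
```

The count of 53 matches the number of entries in the list.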

To fix your errors, refer to the perlre man page at perldoc.perl.org.

Regards,
Alister

msabhi · September 16, 2012, 11:25am

Rightly pointed out, Alister. I overlooked "-" when giving the solution. Inside a character class "-" has a special meaning, just like "\", "]" and "^" (the last only when it comes first). I believe escaping "-" should work:

if ($string =~ m/^[\w\s+\-.:\/\\]*$/) {
    print "Valid";
} else {
    print "Invalid";
}

Correct me if I am wrong.
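Here is a quick check of the escaped version. With the escape in place, the comma is rejected. Putting the hyphen first in the class is a common alternative to escaping it (not something proposed in the thread) and gives the same result. \s still admits spaces, tabs and newlines, and * still accepts the empty string:

```perl
use strict;
use warnings;

# msabhi's corrected regex: "\-" is a literal hyphen, so "+-." is no longer a range.
my $escaped = qr/^[\w\s+\-.:\/\\]*$/;
# Same set with the hyphen placed first, where it is literal without a backslash.
my $hyphen_first = qr/^[-\w\s+.:\/\\]*$/;

for my $s ('a,b', 'a-b', '/var/log-test') {
    printf "%-14s escaped=%-7s first=%s\n", $s,
        $s =~ $escaped      ? 'valid' : 'invalid',
        $s =~ $hyphen_first ? 'valid' : 'invalid';
}
```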

elixir_sinari · September 16, 2012, 11:34am

zing_foru:

thanks elixir_sinari, but this regex even not validating correct strings
e.g. "e:\test1\test2"
   "/usr/bin/test"
   "/var/log-test"
   "/usr/test bin/goak"
invalid strings e.g.
"e:\test\test?newdata"  #contains ? invalid symbol
"goak&<>"  #contains & <> invalid symbol

Have you checked my post again? I've edited some things :).

---------- Post updated at 10:34 AM ---------- Previous update was at 10:28 AM ----------

msabhi:

Rightly pointed out at Alister. I just overlooked "-" or somehow din't stress much while giving solution.."-" got a special meaning inside the character class just like "\","]" and "^"(only if used in the beginning)...Now here i believe escaping "-" should work...
if($string =~ m/^[\w\s+\-.:\/\\]*$/) 
{ print "Valid";
 }else { 
print "Invalid" 
}
Correct me if am wrong...

In a Unicode context, \w and \s would match many unintended characters if these occur in the input.
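This sketch shows what that means in practice. It uses characters above U+00FF, so Perl always applies Unicode rules to them; the sample strings are illustrative. The /a modifier (Perl 5.14+) limits \w to [A-Za-z0-9_], and leaving \s out of the class keeps tabs and other whitespace out:

```perl
use 5.014;      # the /a modifier needs Perl 5.14+ (this also enables strict)
use warnings;

# msabhi's corrected regex, which relies on \w and \s.
my $loose  = qr/^[\w\s+\-.:\/\\]*$/;
# The allowed set with /a, so \w means only [A-Za-z0-9_], and no \s.
my $strict = qr{\A[-+.:/\\\w]+\z}a;

my @samples = (
    [ 'Greek letter Omega',  "\x{3A9}mega" ],
    [ 'Arabic-Indic digits', "\x{661}\x{662}" ],
    [ 'EM SPACE (U+2003)',   "a\x{2003}b" ],
    [ 'TAB',                 "a\tb" ],
    [ 'plain path',          '/usr/bin/test' ],
);

# Print labels rather than the strings themselves to avoid wide-character output.
for my $pair (@samples) {
    my ($label, $s) = @$pair;
    printf "%-20s loose=%-7s strict=%s\n", $label,
        $s =~ $loose  ? 'valid' : 'invalid',
        $s =~ $strict ? 'valid' : 'invalid';
}
```

Every sample passes the loose pattern. Only the plain path passes the strict one.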