What's the difference between \d, [:digit:], and [0-9] in regular expressions?

Hello,

[river@localhost ate]$ [[ "123" =~ \d ]] && echo "ok" || echo "error";
error
[river@localhost ate]$ [[ "123" =~ [:digit:] ]] && echo "ok" || echo "error";
error
[river@localhost ate]$ [[ "123" =~ [0-9] ]] && echo "ok" || echo "error";
ok
[river@localhost ate]$ 

It seems that \d, [:digit:], and [0-9] are not the same. According to the regular expression reference, all three have the same meaning (a single digit), so why don't they all work on Linux?

[river@localhost ate]$ [[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error";
error

I am very puzzled by the above: "123" should match \b[0-9]{3}\b, so why doesn't it?
Thanks!

Different languages implement regular expressions differently, so you should check the manual pages of your shell.

This is bash:

4.1.10(4)-release$ [[ 123 =~ [0-9] ]] && echo ok || echo ko
ok
4.1.10(4)-release$ [[ 123 =~ [[:digit:]] ]] && echo ok || echo ko
ok
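
For what it's worth, here is a small sketch of why the bare [:digit:] form fails while [[:digit:]] works. A POSIX character class name is only recognised inside a bracket expression, so [:digit:] on its own is an ordinary bracket expression listing the characters ':', 'd', 'i', 'g' and 't'. \d is Perl syntax and is not part of POSIX extended regular expressions at all. The helper name matches is made up for this illustration, and the sketch assumes bash 3.2 or later.

```shell
#!/usr/bin/env bash
# matches SUBJECT PATTERN (hypothetical helper): print "ok" if the extended
# regular expression in PATTERN matches SUBJECT, otherwise print "ko".
matches() {
    local subject=$1 pattern=$2
    if [[ $subject =~ $pattern ]]; then echo ok; else echo ko; fi
}

matches 123 '[:digit:]'     # ko: a set containing ':', 'd', 'i', 'g', 't'
matches dig '[:digit:]'     # ok: 'd' is a member of that set
matches 123 '[[:digit:]]'   # ok: the class name sits inside a bracket expression
matches abc '[[:digit:]]'   # ko: no digit in the subject
```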

This is Perl:

4.1.10(4)-release$ perl -le'print 123 =~ /\d/ ? ok : ko," <-> ", a =~ /\d/ ? ok : ko'
ok <-> ko

Which shell and operating system are you using?


Thanks, my OS is Fedora 15 and the shell type is sh-4.2.

---------- Post updated at 08:19 PM ---------- Previous update was at 08:13 PM ----------

Thanks. [0-9] works, but why doesn't "\b[0-9]{3}\b" work?

[river@localhost ate]$ [[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error";
error

First of all, there is no such shell type as 'sh-4.2'. That string looks like the default prompt Bash prints when it is started as sh, so you are probably using the Bash shell, as expected on Fedora 15. To find out which version of Bash you are using, run:

$ echo $BASH_VERSION
4.2.10(1)-release

Your expression fails because (1) \b is not part of the POSIX extended regular expression syntax that Bash uses for =~ and (2) even if it were supported, your RE would be incorrect.

As others have pointed out, there are different "families" of regular expressions. Some of the more common of these are:

  • BRE: Basic Regular Expressions
  • ERE: Extended Regular Expressions

Perl, Korn Shell 93, Python, XSLT and more support additional RE functionality.
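
To make the BRE/ERE distinction concrete, here is a minimal sketch using grep, which uses BRE by default and ERE with -E. The same "three digits" interval has to be written differently in each family. The helper names bre_count and ere_count are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count the input lines that match a pattern.
# grep exits with status 1 when nothing matches, hence the "|| true".
bre_count() { grep -c -e "$1" || true; }      # default grep syntax: BRE
ere_count() { grep -E -c -e "$1" || true; }   # grep -E: ERE

printf '123\n' | bre_count '[0-9]\{3\}'   # 1: BRE writes the interval as \{3\}
printf '123\n' | ere_count '[0-9]{3}'     # 1: ERE writes the interval as {3}
printf '123\n' | bre_count '[0-9]{3}'     # 0: in BRE, bare braces are literal text
```

Bash's =~ operator expects the ERE form, which is why [0-9]{3} works there without backslashes.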

Just because you read it on a website or in a book does not mean that a particular RE example will work in the bash shell.

By the way, \b is a GNU extension available in glibc's regcomp(), but not required by POSIX. Note that Bash does not do its own RE handling for =~: according to its manual it uses the regcomp()/regexec() library functions described in regex(3), so whether \b is accepted depends on the C library of the platform you run on.

Also, POSIX only defines the behavior of range expressions (e.g. [0-9]) in the C/POSIX locale; in other locales, [[:digit:]] is the safer choice.

Regards,
Alister

Thanks. What represents a word boundary in the bash shell if it isn't "\b"?

---------- Post updated at 03:12 AM ---------- Previous update was at 03:04 AM ----------

Thanks all!
The following works fine in the bash shell:

[river@localhost ~]$ [[ "123" =~ [[:digit:]]{3} ]] && echo "ok" || echo "error"
ok
[river@localhost ~]$ [[ "123" =~ [0-9]{3} ]] && echo "ok" || echo "error"
ok

The following is what I want:

[river@localhost ~]$ reg='\b[0-9]{3}\b'
[river@localhost ~]$ [[ "123" =~ $reg ]] && echo "ok" || echo "error"
ok

However, why must I put the expression in a variable?

[river@localhost ~]$ [[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error"
error

Different bash versions on different platforms yield different results.
I'm not sure why with certain versions on some platforms it seems to work when I quote the escape sequences:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)
$ echo $BASH_VERSION
3.1.17(1)-release
$ [[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error"
error
$ [[ "123" =~ "\b[0-9]{3}\b" ]] && echo "ok" || echo "error"
ok
$ [[ "123" =~ \\b[0-9]{3}\\b ]] && echo "ok" || echo "error"
ok
4.2.8(1)-release$ lsb_release -d
Description:    Ubuntu 11.04
4.2.8(1)-release$ [[ "123" =~ \b[0-9]{3}\b ]] && echo "ok" || echo "error"
error
4.2.8(1)-release$ [[ "123" =~ "\b[0-9]{3}\b" ]] && echo "ok" || echo "error"
error
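
One likely explanation for the difference between the two machines: starting with bash 3.2, any quoted part of the pattern on the right of =~ is matched as a literal string rather than as a regular expression, whereas bash 3.1 still treated a quoted pattern as a regex. That is also why keeping the pattern in a variable and expanding it unquoted is the usual advice. The sketch below shows the effect with a deliberately simple, made-up pattern; the helper names are hypothetical and the output comments assume bash 3.2 or later with default compatibility settings.

```shell
#!/usr/bin/env bash
re='a.c'   # hypothetical pattern: "a", any single character, "c"

# Unquoted expansion: the contents of $re are compiled as a regular expression.
match_unquoted() { if [[ $1 =~ $re ]]; then echo ok; else echo ko; fi; }

# Quoted expansion: the contents of $re are searched for as literal text.
match_quoted() { if [[ $1 =~ "$re" ]]; then echo ok; else echo ko; fi; }

match_unquoted abc   # ok: '.' matches the 'b'
match_quoted   abc   # ko: the subject does not contain the text "a.c"
match_quoted   a.c   # ok: the literal text "a.c" is present
```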

On Fedora 15:

[river@localhost Desktop]$ [[ "123" =~ \\b[0-9]{3}\\b ]] && echo "ok" || echo "error"
error
[river@localhost Desktop]$  [[ "123" =~ "\b[0-9]{3}\b" ]] && echo "ok" || echo "error"
error

Word boundaries can also be marked with \< and \>. Did you try that syntax?

4.1.10(4)-release$ reg='\<[0-9]{3}\>'
4.1.10(4)-release$ [[ "123" =~ $reg ]] && echo "ok" || echo "error"
ok
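
Since \b, \< and \> are extensions rather than part of POSIX ERE, a version that does not rely on them may be useful. The following is only a sketch: it spells the boundary out as "start of string or a non-word character" before the digits and "a non-word character or end of string" after them, treating letters, digits and underscore as word characters. The variable and function names are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical boundary emulation using only POSIX ERE constructs.
word_re='(^|[^[:alnum:]_])[0-9]{3}([^[:alnum:]_]|$)'

# has_three_digit_word SUBJECT: print "ok" if SUBJECT contains exactly three
# digits standing alone as a word, otherwise print "error".
has_three_digit_word() {
    if [[ $1 =~ $word_re ]]; then echo ok; else echo error; fi
}

has_three_digit_word "123"           # ok
has_three_digit_word "abc 123 def"   # ok
has_three_digit_word "1234"          # error: four digits, no boundary after three
has_three_digit_word "x123"          # error: 'x' is a word character
```

Unlike \b, this pattern consumes the neighbouring character, so the overall match in BASH_REMATCH[0] may include it.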