regex question using egrep

Hi, I have a bunch of directories that are always named with six lowercase letters followed by either one or two digits (but no more).

So, for example, names could be:

qwerty1
qwerty9
qwerty10
qwerty67

I am currently using two patterns to match these names:

echo "$DIR" | egrep '/[a-z][a-z][a-z][a-z][a-z][a-z][1-9]'
echo "$DIR" | egrep '/[a-z][a-z][a-z][a-z][a-z][a-z][1-9][0-9]/'

So effectively I have to write two lines to cover both cases.

I could use

echo "$DIR" | egrep '/[a-z][a-z][a-z][a-z][a-z][a-z][0-9]+'

but we have a specific requirement NOT to match directories with three or more digits at the end, so I can't use this.

According to my regex cheat sheet, I should be able to do this:

echo "$DIR" | egrep '/[a-z][a-z][a-z][a-z][a-z][a-z][0-9]{1,2}'

but this doesn't work :wall:

Any ideas would be greatly appreciated.

You can use ? to specify "zero or one", but what you really need are anchors:

echo "$DIR" | egrep '^[a-z]{6}[0-9]{1,2}$'

Because something like

echo "$DIR" | egrep '[a-z]{6}[0-9]{1,2}'

will also match 'qwertyuiio01234', since that name does contain six lowercase letters immediately followed by a digit.
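To see the difference side by side, here is a minimal sketch, assuming a POSIX sh and a grep that accepts -E (the standard spelling of egrep). The helper name try_pattern is only illustrative. It runs the unanchored and anchored patterns over a few sample names:

```shell
#!/bin/sh
# Compare an unanchored ERE with an anchored one on sample names.
# grep -E behaves like egrep; both support the {m,n} interval.

unanchored='[a-z]{6}[0-9]{1,2}'
anchored='^[a-z]{6}[0-9]{1,2}$'

# Illustrative helper: try_pattern PATTERN NAME
# Prints "match" if NAME contains a match for PATTERN, otherwise "no match".
try_pattern() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

for name in qwerty1 qwerty67 qwerty123 qwertyuiio01234; do
    printf '%-16s unanchored: %-9s anchored: %s\n' \
        "$name" "$(try_pattern "$unanchored" "$name")" \
        "$(try_pattern "$anchored" "$name")"
done
```

Without the anchors, qwerty123 and qwertyuiio01234 both pass because grep only needs to find a matching substring somewhere in the line. With ^ and $ the whole line has to fit the pattern, so only qwerty1 and qwerty67 remain.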

This should work:

ls -latr | egrep "[aA-zZ]+([0-9])?[0-9]$"

Considering your spec, you should be aware that:

~/$ echo abc-123| egrep '[a-z]{3}-[0-9]{1,2}'
abc-123

Whereas

~/$ echo abc-123| egrep '^[a-z]{3}-[0-9]{1,2}$'
~/$
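One caveat, since the original patterns start with /: if $DIR holds a full path (say /home/qwerty10) rather than just the last component, ^ will never line up with the directory name. A sketch under that assumption anchors on either the start of the line or a slash instead. is_wanted_dir is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: is_wanted_dir PATH
# Succeeds when the final path component is six lowercase letters followed by
# one or two digits; an optional trailing slash is allowed.
is_wanted_dir() {
    # (^|/) = start of line or a slash, so a longer name like abqwerty1 fails;
    # /?$   = optional trailing slash, then end of line, so qwerty123 fails.
    printf '%s\n' "$1" | grep -Eq '(^|/)[a-z]{6}[0-9]{1,2}/?$'
}

for DIR in /home/qwerty1 /home/qwerty67/ /home/qwerty123 /data/abqwerty1; do
    if is_wanted_dir "$DIR"; then
        echo "yes: $DIR"
    else
        echo "no:  $DIR"
    fi
done
```

The same idea works with a bare name too, because ^ still satisfies the left-hand alternative when there is no slash.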

And which grep are you running?

~/$ egrep --version
egrep (GNU grep) 2.5.1

OK, I am testing this for use in sudo, and I made the foolish assumption that because sudo works for things like

/[a-z][a-z][a-z][a-z][a-z][a-z][1-9]/

it would also work for richer regexes. Well, it doesn't; it seems sudo's regex support is rudimentary at best.

I got your recommendations working from the command line using egrep, though, so thanks for all your help.

Just gutted that my sudo rules have to look so ugly :frowning:
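For what it's worth, the sudoers manual describes its matching of path names and command-line arguments as shell-style wildcards (glob/fnmatch), not regular expressions. That explains why [a-z] works there but {1,2} does not: wildcard bracket expressions have no repetition count, so two patterns really are the minimum. The shell's case statement uses the same wildcard syntax, so it is a convenient place to try the two patterns before putting them in a sudoers rule. This is plain sh, not sudoers syntax, and matches_name is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: matches_name NAME
# Shell wildcards are anchored to the whole word automatically, so no ^ or $
# is needed, but they have no {1,2} count: each length gets its own pattern.
matches_name() {
    case $1 in
        [a-z][a-z][a-z][a-z][a-z][a-z][1-9] | \
        [a-z][a-z][a-z][a-z][a-z][a-z][1-9][0-9])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

for name in qwerty1 qwerty67 qwerty123 qwerty0; do
    if matches_name "$name"; then echo "yes: $name"; else echo "no:  $name"; fi
done
```

Newer sudo releases (1.9.10 onward) can also treat a sudoers pattern as a regular expression when it starts with ^ and ends with $; check man sudoers for the version you actually run before relying on that.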

That bracket expression, [aA-zZ], is quite odd and almost certainly incorrect. The a and Z are each included twice, once within the range expression and once outside it. Since range expressions are undefined outside the POSIX locale, it's safe to assume that this is intended to run in that locale. In the POSIX locale, A-z includes all of the upper case and lower case letters of the English alphabet plus six other characters: <left-square-bracket>, <backslash>, <right-square-bracket>, <circumflex>, <underscore>, <grave-accent>.

Why are there six characters located between the upper and lower case alphabets? They pad the beginning of the lowercase alphabet so that it's exactly 32 positions above the beginning of the uppercase alphabet. Simply flipping a single bit is then sufficient to convert between upper and lower case.
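The one-bit relationship is easy to check from a shell. This sketch assumes a POSIX printf, which prints a character's numeric code when the argument starts with a quote. code and flip_case are illustrative names, and the XOR only makes sense for A-Z and a-z:

```shell
#!/bin/sh
# Illustrative helper: code CHAR -> decimal code of CHAR (e.g. A -> 65).
code() {
    printf '%d' "'$1"
}

# Illustrative helper: flip_case LETTER
# XOR with 32 toggles bit 5, which is the only difference between an ASCII
# upper case letter and its lower case form (A = 65, a = 97).
flip_case() {
    n=$(( $(code "$1") ^ 32 ))
    # Print the character whose code is n, via a \ooo octal escape.
    printf "\\$(printf '%03o' "$n")"
}

for c in A Z a z; do
    printf '%s (%d) -> %s\n' "$c" "$(code "$c")" "$(flip_case "$c")"
done
```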

POSIX locale collation sequence:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_02_06

I suggest sticking with [A-Za-z] if a POSIX range expression is desired or the [[:alpha:]] character class for any locale.
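A quick way to see the stray characters: the sketch below, assuming a POSIX sh and grep (count_matches is an illustrative name), feeds the six characters that sit between Z and a to grep under LC_ALL=C, the POSIX locale, and counts how many lines each bracket expression accepts:

```shell
#!/bin/sh
# Illustrative helper: count_matches BRACKET
# Reports how many of the six ASCII characters between 'Z' and 'a' the given
# bracket expression matches. LC_ALL=C pins the POSIX locale, where a range
# such as A-z follows code-point order.
count_matches() {
    printf '%s\n' '[' '\' ']' '^' '_' '`' | LC_ALL=C grep -c -- "$1"
}

for bracket in '[aA-zZ]' '[A-z]' '[A-Za-z]' '[[:alpha:]]'; do
    printf '%-12s %s\n' "$bracket" "$(count_matches "$bracket")"
done
```

In the POSIX locale the first two should accept all six characters, while [A-Za-z] and [[:alpha:]] accept none of them.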

Regards,
Alister


What does sudo have to do with regular expressions? It is the tool, egrep, that processes the regexps, and unless you have some weird setup where egrep for the root user is different from egrep for a regular user, it shouldn't make a difference.

I meant [a-zA-Z] but made a typo. Thanks for looking at it so closely.

Thanks for all your help, guys.