regex question using egrep

rethink · July 18, 2011, 10:12am

Hi, I have a bunch of directories that are always named with six lowercase letters followed by either one or two digits (but no more).

So, for example, names could be:

qwerty1
qwerty9
qwerty10
qwerty67

I am currently using two pattern matches to capture these names

 
echo $DIR | egrep '/[a-z][a-z][a-z][a-z][a-z][a-z][1-9]'
echo $DIR | egrep '/[a-z][a-z][a-z][a-z][a-z][a-z][1-9][0-9]/'

So effectively I have to write two lines to match both cases.

I could use

 
echo $DIR | egrep '/[a-z][a-z][a-z][a-z][a-z][a-z][0-9]+'

but we have a specific requirement NOT to match directories with three or more digits at the end, so I can't use this.

According to my regex cheat sheet, I should be able to do this:

 
echo $DIR | egrep '/[a-z][a-z][a-z][a-z][a-z][a-z][0-9]{1,2}'

but this doesn't work.

Any ideas would be greatly appreciated.

mirni · July 18, 2011, 10:30am

You can use ? to specify "one or none", but what you really need are anchors:

echo $DIR | egrep '^[a-z]{6}[0-9]{1,2}$'

Because something like

echo $DIR | egrep '[a-z]{6}[0-9]{1,2}'

will also match 'qwertyuiio01234', since it contains a run of 6 lowercase letters followed by a digit.
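A small sketch of the difference, using the sample names from the question plus two that should be rejected. It uses `grep -E` (the POSIX spelling of `egrep`) and reads names from the argument list instead of `$DIR`; the three function names are made up for illustration.

```shell
#!/bin/sh
# Print only the names that are exactly six lowercase letters followed by
# one or two digits; ^ and $ force the whole name to match.
match_anchored() {
    printf '%s\n' "$@" | grep -E '^[a-z]{6}[0-9]{1,2}$'
}

# Same idea, using ? ("zero or one") instead of the {1,2} interval.
match_optional() {
    printf '%s\n' "$@" | grep -E '^[a-z]{6}[0-9][0-9]?$'
}

# Unanchored: any name that merely CONTAINS such a substring is printed,
# so qwerty123 and qwertyuiio01234 slip through.
match_unanchored() {
    printf '%s\n' "$@" | grep -E '[a-z]{6}[0-9]{1,2}'
}

match_anchored qwerty1 qwerty10 qwerty123 qwertyuiio01234   # prints qwerty1 and qwerty10
```

Both anchored forms accept the same names; which one to use is mostly a matter of taste, though `?` also works in tools whose regex flavour lacks intervals.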

shamrock · July 18, 2011, 10:42am

This should work:

ls -latr | egrep "[aA-zZ]+([0-9])?[0-9]$"

Skrynesaver · July 18, 2011, 10:43am

Considering your spec, you should be aware that:

~/$ echo abc-123| egrep '[a-z]{3}-[0-9]{1,2}'
abc-123

Whereas

~/$ echo abc-123| egrep '^[a-z]{3}-[0-9]{1,2}$'
~/$

And which grep are you running?

~/$ egrep --version
egrep (GNU grep) 2.5.1

rethink · July 18, 2011, 1:06pm

OK, I am testing this for use in sudo and basically made the foolish assumption that because sudo works for things like

/[a-z][a-z][a-z][a-z][a-z][a-z][1-9]/

it would work for richer regexes... well, it doesn't. It seems sudo's regex support is rudimentary at best.

I got your recommendations working from the command line using egrep, though, so thanks for all your help.

Just gutted that my sudo rules have to look so ugly.
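A likely reason the single-line pattern fails in sudo is that sudoers does not treat command arguments as regular expressions at all: sudoers(5) describes shell-style wildcards matched with fnmatch(3), where a bracket expression stands for exactly one character and there is no `{1,2}` interval (newer sudo releases may also accept regular expressions; check the manual for your version). Shell `case` patterns use the same wildcard notation, so this sketch, with a hypothetical `is_valid_dirname` helper, mimics what such a rule can and cannot express.

```shell
#!/bin/sh
# Wildcard patterns (case, fnmatch) are implicitly anchored: they must match
# the whole string. Each [a-z] or [0-9] matches exactly one character and
# there is no {1,2}, so "one or two digits" needs two alternatives.
is_valid_dirname() {
    case $1 in
        [a-z][a-z][a-z][a-z][a-z][a-z][0-9] | [a-z][a-z][a-z][a-z][a-z][a-z][0-9][0-9])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

for name in qwerty1 qwerty67 qwerty123 qwerty; do
    if is_valid_dirname "$name"; then
        echo "$name: match"
    else
        echo "$name: no match"
    fi
done
```

In a sudoers file the equivalent would be two command entries, one per alternative, which is why the rules end up looking repetitive.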

alister · July 18, 2011, 2:35pm

That bracket expression is quite odd and almost certainly incorrect. The a and Z are each included twice, once within the range expression and once without. Since range expressions are undefined outside of the POSIX locale, it's safe to assume that this is intended to run in that locale. A-z in the POSIX locale, aside from including all of the upper case and lower case letters in the English alphabet, also includes a few other characters: <left-square-bracket>, <backslash>, <right-square-bracket>, <circumflex>, <underscore>, <grave-accent>.

Why are there six characters located between the upper and lower case alphabets? They pad the beginning of the lowercase alphabet so that it's exactly 32 positions above the beginning of the uppercase alphabet. Simply flipping a single bit is then sufficient to convert between upper and lower case.
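A sketch of that single-bit flip in shell arithmetic, assuming an ASCII-compatible encoding; `flip_case` is a hypothetical helper that only makes sense for a single ASCII letter.

```shell
#!/bin/sh
# 'A' is 65 and 'a' is 97: they differ by 32, i.e. only in bit 5 (0x20).
# flip_case toggles that bit for one ASCII letter passed as $1.
flip_case() {
    code=$(printf '%d' "'$1")                 # character -> decimal code point
    flipped=$(( code ^ 32 ))                  # toggle bit 5 (value 32)
    printf "\\$(printf '%03o' "$flipped")"    # code point -> character via octal escape
}

flip_case A; echo    # prints a
flip_case q; echo    # prints Q
```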

POSIX locale collation sequence:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_02_06

I suggest sticking with [A-Za-z] if a POSIX range expression is desired or the [[:alpha:]] character class for any locale.
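To see the point about `A-z` in practice, a sketch that evaluates each pattern in the C locale against the string `a_b` (the underscore sits between `Z` and `a`); `matches_c_locale` is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds if the whole string $2 matches the ERE $1, evaluated in the C locale,
# where range expressions follow code-point order.
matches_c_locale() {
    printf '%s\n' "$2" | LC_ALL=C grep -Eq -- "$1"
}

# [A-z] spans code points 65-122, which includes [ \ ] ^ _ and the backquote.
for pattern in '^[A-z]+$' '^[A-Za-z]+$' '^[[:alpha:]]+$'; do
    if matches_c_locale "$pattern" 'a_b'; then
        echo "$pattern matches a_b"
    else
        echo "$pattern does not match a_b"
    fi
done
```

Only the `[A-z]` pattern accepts the underscore; `[A-Za-z]` and `[[:alpha:]]` reject it.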

Regards,
Alister

mirni · July 18, 2011, 7:26pm

What does sudo have to do with regular expressions? It is the tool, egrep, that processes the regexps, and unless you have some weird setup where egrep for the root user differs from the one for a regular user, then it shouldn't make a difference.

shamrock · July 19, 2011, 11:39am

I meant [a-zA-Z] but made a typo... thanks for looking at it so closely.

rethink · July 20, 2011, 4:45am

Thanks for all your help, guys.