Grep with Regex multiple characters

Lost_in_Cyberia · April 15, 2016, 7:34pm

Hi everyone,

So I'm a bit confused about a regex pattern that should exist, but I can't really find any way to do it...

Let's say I want to match any lines that have a specific string, but I don't know the order of the letters but I know the length. Let's say it's 10 characters and begins with a V

Do I seriously have to do something like:

egrep '^V[a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]'

It could get worse if I knew there was a number in there somewhere... Is there no way to shorten this? I know there is a quantifier that I can add, something like

\{x,y\} but I've used it and it doesn't seem to work with letters?

Is there any way to say in short form that you're looking for 9 letters, each of which could be a-z?

rdrtx1 · April 15, 2016, 7:55pm

try:

grep "^V[a-zA-Z]\{9\}\b" infile

Lost_in_Cyberia · April 15, 2016, 8:23pm

Thanks for the quick response on a Friday! Okay, so the syntax you provided sort of worked. It did return the lines with exactly 9 letters, but it also returned any line that had more than 9. It basically used 9 as the minimum, with no upper limit.

Also... I've been so used to using egrep...why on earth can't egrep do this, but regular grep can??

Scrutinizer · April 16, 2016, 12:03am

Whether your grep supports \b depends on your implementation. GNU grep does, but other grep implementations may not.

\b will only work here if the character that follows is a non-word character (so if it is a number or an underscore, then \b will not match).
Also, [a-zA-Z] is unreliable in some locales and it does not match letters with diacritics, so it is better to use the POSIX [[:alpha:]] character class instead.
egrep (or grep -E, as is preferred nowadays) can also use interval expressions, but you need to leave out the escapes ( \ ) before the curly braces.
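
To make the escaping difference concrete, here is a small sketch that runs the same interval through plain grep (basic regular expressions) and grep -E (extended regular expressions). The sample words and the variable names are made up for illustration, and a 3-letter interval is used only to keep the input short.

```shell
# Hypothetical sample input, one word per line.
words='Vab
Vabc
Vabcd'

# BRE (plain grep): the interval braces must be escaped.
bre=$(printf '%s\n' "$words" | grep '^V[[:alpha:]]\{3\}$')

# ERE (grep -E): the same interval, with the braces left unescaped.
ere=$(printf '%s\n' "$words" | grep -E '^V[[:alpha:]]{3}$')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

Both commands should select only Vabc. In extended syntax an escaped brace stands for a literal brace character, which would explain why \{x,y\} appears not to work with egrep.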

So combining these remarks, try:

grep -E '^V[[:alpha:]]{9}([^[:alpha:]]|$)' file
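
A quick way to check what that pattern accepts is to feed it a few test lines. This is only a sketch: the sample lines and the variable names are invented, and the input is plain ASCII so the result should not depend on the locale.

```shell
# Hypothetical sample input: candidate lines to test the pattern against.
sample='Vabcdefghi
Vabcdefghij
Vabcdefgh
Vabcdefghi1
Xabcdefghij
Vabcdefghi and more'

# Keep lines that start with V followed by exactly nine letters,
# where the next character is either a non-letter or the end of the line.
matches=$(printf '%s\n' "$sample" | grep -E '^V[[:alpha:]]{9}([^[:alpha:]]|$)')

printf '%s\n' "$matches"
```

The lines with eight or ten letters after the V are rejected, as is the line starting with X. Note that Vabcdefghi1 is kept, because a digit counts as a non-letter here; with \b it would be rejected, since a digit is a word character.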

--
grep -E is the same as egrep for a POSIX compliant grep.