regular expression matching whole words

Hi

Consider the file:

this is a good line

When running

grep '\b(good|great|excellent)\b' file5

I expect it to match the line, but it doesn't. What am I doing wrong?
(Ultimately this regex will go in an awk script; I'm just using grep to test it.)

Thanks,

Storms

For grep to work with these regular expressions, you need to enable them with -E (preferred) or use egrep:

grep -E "(good|great|excellent)" filename
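A minimal sketch of the whole-word version, assuming GNU grep: \b as a word boundary is a GNU extension, not POSIX, and -w is available in GNU and BSD grep. The input is piped in with printf instead of reading file5, and the second line is a hypothetical near-miss added for contrast:

```shell
# -E enables extended regexes, so ( ) group and | alternates.
# \b (word boundary) is a GNU grep extension.
printf 'this is a good line\nthis is a goodd line\n' |
    grep -E '\b(good|great|excellent)\b'

# -w requires the match to be a whole word, so the \b anchors can go.
printf 'this is a good line\nthis is a goodd line\n' |
    grep -wE 'good|great|excellent'
```

Both commands print only "this is a good line"; "goodd" is rejected because no word boundary follows "good".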

Sorry for my denseness, but how can I get it to work in the awk script? The following doesn't seem to match the line:

if ($0 ~ /^.*\b(good|two|three)\b.*$/) { print "match" }

The \b escape pattern doesn't work in my version of awk; in awk, \b is the escape for a backspace character, not a word boundary. I prefer match() to the ~ syntax, but either should work:

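awk '
    {
        # ~ tests the record against a regex literal
        if( $0 ~ /[[:space:]](foo|bar|goo)[[:space:]]/ )
            print $0

        # match() takes the regex (here as a string) and returns the start
        # position of the match, or 0 if there is none.
        # Both tests are shown together, so a matching line prints twice.
        if( match( $0, "[[:space:]](foo|bar|goo)[[:space:]]" ) )
            print
    }
' file5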
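Because match() returns the start position and also sets RSTART and RLENGTH, it can pull out the matched text as well. A small sketch with the same foo|bar|goo alternation on hypothetical sample lines; the match includes the two surrounding spaces, which substr() trims off:

```shell
printf 'a foo b\nfoo at start\nno match here\n' |
awk '{
    # match() returns the 1-based start position (0 if no match) and sets
    # RSTART to the same value and RLENGTH to the length of the match.
    if (match($0, "[[:space:]](foo|bar|goo)[[:space:]]")) {
        word = substr($0, RSTART + 1, RLENGTH - 2)   # drop the two spaces
        print NR ": " word " at " RSTART
    }
}'
```

This prints "1: foo at 2". The second line does not match at all, because nothing precedes "foo" at the start of the line.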

Note that the leading ^.* and trailing .*$ are unneeded. The leading [[:space:]] implies that none of these words can be at the beginning of the line, while the trailing [[:space:]] implies that they may not be the last word on the line. If you need either, change it to something like:

if( match( $0, "[[:space:]]*(foo|bar|goo)[[:space:]]*" ) )

to indicate that zero or more space characters may precede/follow the word.


Thanks for that. After your reply I did some further googling and found that \y works in place of \b in awk. I'm using this to match whole words, so it matches good but not goodd:

if (match($0, /\y(good|excellent|three)\y/)) { print "match", $0 }

grep works with regular expressions (BRE) by default. Did you mean extended regular expressions (ERE), which support alternation (|) and are enabled with the -E switch?
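The difference shows up directly in the original question: in a basic regex, | and ( ) are ordinary characters, so the unflagged grep was looking for literal parentheses and bars. A quick sketch on made-up input:

```shell
# BRE (grep's default): | is a literal character, so this only matches
# the text "good|great".
printf 'good\ngood|great\n' | grep 'good|great'

# ERE (-E): the same pattern is an alternation and matches both lines.
printf 'good\ngood|great\n' | grep -E 'good|great'
```

The first command prints only "good|great"; the second prints both lines.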

That will not fly, since "may" (zero or more spaces) allows too much liberty: a word like "goods" would match too. And what about punctuation? What constitutes a word?
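To see the problem, and one portable way around it, here is a sketch comparing the relaxed [[:space:]]* pattern with one that requires a line edge or a non-word character on each side. Defining a word character as a letter, digit or underscore is an assumption, and the sample lines are made up:

```shell
printf '%s\n' 'good start' 'a good, solid line' 'these goods' 'ends with good' |
awk '{
    # Zero-or-more spaces on each side: effectively no boundary at all.
    relaxed = ($0 ~ /[[:space:]]*(good|great|excellent)[[:space:]]*/)
    # Each side must be a line edge or a character that is not a letter,
    # digit or underscore.
    strict = ($0 ~ /(^|[^[:alnum:]_])(good|great|excellent)([^[:alnum:]_]|$)/)
    print relaxed, strict, $0
}'
```

The relaxed pattern accepts all four lines, including "these goods"; the stricter one rejects only that line and still handles "good," and words at either end of the line.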

\y is a GNU extension and will not work in other awks. An alternative would be to use \< and \> instead:

gawk '/\<(good|excellent|three)\>/{ print "match", $0 }'

But this isn't universal either, since \< and \> are GNU extensions too.

A universal awk approach would be something like this, I guess:

awk -F'[[:space:][:punct:]]+' '{ for (i = 1; i <= NF; i++) if ($i ~ /^(good|great|excellent)$/) { print; next } }' file5

A special case would perhaps need to be made for the underscore character, since [:punct:] includes it: good_stuff would be split into good and stuff and would match.
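One possible way to handle the underscore is to turn the separator around: treat any run of characters that are not letters, digits or underscores as a field separator, so good_stuff stays one field. Counting _ as a word character is an assumption (it is how gawk's \y, \< and \> define words), and the sample lines are hypothetical:

```shell
printf '%s\n' 'this is a good line' 'good_stuff only' 'excellent!' 'goods' |
awk -F'[^[:alnum:]_]+' '{
    # Fields are now maximal runs of letters, digits and underscores.
    for (i = 1; i <= NF; i++)
        if ($i ~ /^(good|great|excellent)$/) { print; next }
}'
```

This prints "this is a good line" and "excellent!"; "good_stuff" and "goods" are left alone.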
