Hi
Consider a file containing the line:
this is a good line
When I run
grep '\b(good|great|excellent)\b' file5
I expect it to match the line, but it doesn't. What am I doing wrong?
(Ultimately this regex will be in an awk script; I'm just using grep to test it.)
Thanks,
Storms
agama
May 25, 2012, 7:19pm
For grep to work with regular expressions, you need to enable them with -E (preferred) or use egrep:
grep -E "(good|great|excellent)" filename
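A small sketch of why the original command prints nothing and what changes with -E. It assumes GNU grep (for \b) and adds a hypothetical second line, "these goods are fine", to show that word boundaries reject "goods":

```shell
#!/bin/sh
# file5 as in the question, plus a hypothetical second line to show that
# whole-word matching rejects "goods".
printf '%s\n' 'this is a good line' 'these goods are fine' > file5

# Basic regex (grep's default): ( and | are ordinary characters here, so grep
# looks for the literal text "(good|great|excellent)" and prints nothing.
grep '\b(good|great|excellent)\b' file5

# Extended regex via -E: alternation works. \b is not POSIX, but GNU grep
# supports it. Prints only "this is a good line".
grep -E '\b(good|great|excellent)\b' file5

# -w makes grep itself require whole-word matches, so no \b is needed
# (also not POSIX, but available in GNU and BSD grep).
grep -wE 'good|great|excellent' file5
```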
Sorry for my denseness, but how can I get it to work in the awk script? The following doesn't seem to match the line:
if ($0 ~ /^.*\b(good|two|three)\b.*$/) { print "match" }
agama
May 25, 2012, 8:25pm
The \b escape pattern doesn't work in my version of awk. I prefer match() to the ~ syntax, but either should work:
awk '
{
    if( $0 ~ /[[:space:]](foo|bar|goo)[[:space:]]/ )
        print "" $0;
    if( match( $0, "[[:space:]](foo|bar|goo)[[:space:]]" ) )
        print;
}
'
Note that the leading ^.* and trailing .*$ are unneeded. The leading space implies that none of these words can be at the beginning of the line, while the trailing space implies that they may not be the last word on the line. If you need either, change it to something like:
if( match( $0, "[[:space:]]*(foo|bar|goo)[[:space:]]*" ) )
to indicate that zero or more space characters may precede/follow the word.
Thanks for that. After your reply I did some further googling and found that \y works in place of \b in awk. I'm using this to match whole words,
so it matches good but not goodd:
if (match($0, /\y(good|excellent|three)\y/)) { print "match", $0 }
grep works with regular expressions (BRE) by default. Did you mean extended regular expressions (ERE), which support alternation (|) and are enabled with the -E switch?
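To illustrate the distinction, a sketch assuming GNU grep: POSIX BREs have no alternation, GNU grep offers it in BREs only through backslashed operators (a GNU extension), and in an ERE the same operators are written bare:

```shell
#!/bin/sh
printf '%s\n' 'this is a good line' > file5

# POSIX BREs have no alternation. GNU grep accepts \( \) and \| in a BRE as an
# extension, so this form depends on GNU grep:
grep '\(good\|great\|excellent\)' file5

# The same pattern as a POSIX ERE, which grep -E (or egrep) understands:
grep -E '(good|great|excellent)' file5
```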
agama:
[..]If you need either change to something like:
if( match( $0, "[[:space:]]*(foo|bar|goo)[[:space:]]*" ) )
to indicate that zero or more space characters may precede/follow the word.
That will not fly, since "may" allows too much liberty. A word like "goods" would match too. And what about punctuation? What constitutes a word?
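One portable way to pin down "what constitutes a word" in awk is to spell out the boundaries yourself. This sketch assumes a word is a run of letters, digits and underscores (the same definition \b uses in GNU tools) and uses a hypothetical file, words.txt, to exercise the edge cases:

```shell
#!/bin/sh
# Hypothetical test input: words at the start/end of a line, next to punctuation,
# and near-misses such as "goods" and "nogood".
printf '%s\n' 'good start' 'this is a good line' 'ends with good' \
    'these goods are fine' 'a great, excellent day' 'nogood here' > words.txt

# Spell out the boundaries instead of relying on \b, \y or \< \>:
# the word must be preceded by the start of the line or a non-word character,
# and followed by a non-word character or the end of the line.
# Prints "match ..." for lines 1, 2, 3 and 5 only.
awk '/(^|[^[:alnum:]_])(good|great|excellent)([^[:alnum:]_]|$)/ { print "match", $0 }' words.txt
```

This needs no GNU extensions; the cost is a longer pattern. The boundary characters become part of the match, so if you use match() instead of ~, RSTART may point at the separator before the word rather than at the word itself.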
storms:
thanks for that, after your reply i did some further googling and found that \y works in place of \b in awk. I'm using this to match whole words...
so it matches good, but not goodd
if (match($0, /\y(good|excellent|three)\y/)) { print "match", $0 }
\y is a GNU awk extension and will not work in other awks. An alternative would be to use \< and \> instead:
gawk '/\<(good|excellent|three)\>/{ print "match", $0 }'
But this isn't universal either.
A universal awk approach would be something like this, I guess:
awk -F'[[:space:][:punct:]]+' '{for(i=1;i<=NF;i++)if($i~/^(good|great|excellent)$/){print; next}}'
A special case would perhaps need to be made for the underscore character...
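On the underscore: [:punct:] includes "_", so a field separator built from it splits "good_stuff" into "good" and "stuff" and reports the line, whereas \b treats good_stuff as one word. A sketch of one workaround, assuming that \b-like behaviour is what you want, splits on runs of characters that are not letters, digits or underscores; words2.txt is a hypothetical test file:

```shell
#!/bin/sh
# Hypothetical input: "good_stuff" is one word to \b, since "_" is a word character.
printf '%s\n' 'a good_stuff line' 'very good, thanks' '(excellent)' > words2.txt

# Split on runs of anything that is not a letter, digit or underscore, then
# require a whole field to equal one of the words.
# Prints "very good, thanks" and "(excellent)", but not "a good_stuff line".
awk -F'[^[:alnum:]_]+' '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^(good|great|excellent)$/) { print; next }
}' words2.txt

# For comparison: splitting on [[:space:][:punct:]]+ also prints
# "a good_stuff line", because "_" belongs to [:punct:].
awk -F'[[:space:][:punct:]]+' '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^(good|great|excellent)$/) { print; next }
}' words2.txt
```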