awk and POSIX character classes

Can anyone tell me why this doesn't work? I've been experimenting with character classes and I seem to be missing something here.

echo "./comparecdna.summary" | awk '/^[.][/]compare[[:alnum:]+][.]summary$/' # returns nothing
echo "./compare_cdna.summary" | awk '/^[.][/]compare_[[:alnum:]+][.]summary$/' # returns nothing
echo "./compare_cdna_peptide.summary" | awk '/^[.][/]compare_[[_[:alnum:]]+][.]summary$/' # returns nothing
echo "./compare_cdna_peptide.summary" | awk '/^[.][/]compare_[[:word:]+][.]summary$/' # returns: awk: fatal: Invalid character class name: /^[.][/]compare_[[:word:]+][.]summary$/

Thanks!!
Anthony

In reference to your first example:

$ echo "./comparecdna.summary" | awk '/^[.][\/]compare[[:alnum:]]+[.]summary$/'
./comparecdna.summary

1) You need to escape the "/" within the regular expression, since it is also used to delimit the expression itself.
2) [[:alnum:]+] is not the same thing as [[:alnum:]]+
The former matches one character, an alphanumeric or a plus sign; the latter matches 1 or more alphanumerics.
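
A quick way to see the difference is to anchor both forms and feed them the same three lines. This is only a minimal sketch; it assumes your awk supports POSIX character classes, and the variable names are just for illustration:

```shell
# [[:alnum:]+] is ONE bracket expression: a single character that is
# either alphanumeric or a literal "+".
one_char=$(printf '%s\n' 'a' '+' 'ab' | awk '/^[[:alnum:]+]$/')

# [[:alnum:]]+ is the class followed by the "+" quantifier:
# one or more alphanumeric characters.
one_or_more=$(printf '%s\n' 'a' '+' 'ab' | awk '/^[[:alnum:]]+$/')

printf 'single character form matched:\n%s\n' "$one_char"
printf 'one-or-more form matched:\n%s\n' "$one_or_more"
```

The first form should accept "a" and "+" but reject "ab"; the second should accept "a" and "ab" but reject "+".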

Regards,
Alister

try this,

echo "./compare_cdna_peptide.summary" | awk '/^\.\/compare_[[:alpha:]].+\.summary$/'

thanks guys,
here's the revised version:

echo "./comparecdna.summary" | awk '/^[.][/]compare[[:alnum:]]+[.]summary$/' # returns: ./comparecdna.summary
echo "./compare_cdna.summary" | awk '/^[.][/]compare_[[:alnum:]]+[.]summary$/' # returns: ./compare_cdna.summary
echo "./compare_cdna_peptide.summary" | awk '/^[.][/]compare_[_[:alnum:]]+[.]summary$/' # returns: ./compare_cdna_peptide.summary
[:word:] is apparently not a standard POSIX character class.

alister, the "/" doesn't seem to need escaping here, maybe because it's by itself within the []?

pravin27, doesn't "[[:alpha:]].+" mean "a letter followed by one or more of any character"?
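
A rough check of that reading, comparing the looser pattern suggested above with a stricter one. The second test filename is made up purely for illustration, and the "/" is escaped in both patterns to stay on the safe side:

```shell
# Looser pattern: one letter, then one or more of ANY character.
loose=$(printf '%s\n' './compare_cdna.summary' './compare_c d!.summary' |
  awk '/^\.\/compare_[[:alpha:]].+\.summary$/')

# Stricter pattern: only underscores and alphanumerics before ".summary".
strict=$(printf '%s\n' './compare_cdna.summary' './compare_c d!.summary' |
  awk '/^\.\/compare_[_[:alnum:]]+\.summary$/')

printf 'loose pattern matched:\n%s\n' "$loose"
printf 'strict pattern matched:\n%s\n' "$strict"
```

The looser pattern should let both names through, while the stricter one should keep only the first.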

It is required according to the POSIX standard. I strongly suggest that you do not depend on your implementation's behavior. Perhaps your awk allows an unescaped "/" there, but a different implementation may choke on it like this:

$ echo "./comparecdna.summary" | awk '/^[.][/]compare[[:alnum:]]+[.]summary$/'
awk: nonterminated character class ^[.][
 source line number 1
 context is
         >>> /^[.][/ <<<

awk thinks the character class is unterminated because it encounters the regular expression delimiter ("/") before the closing "]".
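
Two ways of writing the pattern that should not depend on that leniency are sketched below: escaping the slash in the regex literal, or passing the pattern as a string so that "/" is no longer a delimiter. The variable names are only for illustration, and both forms assume an awk with POSIX character class support:

```shell
# Option 1: escape the slash inside the /.../ regex literal.
escaped=$(echo "./compare_cdna_peptide.summary" |
  awk '/^[.]\/compare_[_[:alnum:]]+[.]summary$/')

# Option 2: use a string (dynamic regex) with the ~ operator,
# so the slash needs no escaping at all.
dynamic=$(echo "./compare_cdna_peptide.summary" |
  awk '$0 ~ "^[.]/compare_[_[:alnum:]]+[.]summary$"')

printf '%s\n' "$escaped" "$dynamic"
```

Both commands should print the input line unchanged when it matches.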

For more info, read the "Regular Expressions" section of the awk documentation.

Regards,
Alister

wicked, thanks that's gonna be really useful!