Awk expressions working & not working

vibhor_agarwali · October 12, 2011, 4:40am

Hi,

Here are a few awk expressions.
All of them work except the last.

echo a/b/c | awk -F'/b/c$' '{print $1}'
a

echo a/b/c++ | awk -F'/b/c++' '{print $1}'
a

echo a/b/c++ | awk -F'/b/c++$' '{print $1}'
a/b/c++

Please share your thoughts on why adding a '$' after the double ++ changed everything.
I need the $ because I am only interested in a match at the end of the string.

Do let me know of any workarounds to accomplish the same.
Thanks in advance

radoulov · October 12, 2011, 6:27am

In certain positions the + sign is special (it means one or more occurrences of the preceding character), so it has to be escaped to match a literal +:

% echo a/b/c++ | awk -F'/b/c\\++$' '{print $1}'
a
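
A bracket expression is another way to make the + literal, and it avoids counting backslashes through the shell and awk's own escape processing. A minimal sketch (the function name strip_cpp_suffix is made up for illustration, it is not from this thread):

```shell
# Hypothetical helper: print everything before a trailing "/b/c++".
# [+] is a bracket expression, so each + is a literal character
# and no duplication symbol follows another one.
strip_cpp_suffix() {
  printf '%s\n' "$1" | awk -F'/b/c[+][+]$' '{ print $1 }'
}

strip_cpp_suffix 'a/b/c++'      # prints: a
strip_cpp_suffix 'a/b/c++/d'    # prints: a/b/c++/d (no match at the end)
```

Because the separator contains no adjacent quantifiers, this should behave the same across awk implementations.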

vibhor_agarwali · October 12, 2011, 6:35am

Yes thought of the same but couldn't fit that logic here.

If + were acting as a quantifier (one or more occurrences), the expression shouldn't have worked without the $ either.
So what effect does adding the $ have?

radoulov · October 12, 2011, 7:02am

/b/c++ matches / - b - / - c (one or more occurrences) - + (this one seems to be ignored)

So:

% echo a/b/c+ | awk -F'/b/c++' '{print $1, $2}'
a +

% echo a/b/cccc+ | awk -F'/b/c++' '{print $1, $2}'
a +
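
To see exactly which part of the line a separator regex consumes, awk's match() function can help. This is only a sketch: show_match is a hypothetical helper name, and it uses a single + so that the result does not depend on how an implementation treats doubled quantifiers:

```shell
# Hypothetical helper: show which part of a line a regex matches.
# match() sets RSTART (start position) and RLENGTH (length of the match).
show_match() {
  printf '%s\n' "$1" | awk -v re="$2" '{
    if (match($0, re))
      print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    else
      print "no match"
  }'
}

show_match 'a/b/c+'    '/b/c+'    # prints: 2 4 /b/c
show_match 'a/b/cccc+' '/b/c+'    # prints: 2 7 /b/cccc
```

In both cases the match stops after the last c, which is why the + from the input is left over and ends up in $2.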

/b/c++$ matches / - b - / - c (one or more occurrences) - + (ignored) at the end of the string ($):

These two should be clear:

% echo a/b/c++ |
  awk -F'/b/c++$' '{
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'
NF: 1
$1: a/b/c++
$2: 
% echo a/b/c++ |
  awk -F'/b/c++' '{ 
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'         
NF: 2
$1: a
$2: ++

Here the pattern (FS) is not found:

% echo a/b/c++ | 
  awk -F'/b/c++$' '{
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'                            
NF: 1
$1: a/b/c++
$2:

I suppose that the behaviour is unspecified for more than one consecutive quantifier (+ sign); here it simply ignores the second + sign:

% echo a/b/c |  
  awk -F'/b/c++$' '{
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'                          
NF: 2
$1: a
$2:

Here only the last /b/c[c..] matches, because it's at the end of the record:

% echo a/b/ca/b/ccc |
  awk -F'/b/c++$' '{
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'                                 
NF: 2
$1: a/b/ca
$2:

This may be implementation specific (I've used GNU awk 4.0 and nawk version 20070501).

radoulov · October 12, 2011, 7:30am

I found this in SUS (the Single UNIX Specification):

9.4.6 EREs Matching Multiple Characters
[...]
The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.
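
Given that, one workaround is to avoid a regex altogether and compare the end of the line as a plain string. A rough sketch, assuming the suffix is known in advance (strip_literal_suffix is a hypothetical name used only here):

```shell
# Hypothetical helper: remove the literal suffix $2 from the end of $1.
# The suffix is compared as a plain string, so +, * and ? are not special.
# Note: awk -v still processes backslash escapes in the value.
strip_literal_suffix() {
  printf '%s\n' "$1" | awk -v suf="$2" '{
    n = length($0) - length(suf)
    if (n >= 0 && substr($0, n + 1) == suf)
      print substr($0, 1, n)   # line ends with suf: drop it
    else
      print                    # otherwise leave the line unchanged
  }'
}

strip_literal_suffix 'a/b/c++' '/b/c++'   # prints: a
strip_literal_suffix 'a/b/c'   '/b/c++'   # prints: a/b/c
```

This trades the flexibility of a pattern for predictable behaviour when the text contains regex metacharacters.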

vibhor_agarwali · October 12, 2011, 8:20am

Thanks a ton.
I thought I was good with regular expressions, probably time to rethink that

Let's assume the following holds:

echo a/b/c+ | awk -F'/b/c+' '{print $1, $2}'
a +

echo a/b/c | awk -F'/b/c+' '{print $1, $2}'
a

Please share your thoughts on why the following doesn't match once we add a $:
echo a/b/c+ | awk -F'/b/c+$' '{print $1, $2}'
a/b/c+

/b/c+ matches the /b/c part of a/b/c+.
The $ then requires the end of the line at that point, but the input still has an additional + after it.
Hence the pattern doesn't match and no split happens.

radoulov · October 12, 2011, 8:22am

Exactly (IMHO).
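
The anchored case can be checked by counting fields. This is just a sketch (count_fields is a hypothetical helper name), and it sticks to a single + so the separators are well defined:

```shell
# Hypothetical helper: print the number of fields awk finds in line $1
# when $2 is used as the field separator.
count_fields() {
  printf '%s\n' "$1" | awk -F"$2" '{ print NF }'
}

count_fields 'a/b/c+'  '/b/c+$'     # prints: 1 (the input's + sits between c and the end)
count_fields 'a/b/ccc' '/b/c+$'     # prints: 2 (c+ runs up to the end of the line)
count_fields 'a/b/c+'  '/b/c[+]$'   # prints: 2 (the literal + is matched explicitly)
```

NF of 1 means the separator was never found, so $1 holds the whole line.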

vibhor_agarwali · October 12, 2011, 8:35am

Bingo

This thread appears to be a good reference for regular expressions.
May I request moderators to give it a rating so that folks can refer to it.

radoulov · October 12, 2011, 8:36am

Sure, rated as Excellent.

vibhor_agarwali · October 12, 2011, 11:37am

Do the stars next to the thread represent the rating?
The stars were visible even before I made the request.
Did they read our minds?

Corona688 · October 12, 2011, 11:42am

Someone must have rated it, don't know who though!

vibhor_agarwali · October 12, 2011, 11:45am

Oh, thought that this functionality should be available only to moderators.
People can make fun this way.

radoulov · October 12, 2011, 11:52am

This functionality is public; I rated the thread at the OP's request.