Awk expressions working & not working

Hi,

Here are a few awk expressions.
All of them work except the last.

echo a/b/c | awk -F'/b/c$' '{print $1}'
a

echo a/b/c++ | awk -F'/b/c++' '{print $1}'
a

echo a/b/c++ | awk -F'/b/c++$' '{print $1}'
a/b/c++

Any thoughts on why adding a '$' after the double ++ changes everything?
I need the $ because I am only interested in the last string.

Please let me know of any workarounds to accomplish the same.
Thanks in advance

In certain positions the + sign is special (one or more occurrences of the previous character), so it has to be escaped to match a literal +:

% echo a/b/c++ | awk -F'/b/c\\++$' '{print $1}'
a
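
A bracket expression is another way to make each + literal, and it avoids having to work out how many backslashes -F needs. This is only a minimal sketch; strip_tail is a hypothetical helper name, not something from the thread:

```shell
# strip_tail: hypothetical helper, used only for illustration.
# [+] is a bracket expression containing a literal plus sign, so
# the separator is exactly "/b/c++" anchored at the end of the record.
strip_tail() {
  awk -F'/b/c[+][+]$' '{ print $1 }'
}

echo 'a/b/c++'   | strip_tail   # separator found at the end
echo 'a/b/c++/d' | strip_tail   # separator not at the end, record unchanged
```

The first command should print a, the second should print the record unchanged, since the separator only matches at the end of the record.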

Yes, I thought of the same but couldn't fit that logic here.

If + works as "one or more occurrences", the expression shouldn't have worked without the $ either.
What effect does adding the $ have?

/b/c++ matches / - b - / - c (one or more occurrences) - + (this one seems to be ignored)

So:

% echo a/b/c+ | awk -F'/b/c++' '{print $1, $2}'
a +
% echo a/b/cccc+ | awk -F'/b/c++' '{print $1, $2}'
a +

/b/c++$ matches / - b - / - c (one or more occurrences) - + (ignored), and only at the end of the string ($).

These two should be clear:

% echo a/b/c++ |
  awk -F'/b/c++$' '{
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'
NF: 1
$1: a/b/c++
$2: 
% echo a/b/c++ |
  awk -F'/b/c++' '{ 
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'         
NF: 2
$1: a
$2: ++

Here the pattern (FS) is not found:

% echo a/b/c++ | 
  awk -F'/b/c++$' '{
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'                            
NF: 1
$1: a/b/c++
$2: 

I suppose the behaviour is unspecified for more than one consecutive quantifier (+ signs); it simply ignores the second + sign:

% echo a/b/c |  
  awk -F'/b/c++$' '{
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'                          
NF: 2
$1: a
$2: 

Here only the last /b/c[c..] matches, because it's at the end of the record:

% echo a/b/ca/b/ccc |
  awk -F'/b/c++$' '{
    print "NF:", NF
    print "$1:", $1
    print "$2:", $2
    }'                                 
NF: 2
$1: a/b/ca
$2: 

This may be implementation specific (I've used GNU awk 4.0 and nawk version 20070501).

I found this in SUS:

9.4.6 EREs Matching Multiple Characters
[...]
The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.
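
Given that adjacent duplication symbols are undefined, one way to stay within defined behaviour is to skip the regular expression and compare the tail of the record as a plain string. A minimal sketch, assuming the suffix is known in advance; strip_suffix and the suffix variable are hypothetical names:

```shell
# strip_suffix: hypothetical helper, used only for illustration.
# The suffix is compared as a plain string, so + has no special meaning.
# Note: -v still interprets backslash escapes, so this sketch assumes
# the suffix contains no backslashes.
strip_suffix() {
  awk -v suffix='/b/c++' '{
    n = length($0) - length(suffix)
    # Remove the suffix only when the record really ends with it.
    if (n >= 0 && substr($0, n + 1) == suffix) {
      $0 = substr($0, 1, n)
    }
    print
  }'
}

echo 'a/b/c++'  | strip_suffix
echo 'a/b/c++x' | strip_suffix
```

With this approach the first record should come out as a and the second should be printed unchanged.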

Thanks a ton.
I thought I was good with regular expressions; probably time to revise that opinion :)

Let's assume the following holds:

echo a/b/c+ | awk -F'/b/c+' '{print $1, $2}'
a +

echo a/b/c | awk -F'/b/c+' '{print $1, $2}'
a

Any thoughts on why the following doesn't match once we add a $?
echo a/b/c+ | awk -F'/b/c+$' '{print $1, $2}'
a/b/c+

c+ matches the c in a/b/c.
The $ then requires the end of the line right after it, but we are not at the end, because an additional + follows.
Hence the pattern does not match.

Exactly (IMHO).
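
The reasoning above can be checked with match(), which reports what the expression actually matched. This is only a sketch; show_match is a hypothetical helper name:

```shell
# show_match: hypothetical helper, used only for illustration.
# Prints the part of each record matched by /b/c+$ (one or more c
# characters, then end of record), or "no match".
show_match() {
  awk '{
    if (match($0, /\/b\/c+$/)) {
      print $0 " -> " substr($0, RSTART, RLENGTH)
    } else {
      print $0 " -> no match"
    }
  }'
}

printf '%s\n' 'a/b/c' 'a/b/ccc' 'a/b/c+' | show_match
```

The first two records end in one or more c characters and should match; the third ends in a literal +, so the $ anchor cannot be satisfied after the c.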

Bingo

This thread appears to be a good reference for regular expressions.
May I request moderators to give it a rating so that folks can refer to it.

Sure, rated as Excellent.

Do the stars next to the thread represent the rating?
The stars were visible even before I made the request.
Did they read our minds? :)

Someone must have rated it, don't know who though!

Oh, I thought this functionality would be available only to moderators.
People could misuse it this way.

This functionality is public; I rated the thread as the OP requested.