sed match

Hi, can anyone help with the following:

echo "Date range on 5th May is between -010516 and 050516- please continue "| sed 's/\(.*-\)\(.*-\)\(.*$\)/\2/'

output

010516 and 050516-

What I need is for the leading - to be included as well.

Desired output:

-010516 and 050516-

I know the second group (.*-) starts matching after the first "-", but how can I also include that "-"?

Thanks

What about:

echo "Date range on 5th May is between -010516 and 050516- please continue " | sed 's/.*\(-.*-\).*/\1/'

You mean like so?

sed 's/.*\(-.*-\).*/\1/'

Better would be:

sed 's/.*\(-[^-]*-\).*/\1/'

--
With GNU grep:

grep -o -- '-[^-]*-'
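A quick check of the grep variant, assuming GNU grep (or any grep that supports -o). The second input, a-b-c-d-e, is a made-up line added here to show that grep -o prints every non-overlapping match, not just the first one:

```shell
#!/bin/sh
# Sample line from the question (trailing space dropped).
line="Date range on 5th May is between -010516 and 050516- please continue"

# -o prints only the matched text; "--" stops the pattern "-[^-]*-"
# from being read as an option.
echo "$line" | grep -o -- '-[^-]*-'
# prints: -010516 and 050516-

# Every non-overlapping match is printed on its own line.
echo "a-b-c-d-e" | grep -o -- '-[^-]*-'
# prints:
# -b-
# -d-
```

Unlike the sed versions, grep -o does not need anything to discard the text around the match.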

--edit--
The first answer above was already provided by chapeupreto.


Hi, thanks for your replies.

Both commands work, and I fully understand the code

sed 's/.*\(-.*-\).*/\1/' 

However i dont really understand how this works:

sed 's/.*\(-[^-]*-\).*/\1/'

To me it seems like it's trying to match -, then anything beginning with -, followed by -. I know I'm wrong; can you explain how it works?

Also, what is the best way to leave out the dashes, i.e. to get an output of just

010516 and 050516

Many thanks

Please ignore this post. I obviously needed to get some sleep before I posted it.

OK. I know that you already understand it, but just to be clear, the \1 in the replacement string expands to the text that was matched between the 1st \( and the matching \) in the substitute regular expression. And, the -.*- between the parens in that RE will match the 1st - on the line ( - in the RE), everything after the 1st - up to but not including the last - on the line ( .* in the RE), and the last - on the line ( - in the RE).

The RE between parentheses in this sed substitute command ( -[^-]*- ) matches the 1st - on the line ( - in the RE), the longest string of characters available that does not include a - ( [^-]* in the RE), and the 2nd - on the line ( - in the RE).
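On the [^-] question: inside square brackets a leading ^ negates the list, so [^-] means "any one character except -"; it is not the start-of-line anchor. A minimal check using made-up inputs:

```shell
#!/bin/sh
# [^-] matches any single character except "-", so every non-dash
# character is replaced by a dot and the dash survives.
echo "Date-range" | sed 's/[^-]/./g'
# prints: ....-.....

# [^-]* is a run of non-dash characters, so -[^-]*- can only match
# two dashes with no other dash between them. & is the whole match.
echo "a-b-c" | sed 's/-[^-]*-/[&]/'
# prints: a[-b-]c
```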

So, these sed commands do the same thing for input lines that contain two or fewer - characters. But, for lines that have three or more - characters, the 1st sed prints the 1st and last - and everything between them while the 2nd sed command prints the 1st and 2nd - and everything between them. And, to stop printing the matched - characters, move them outside the parentheses in the RE:

sed 's/.*-\(.*\)-.*/\1/'
sed 's/.*-\([^-]*\)-.*/\1/'
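A sketch of both commands run on the original sample line (trailing space dropped). That line has only two dashes, so the two commands agree:

```shell
#!/bin/sh
line="Date range on 5th May is between -010516 and 050516- please continue"

# The dashes sit outside \( \), so \1 holds only the text between them.
echo "$line" | sed 's/.*-\(.*\)-.*/\1/'
# prints: 010516 and 050516
echo "$line" | sed 's/.*-\([^-]*\)-.*/\1/'
# prints: 010516 and 050516
```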



Hello andy391791,

The following may also help you with this.

echo "Date range on 5th May is between -010516 and 050516- please continue " | sed 's/\([^-].*-\)\([^-].*\)\(-.*\)/-\2-/'

Output will be as follows.

-010516 and 050516-

Thanks,
R. Singh
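To see how this works, the same regex can be run with a replacement that prints each group in brackets (the brackets are only for display). This sketch uses the sample line with its trailing space dropped:

```shell
#!/bin/sh
line="Date range on 5th May is between -010516 and 050516- please continue"

# Same regex as above; the replacement shows \1, \2 and \3 in brackets.
echo "$line" | sed 's/\([^-].*-\)\([^-].*\)\(-.*\)/[\1][\2][\3]/'
# prints: [Date range on 5th May is between -][010516 and 050516][- please continue]
```

Group 2 holds the text between the dashes without either dash, and the replacement -\2- puts the two dashes back as literal text.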

Hi Don, thank you very much for taking the time to explain that to me.

However, after doing a bit of testing I'm still slightly confused, and I would like to understand the following:

"The -.*- between the parens in that RE will match the 1st - on the line ( - in the RE), everything after the 1st - up to but not including the last - on the line ( .* in the RE), and the last - on the line ( - in the RE)."

echo "Date-range on 5th May is between -010516 and 050516- please continue "| sed 's/.*\(-.*-\).*/\1/'

output
-010516 and 050516-

Why is this not -range on 5th May is between -010516 and 050516- ?

echo "Date range on 5th May is between -010516 and 050516- please-continue "| sed 's/.*\(-.*-\).*/\1/'

output

- please-

Why is this not -010516 and 050516- please- ?

For the second statement:

"The RE between parentheses in this sed substitute command ( -[^-]*- ) matches the 1st - on the line ( - in the RE), the longest string of characters available that does not include a - ( [^-]* in the RE), and the 2nd - on the line ( - in the RE)."

echo "Date-range on 5th May is between -010516 and 050516- please continue "| sed 's/.*\(-[^-]*-\).*/\1/'

output
-010516 and 050516-

Why is this not -range on 5th May is between - ?

echo "Date range on 5th May is between -010516 and 050516- please-continue "| sed 's/.*\(-[^-]*-\).*/\1/'

output

- please-

Why is this not -010516 and 050516- ?

Thanks again

The .* is greedy, which means it consumes as many characters as possible while the rest of the expression can still match.
If there are two .*, the leftmost one is the greediest.
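A small illustration with a made-up line, a-b-c-d, which has three dashes:

```shell
#!/bin/sh
# The leading .* is matched first and keeps everything it can,
# leaving \(-.*-\) only the last two dashes.
echo "a-b-c-d" | sed 's/.*\(-.*-\).*/\1/'
# prints: -c-

# With [^-]* in front instead, the prefix must stop at the first dash,
# so the group runs from the first dash to the last one.
echo "a-b-c-d" | sed 's/[^-]*\(-.*-\).*/\1/'
# prints: -b-c-
```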


Hi, in the example below it's printing less than expected, not more:

echo "Date-range on 5th May is between -010516 and 050516- please continue "| sed 's/.*\(-.*-\).*/\1/'

output
-010516 and 050516-

If it were being greedy, wouldn't it print -range on 5th May is between -010516 and 050516- ?

I'm obviously confused by this.

Greedy means earlier parts of the regex "win": they match as far as they can, and only give characters up when the rest of the expression fails to match. So the first .* matches all the way to the end, stealing the entire expression if it can get away with it, and backtracking when it can't.

.*\(-.*-\).*
[Step-by-step illustration: the regex above backtracking against this line]
Date-range on 5th May is between -010516 and 050516- please continue
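One way to watch the leading .* win is to wrap it in its own group for display (an addition for illustration only; the original regex has no group there) and print each piece in brackets. This uses the line from the question above, trailing space dropped:

```shell
#!/bin/sh
line="Date-range on 5th May is between -010516 and 050516- please continue"

# \1 is the leading .*, \2 the original group, \3 the trailing .*
echo "$line" | sed 's/\(.*\)\(-.*-\)\(.*\)/[\1][\2][\3]/'
# prints: [Date-range on 5th May is between ][-010516 and 050516-][ please continue]
```

The first group stops just before the second-to-last dash, because that is the most it can take while \(-.*-\) still finds two dashes after it. That is why the output starts at -010516 rather than at -range.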

I sincerely apologize. I haven't been getting enough sleep lately.

It looks like Corona688 and MadeInGermany have mostly cleaned up my mess. I sincerely thank both of them for correcting my earlier misinformation.

If you want to print everything between the 1st and last dashes on a line (including the dashes), you need something like:

echo "Date-range on 5th May is between -010516 and 050516- please continue "| sed 's/[^-]*\(-.*-\).*/\1/'

producing the output:

-range on 5th May is between -010516 and 050516-

If you want to print everything between the 1st and last dashes on a line (not printing the 1st and last dashes), you need something like:

echo "Date-range on 5th May is between -010516 and 050516- please continue "| sed 's/[^-]*-\(.*\)-.*/\1/'

producing the output:

range on 5th May is between -010516 and 050516

If you want to print everything between the 1st two dashes on a line (printing those dashes), you need something like:

echo "Date-range on 5th May is between -010516 and 050516- please continue "| sed 's/[^-]*\(-[^-]*-\).*/\1/'

producing the output:

-range on 5th May is between -

And, if you want to print everything between the 1st two dashes on a line (not printing those dashes), you need something like:

echo "Date-range on 5th May is between -010516 and 050516- please continue "| sed 's/[^-]*-\([^-]*\)-.*/\1/'

producing the output:

range on 5th May is between 

Note that in all of these substitution BREs, the expression before the parenthesized expression we will print starts with [^-]* which will greedily gobble up as many non-dash characters as it can find (but not any dashes).
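As a check, here are the two dash-keeping variants run on the please-continue test line from earlier (trailing space dropped). They now give the results that were expected there:

```shell
#!/bin/sh
line="Date range on 5th May is between -010516 and 050516- please-continue"

# [^-]* stops at the first dash, so the group starts there.
# First dash through last dash:
echo "$line" | sed 's/[^-]*\(-.*-\).*/\1/'
# prints: -010516 and 050516- please-

# First two dashes only:
echo "$line" | sed 's/[^-]*\(-[^-]*-\).*/\1/'
# prints: -010516 and 050516-
```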


Don, there's absolutely no need to apologize. Many thanks to you and the other posters for taking the time to explain this to me; it now makes sense!