echo abc 123 def 456 | sed 's|\([a-z]*\) \([0-9]*\)|\1|'
is returning
abc def 456
I was hoping for
abc def
"\1" should print only what the first pattern matched,
but as far as I can tell it is just removing the first occurrence of the second pattern.
Can someone please explain what is really happening here, and how I can get only the first pattern using "\( \)"?
This is really incredible :), I thought it would print 'abc 123 def' omitting '456' for the \1, because of the greedy matching nature of sed's regular expression. This is really a new thing I learnt today..
The behaviour of GNU sed is the more "strictly" correct one:
first match : abc 123
second match : \([a-z]*\) \([0-9]*\) means :
a string of 0 or more characters in the range a to z
one space
a string of 0 or more digits in the range 0 to 9
"<empty string><space><empty string>" = " "
So it matches "a string of 0 characters, followed by a space, followed by another zero-length string of digits", which in fact just means "space".
So the first space that comes right after the first match is itself a match for the whole pattern, and it is replaced by \1, which is empty. That is why you get this result.
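The empty-match point above can be checked directly. This is only a small sketch: the variable names first_only and lone_space and the square brackets around \1 are my own additions for illustration, not part of the original command. The first command is the one from the question (no g flag, so only the first match is replaced); the second feeds sed a lone space to show that the pattern matches it with an empty \1.

```shell
# Command from the question: without the g flag, only the first
# match ("abc 123") is replaced by \1 ("abc").
first_only=$(echo 'abc 123 def 456' | sed 's|\([a-z]*\) \([0-9]*\)|\1|')
echo "$first_only"    # abc def 456

# A lone space also matches the pattern: both starred groups match
# the empty string. The brackets show what \1 captured (nothing).
lone_space=$(echo ' ' | sed 's|\([a-z]*\) \([0-9]*\)|[\1]|')
echo "$lone_space"    # []
```

So any space that is not consumed as part of a longer match is a candidate match on its own, which is what matters once the substitution is applied more than once per line.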
So it looks like the implementations of Sun sed and GNU sed differ (this case demonstrates it).
To get the same result from both, you should (just for this example) force one of the strings to match at least one character, so that it cannot match the empty string: