Regular Expression In Sed

Hi,
I am learning sed.

 echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|'

is returning

abc def 456

I was hoping for

abc def 

"\1" should only print the occurrence of the first pattern,
but according to my understanding it is just removing the first occurrence of the second pattern :confused:

Can someone please explain what's really happening here? And if I want to get only the first pattern using "\( \)", how do I achieve that?

sed's s command only hits the first match unless you add a g flag:

$ echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|'
abc def 456
$ echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|g'
abcdef
$
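One way to see which substrings the g flag actually hits is to wrap each match in brackets, using &, which stands for the whole matched text. This is a sketch assuming GNU sed; other seds may count matches differently, as discussed later in the thread.

```shell
# Replace every match with itself wrapped in [ ], so each individual match becomes visible.
echo abc 123 def 456 | sed 's|\([a-z]*\) \([0-9]*\)|[&]|g'
# GNU sed prints: [abc 123][ ][def 456]
```

The lone [ ] is a match in which both starred groups matched the empty string. With \1 as the replacement, that space is replaced by nothing, which is why the g command above prints abcdef rather than abc def.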

Only partially true:
it hits the first match unless you add a g flag, or unless you specify another match number N, in which case only the Nth match is replaced:

# echo 'ABCDEFG'
ABCDEFG
# echo 'ABCDEFG' | sed 's/././'
.BCDEFG
# echo 'ABCDEFG' | sed 's/././1'
.BCDEFG
# echo 'ABCDEFG' | sed 's/././2'
A.CDEFG
# echo 'ABCDEFG' | sed 's/././3'
AB.DEFG
# echo 'ABCDEFG' | sed 's/././5'
ABCD.FG
# echo 'ABCDEFG' | sed 's/././g'
.......
#
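The same idea as a loop, trying every occurrence number in turn on the 7-character string. The last line uses a GNU sed extension (not POSIX): combining a number with g replaces the Nth match and every match after it. Treat that part as GNU-only.

```shell
# For each N, only the Nth single-character match is replaced by a dot.
for n in 1 2 3 4 5 6 7; do
  echo 'ABCDEFG' | sed 's/././'"$n"
done
# GNU sed only: 3g skips the first two matches, then replaces all the rest.
echo 'ABCDEFG' | sed 's/././3g'    # prints AB.....
```

If N is larger than the number of matches (for example 8 here), nothing is replaced and the line is printed unchanged.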


Regarding your case, I would suggest you target the unwanted pattern and delete it (substitute it with an empty string):

# echo  abc 123 def 456  | sed 's/ [0-9][0-9]*//g'
abc def
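If you still want to do it with \( \) and \1, as originally asked, one sketch (assuming the words are lowercase letters and the numbers are plain digits, as in the example) is to require at least one letter and at least one digit, so the pattern can no longer match a lone space, and then add g:

```shell
# \([a-z][a-z]*\) captures one or more letters; [0-9][0-9]* is one or more digits.
# Each "word number" pair is replaced by just the word, and g repeats this for every pair.
echo abc 123 def 456 | sed 's|\([a-z][a-z]*\) [0-9][0-9]*|\1|g'
# prints: abc def
```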

This is really incredible :). I thought it would print 'abc 123 def' for \1, omitting '456', because of the greedy matching nature of sed's regular expressions. This is really something new I learnt today.

Thank you Perderabo and ctsgnb, now things are clear to me:

# echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|'
abc def 456
# echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|2'
abc 123 def
# echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|g'
abc def

But I am getting different output on GNU sed version 4.1.5:

echo abc 123 def 456 | sed 's|\([a-z]*\) \([0-9]*\)|\1|2' 

# Ans: abc 123def 456

The behaviour of GNU sed is the more "strictly" correct one:

First match: abc 123

Second match: \([a-z]*\) \([0-9]*\) means:
one string made of 0 or more characters within the range a to z,
one space,
one string made of 0 or more digits within the range 0 to 9.

"<empty string><space><empty string>" = " "

So it matches "a string of 0 characters, followed by a space, followed by another 0-length digit string", which in fact means just "a space".
So the first space that comes right after the first match matches the whole pattern on its own and becomes the second match. It is replaced by \1, which is empty here, and that is why you get this result.
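To see exactly which substring GNU sed counts as match 1, 2 and 3, you can wrap only the Nth match in < > using & (the whole matched text). This is a sketch assuming GNU sed; the Sun sed results posted above suggest it counts differently.

```shell
# Mark only the Nth match, for N = 1, 2, 3, to reveal how GNU sed numbers the matches.
for n in 1 2 3; do
  echo abc 123 def 456 | sed 's|\([a-z]*\) \([0-9]*\)|<&>|'"$n"
done
# GNU sed prints:
# <abc 123> def 456
# abc 123< >def 456
# abc 123 <def 456>
```

The second line shows that match number 2 is the single space, which is exactly the text that disappears in abc 123def 456.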

So it looks like the Sun sed and GNU sed implementations differ (this case demonstrates it).

So to get the same result, you should (just for this example) force one of the strings to match at least one character, so that it cannot match an empty string:

# echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9][0-9]*\)|\1|2'
abc 123 def
# echo  abc 123 def 456  |  sed 's|\([a-z][a-z]*\) \([0-9]*\)|\1|2'
abc 123 def
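The same "at least one" requirement can also be written with the POSIX BRE interval \{1,\}, which means "one or more" and is equivalent to the [0-9][0-9]* form above. This is a sketch assuming your sed supports BRE intervals (GNU sed does). GNU sed also accepts \+ for this, but that is a GNU extension.

```shell
# [0-9]\{1,\} requires one or more digits, so the lone space can no longer be a match.
echo abc 123 def 456 | sed 's|\([a-z]*\) \([0-9]\{1,\}\)|\1|2'
# prints: abc 123 def
```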