Regular Expression In Sed

max_hammer · May 12, 2011, 3:24pm

Hi,
I am learning sed.

 echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|'

is returning

abc def 456

I was hoping for

abc def

"\1" should print only the occurrence of the first pattern,
but as far as I can tell it is just removing the first occurrence of the second pattern.

Can someone please explain what is really happening here, and how I can get only the first pattern using "\( \)"?

Perderabo · May 12, 2011, 3:30pm

sed's s command only hits the first match unless you add a g flag:

$ echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|'
abc def 456
$ echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|g'
abcdef
$

ctsgnb · May 12, 2011, 3:48pm

Only partially true:
... or unless you specify a match number as the flag, in which case only that occurrence is replaced ...

# echo 'ABCDEFG'
ABCDEFG
# echo 'ABCDEFG' | sed 's/././'
.BCDEFG
# echo 'ABCDEFG' | sed 's/././1'
.BCDEFG
# echo 'ABCDEFG' | sed 's/././2'
A.CDEFG
# echo 'ABCDEFG' | sed 's/././3'
AB.DEFG
# echo 'ABCDEFG' | sed 's/././5'
ABCD.FG
# echo 'ABCDEFG' | sed 's/././g'
.......
#

---------- Post updated at 09:48 PM ---------- Previous update was at 09:36 PM ----------

Regarding your case, I would suggest you target the unwanted pattern and delete it (substitute it with an empty string):

# echo  abc 123 def 456  | sed 's/ [0-9][0-9]*//g'
abc def
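
If you still want to use "\( \)" as in the original question, here is a minimal sketch (a variation for illustration, not something posted earlier in this thread): require at least one letter in the first group, so the group can never match an empty string, and substitute \1 globally.

```shell
# Keep each word (\1) and drop the number that follows it.
# [a-z][a-z]* means "one or more letters", so a bare space cannot match.
result=$(echo 'abc 123 def 456' | sed 's/\([a-z][a-z]*\) [0-9][0-9]*/\1/g')
echo "$result"   # abc def
```

The space between "abc" and "def" survives because no match can start on it.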

royalibrahim · May 13, 2011, 12:52am

This is really incredible :). I thought it would print 'abc 123 def', omitting '456' for the \1, because of the greedy matching nature of sed's regular expressions. I learnt something new today.

max_hammer · May 13, 2011, 5:23am

Thank you, Perderabo and ctsgnb. Things are clear to me now.

ctsgnb · May 13, 2011, 5:31am

# echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|'
abc def 456
# echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|2'
abc 123 def
# echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9]*\)|\1|g'
abc def

royalibrahim · May 13, 2011, 10:32am

But I am getting different output on GNU sed version 4.1.5:

echo abc 123 def 456 | sed 's|\([a-z]*\) \([0-9]*\)|\1|2'

# Ans: abc 123def 456

ctsgnb · May 13, 2011, 3:38pm

The behaviour of GNU sed is the more "strictly" correct one:

first match : abc 123

second match : \([a-z]*\) \([0-9]*\) means :
a string of 0 or more characters in the range a to z
one space
a string of 0 or more digits in the range 0 to 9

"<empty string><space><empty string>" = " "

So it matches "a zero-length string of letters, followed by a space, followed by a zero-length string of digits", which in fact means just "space".
So the first space that comes right after the first match is itself the second match, and since \1 is empty there, that space is replaced with nothing. That is why you get this result.
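
One way to see this is to wrap each match in markers using & instead of \1. This is only an illustrative sketch: the second output is what I would expect from GNU sed, and other implementations may differ, as this thread shows.

```shell
# A lone space is a complete match: both groups match the empty string.
lone=$(printf ' \n' | sed 's|\([a-z]*\) \([0-9]*\)|<&>|')
echo "$lone"      # < >

# Mark every match on the original input (GNU sed behaviour assumed).
marked=$(echo 'abc 123 def 456' | sed 's|\([a-z]*\) \([0-9]*\)|<&>|g')
echo "$marked"    # <abc 123>< ><def 456>
```

The middle "< >" is the space-only match that the 2 flag picks up.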

So it looks like the implementations of Sun sed and GNU sed differ (this case demonstrates it).

So to get the same result, you should (just for the example) force one of the strings to match at least one character so that it cannot match the empty string:

# echo  abc 123 def 456  |  sed 's|\([a-z]*\) \([0-9][0-9]*\)|\1|2'
abc 123 def
# echo  abc 123 def 456  |  sed 's|\([a-z][a-z]*\) \([0-9]*\)|\1|2'
abc 123 def
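
The same idea can be written more compactly with an interval expression or, where the sed supports it, with + in extended regex mode. A sketch under assumptions: -E is accepted by current GNU and BSD sed, while older GNU versions may need -r instead.

```shell
# BRE interval \{1,\} means "one or more", so neither group can be empty.
bre=$(echo 'abc 123 def 456' | sed 's|\([a-z]\{1,\}\) \([0-9]\{1,\}\)|\1|2')
echo "$bre"    # abc 123 def

# ERE equivalent using + (assumes -E support; try -r on older GNU sed).
ere=$(echo 'abc 123 def 456' | sed -E 's|([a-z]+) ([0-9]+)|\1|2')
echo "$ere"    # abc 123 def
```

Here both groups are forced to be non-empty, so the only two matches are "abc 123" and "def 456".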