Strange sed behaviour

[/tmp]$ echo a.bc | sed -e "s/\|/\\|/g"
|a|.|b|c|
[/tmp]$ 

Is this behavior of sed expected, or is it a bug in sed?

OS details

Linux 2.6.9-55.0.0.0.2.ELsmp #1 SMP Wed May 2 14:59:56 PDT 2007 i686 i686 i386 GNU/Linux

The problem is perhaps in the quoting you use: the shell is probably "eating" your escape characters, so sed doesn't see what you expect it to see.

I have made it a habit to always use single quotes for sed statements to avoid this. It is even possible to keep the single quotes when using a variable inside a sed statement:

sed 's/'"$src"'/'"$tgt"'/g'

will change occurrences of the value of $src to the value of $tgt.
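A minimal sketch of that quoting pattern in use. The values foo and bar and the sample input are made up for illustration, and the sketch assumes the variables contain neither "/" nor regexp metacharacters, since either would change how sed parses the command:

```shell
# Hypothetical example values; they must not contain "/" or regexp metacharacters.
src='foo'
tgt='bar'

# The sed script is single-quoted; only the variables are expanded, inside double quotes.
result=$(printf '%s\n' 'foo and foo' | sed 's/'"$src"'/'"$tgt"'/g')

printf '%s\n' "$result"
# prints: bar and bar
```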

I hope this helps.

bakunin

Fine. Single quotes versus double quotes does make a difference.

[/tmp]$ echo a.bc | sed -e "s/\|/\\|/g"
|a|.|b|c|
[/tmp]$ echo a.bc | sed -e 's/\|/\\|/g'
\|a\|.\|b\|c\|
[/tmp]$ 
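One way to see what sed actually receives in each case is to print the argument instead of passing it to sed. A minimal sketch; the variable names dq and sq are made up for illustration:

```shell
# Double quotes: the shell turns \\ into \ but leaves \| alone.
dq=$(printf '%s\n' "s/\|/\\|/g")

# Single quotes: the shell passes every character through unchanged.
sq=$(printf '%s\n' 's/\|/\\|/g')

printf 'double-quoted: %s\n' "$dq"
printf 'single-quoted: %s\n' "$sq"
# prints:
# double-quoted: s/\|/\|/g
# single-quoted: s/\|/\\|/g
```

So with double quotes the replacement sed sees is \|, which it treats as a plain |, and with single quotes the replacement is \\|, which yields a literal backslash followed by |. That accounts for the difference between the two outputs, though not for why anything matches at all.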

But that does not explain those extra characters in the output. In either case I would expect the output to be a.bc and not anything which has | or \| wrapped around every character.

Am I missing something here?

To be honest, now I'm astonished myself. Lacking a UNIX machine (I have the day off), I fired up Cygwin and tried:

# echo a.bc | sed --posix 's/\|/x/g'
xax.xbxcx

# echo 'a.bc' | sed --posix 's/\|/x/g'
xax.xbxcx

# echo 'a.bc' | sed --posix 's/|/x/g'
a.bc

# sed --version
GNU sed version 4.1.5
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, to the extent permitted by law.

It seems that "\|" matches the empty (null) regexp, whereas "|" matches a single pipe character as expected. I have no idea why this is the case, but I will investigate further.

bakunin

My guess:
\| is used in GNU sed as the alternation operator. Therefore

# echo "a.bc" | sed -e 's/\|/\\|/g'
\|a\|.\|b\|c\|

seems to say: match "empty" or "empty" (or null?) and substitute it with \|, hence the result.
If you really want to search for a literal "|", put it inside square brackets (a bracket expression):

# echo "a.bc" | sed -e 's/[|]/\\|/g'
a.bc
# echo "a|bc" | sed -e 's/[|]/\\|/g'
a\|bc
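A short sketch contrasting the two forms when \| does have operands. It assumes GNU sed for the first command, since \| as alternation in a basic regular expression is a GNU extension; the variable names and the replacement character X are made up for illustration:

```shell
# Assumption: GNU sed, where \| in a basic regexp means "a or c".
alt=$(printf '%s\n' 'a.bc' | sed 's/a\|c/X/g')

# A bracket expression matches a literal pipe and nothing else.
lit=$(printf '%s\n' 'a|bc' | sed 's/[|]/X/g')
none=$(printf '%s\n' 'a.bc' | sed 's/[|]/X/g')

printf '%s\n' "$alt" "$lit" "$none"
# With GNU sed this prints:
# X.bX
# aXbc
# a.bc
```

The bracket form does not depend on how a given sed interprets \|, which makes it the safer choice when the script has to run on more than one system.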

Wouldn't the alternation operator require at least two operands?

Hmm, seems to be a GNU sed-ism.

Under Interix 6.0 with ksh-93 and the out-of-the-box (non-GNU) sed, it works as expected:

$ echo a.bc | sed -e "s/\|/\\|/g"
a.bc
$

> echo a.bc | sed -e "s/\|/\\|/g"
a.bc

> uname -a
SunOS grape 5.10 Generic_125100-04 sun4u sparc SUNW,Sun-Fire-V440 Solaris
> env | grep SHELL
SHELL=/bin/bash

I tried the same under AIX 5.3:

# what /usr/bin/sed
/usr/bin/sed:
        61      1.14  src/bos/usr/ccs/lib/libc/__threads_init.c, libcthrd, bos530 7/11/00 12:04:14
        24      1.38  src/bos/usr/bin/sed/sed0.c, cmdedit, bos530 8/27/03 04:21:19
        35      1.14.1.21  src/bos/usr/bin/sed/sed1.c, cmdedit, bos53D, d2005_18F0 5/5/05 03:34:10

# instfix -i | grep AIX_ML
    All filesets for 5.3.0.0_AIX_ML were found.
    Not all filesets for 5300-01_AIX_ML were found.
    All filesets for 5300-02_AIX_ML were found.
    All filesets for 5300-03_AIX_ML were found.
    All filesets for 5300-04_AIX_ML were found.

# echo a.bc | sed -e 's/\|/\\|/g'
a.bc

# echo a.bc | sed -e "s/\|/\\|/g"
a.bc

Seems like you found a GNUism of GNU sed. I find it rather interesting that GNU sed does it even with the "--posix" flag. Isn't that flag supposed to turn all extensions off?

bakunin