sed replace command

Hi.

I need to double every 'single' & character in a file, that is, every & that is not part of a group of 2 or more. This cmd doubles every & it finds:

cat ${file} | sed -e 's/&/&&/g' > ${new_file}

How can I modify this to ensure I only replace single &'s and not operate on multiple groupings like && or &&& etc.?

Many thanks.

Try

sed 's/\([^&]\)\(&\)\([^&]\)/\1\2\2\3/g' file

Thanks Rudi. It only seems to partially work for me. It looks like it only updates single &s if they're on a line with other &s.
Here's the output:

[oracle@xtest54 bin]$ cat test1
&

&

&&

&&&

& & &
&& && &&&&&&
[oracle@xtest-54 bin]$ sed 's/\([^&]\)\(&\)\([^&]\)/\1\2\2\3/g' test1
&

&

&&

&&&

& && &
&& && &&&&&&
[oracle@xtest-54 bin]$

Had you posted a meaningful input sample, a better solution could have been offered. Try

sed ':L; s/\(^\|[^&]\)\(&\)\([^&]\|$\)/\1\2\2\3/g; tL' file
&&

&&

&&

&&&

&& && &&
&& && &&&&&&

Perfect. Yes, I should've added an example to be clear. Sorry for that and thanks for the solution.

I find sed scripts with a lot of escapes much more readable if a different separator is used.

The code Rudi posted does the same thing and is easier to read when written like this:

sed ':L; s#\(^\|[^&]\)\(&\)\([^&]\|$\)#\1\2\2\3#g; tL'

In this case the separator is a hash ( # ), but it can be any character that does not appear in the pattern or the replacement.


With perl's lookbehind and lookahead:

perl -pe 's/(?<!&)&(?!&)/&&/g' file

Note: The use of \| for alternation is a GNU extension.

Since one would need to use GNU sed anyway, one might as well use its -r option for extended regular expressions:

sed -r ':L; s/(^|[^&])(&)([^&]|$)/\1\2\2\3/g; tL' file

or

sed -r ':L; s/(^|[^&])&([^&]|$)/\1\&\&\2/g; tL' file

--
In regular sed:

sed -e 's/^&$/\&\&/; s/^&\([^&]\)/\&\&\1/; s/\([^&]\)&$/\1\&\&/' -e :L -e 's/\([^&]\)&\([^&]\)/\1\&\&\2/g; t L' file

Thanks as well. Could you please help me understand how the sed command works in this case?

The regular sed command Scrutinizer suggested:

sed -e 's/^&\([^&]\)/\&\&\1/; s/\([^&]\)&$/\1\&\&/' -e :L -e 's/\([^&]\)&\([^&]\)/\1\&\&\2/g; t L' file

could also be written as:

sed -e 's/^&\([^&]\)/\&\&\1/
s/\([^&]\)&$/\1\&\&/
:L
s/\([^&]\)&\([^&]\)/\1\&\&\2/g
t L' file

The 1st substitute command ( s/^&\([^&]\)/\&\&\1/ ) looks at the start of an input line ( ^ ) for a literal ampersand character ( & ) followed by a single character that is not an ampersand ( [^&] ), remembering the character that matched because it is between a pair of escaped parentheses ( \( ... \) ). If there was a match, it replaces the matched text with two literal ampersands ( \&\& ) followed by the character that was remembered by the 1st pair of escaped parentheses ( \1 ).

The 2nd substitute command ( s/\([^&]\)&$/\1\&\&/ ) performs equivalent logic looking for a match at the end of the line ( $ ) instead of at the beginning of the line.

The :L creates a label ( L ) in the script that can be branched to later.

The 3rd substitute command ( s/\([^&]\)&\([^&]\)/\1\&\&\2/g ) looks for a character that is not an ampersand followed by a literal ampersand followed by a character that is not an ampersand and replaces them with the 1st non-ampersand character, two literal ampersand characters, and the 2nd non-ampersand character. The g flag at the end of the command tells sed to make this substitution as many times as it can on the line (without the g flag, it would only make the 1st possible substitution). Note that with input like a&b&c&d&e , this substitution will double the ampersand characters after the a and c , but not after the b and d . This is because the b and d were matched as the 2nd non-ampersand character after the a and after the c and can't also be matched as the 1st non-ampersand character in the b&c and d&e substrings in the input.
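
The non-overlapping behaviour described above can be seen by running the 3rd substitute command once, without the loop. This is only a small sketch; the function name single_pass is made up for the illustration and is not part of the script being explained.

```shell
# Run the global substitution exactly once (no label, no t command).
single_pass() {
  printf '%s\n' "$1" | sed 's/\([^&]\)&\([^&]\)/\1\&\&\2/g'
}

# The b and d are consumed as the 2nd non-ampersand character of a match,
# so the ampersands after them are left alone on this pass.
single_pass 'a&b&c&d&e'    # prints a&&b&c&&d&e
```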

The test command ( t L ) tells sed to branch to the line in the script with the label L if and only if a substitute command successfully matched and substituted text since the most recent input line was read or since the last t command was executed. This lets it run the 3rd substitute command again if it made one or more substitutions the 1st time it was processed.
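
To see what the label and the t command add, here is a small sketch that runs only the looping part of the script on the same a&b&c&d&e string. The function name looped is made up for the illustration.

```shell
# Repeat the global substitution until a pass changes nothing.
looped() {
  printf '%s\n' "$1" | sed -e :L -e 's/\([^&]\)&\([^&]\)/\1\&\&\2/g' -e 't L'
}

# Pass 1 gives a&&b&c&&d&e, pass 2 gives a&&b&&c&&d&&e,
# pass 3 makes no substitution, so t does not branch and the line is printed.
looped 'a&b&c&d&e'    # prints a&&b&&c&&d&&e
```

Ampersands that are already doubled are never touched, because each one has another ampersand next to it and so cannot sit between two non-ampersand characters.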

It wasn't clear to me from your discussion whether or not a line that contains just a single ampersand character and nothing else is supposed to double that ampersand. The script above does not double an ampersand in this case. If you do want to double an ampersand that is the only character on an input line, you could add another substitute command to take care of that case:

s/^&$/&&/

Note that the ampersands in the replacement pattern do not have to be escaped in this case. An unescaped & in the replacement string in a substitute command is replaced by the entire string matched by the basic regular expression pattern in the substitute command. Since the string matched in this case is just an & the replacement strings && and \&\& produce identical results.
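
A quick way to check this is to compare both replacement forms on a line that is a single ampersand and on a line that is some other character. This is a sketch only; the variable names and the b test line are made up for the illustration.

```shell
# When the matched text is a lone &, both replacements give the same result.
same_1=$(printf '%s\n' '&' | sed 's/^&$/&&/')
same_2=$(printf '%s\n' '&' | sed 's/^&$/\&\&/')

# When the matched text is something else, they differ:
# unescaped & repeats the match, \& inserts a literal ampersand.
diff_1=$(printf '%s\n' 'b' | sed 's/^b$/&&/')
diff_2=$(printf '%s\n' 'b' | sed 's/^b$/\&\&/')

printf '%s %s %s %s\n' "$same_1" "$same_2" "$diff_1" "$diff_2"    # prints && && bb &&
```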


Thanks, Don, for spotting the bug. Corrected in my post.

Making use of sed's unescaped & , the solution for any Unix sed becomes:

sed -e 's/^&$/&&/; s/^&[^&]/\&&/; s/[^&]&$/&\&/' -e :L -e 's/\([^&]\)&\([^&]\)/\1\&\&\2/g; t L' file

Another solution:

sed -e 's/^/ /; s/$/ /' -e :L -e 's/\([^&]\)&\([^&]\)/\1\&\&\2/g; t L' -e 's/^ //; s/ $//' file
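
As a sanity check, the two non-GNU variants can be run against the test1 sample posted earlier in the thread and compared. This is a sketch; the function names sample, anchored and padded are made up for the illustration, and the input is fed through a pipe instead of a file.

```shell
# Sample input, copied from the test1 file shown earlier in the thread.
sample() {
  printf '%s\n' '&' '' '&' '' '&&' '' '&&&' '' '& & &' '&& && &&&&&&'
}

# Variant with separate substitutions for line start and line end.
anchored() {
  sed -e 's/^&$/&&/; s/^&[^&]/\&&/; s/[^&]&$/&\&/' \
      -e :L -e 's/\([^&]\)&\([^&]\)/\1\&\&\2/g; t L'
}

# Variant that pads each line with one space on both sides, then strips it.
padded() {
  sed -e 's/^/ /; s/$/ /' -e :L -e 's/\([^&]\)&\([^&]\)/\1\&\&\2/g; t L' \
      -e 's/^ //; s/ $//'
}

if [ "$(sample | anchored)" = "$(sample | padded)" ]; then
  echo 'both variants agree'
else
  echo 'variants differ'
fi
```

The padding trick works because the added spaces give an ampersand at the start or end of a line a non-ampersand neighbour, so the single looping substitution covers every case.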