SED - Unable to replace with <tab>

PikK45 · July 18, 2012, 9:07am

Hello All,

I have this file with the below contents

 
1|2|3|4|
this|that|which|what|

when I use,

 sed 's/|/\t/g' infile

I get,

 
1t2t3t4t
thistthattwhichtwhatt

Why is this?

itkamaraj · July 18, 2012, 9:17am

What is your OS?

$ echo "A|B|C" | sed 's/|/\t/g'

Try double quotes ( " ) instead of single quotes.

elixir_sinari · July 18, 2012, 9:21am

Use the Tab key instead of the escape sequence.

From the POSIX man page for sed:

[2addr]s/BRE/replacement/flags
                .
                .
                .
       The replacement string shall be scanned from beginning to end. An
       ampersand ( '&' ) appearing in the replacement shall be replaced by the
       string matching the BRE. The special meaning of '&' in this context can
       be suppressed by preceding it by a backslash. The characters "\n",
       where n is a digit, shall be replaced by the text matched by the
       corresponding backreference expression. The special meaning of "\n"
       where n is a digit in this context, can be suppressed by preceding it
       by a backslash. For each other backslash ( '\' ) encountered, the
       following character shall lose its special meaning (if any). The
       meaning of a '\' immediately followed by any character other than '&' ,
       '\' , a digit, or the delimiter character used for this command, is
       unspecified.

       A line can be split by substituting a <newline> into it. The
       application shall escape the <newline> in the replacement by preceding
       it by a backslash. A substitution shall be considered to have been
       performed even if the replacement string is identical to the string
       that it replaces. Any backslash used to alter the default meaning of a
       subsequent character shall be discarded from the BRE or the replacement
       before evaluating the BRE or using the replacement.

244an · July 18, 2012, 4:55pm

As @elixir_sinari says, you must enter a "real" tab in the editor when typing the code. It works in both the search string and the replacement string.
I struggled with this some time ago, and when I googled it I found that some people had problems entering a tab in their editor. If you have that problem, or if, like me, you find the code hard to read with a <tab> in the text (its width in the editor looks "random", depending on its position in the line), here is a way around it. Perhaps very ugly, but I prefer it:

REAL_TAB=$(echo -e "\t")
# then use it in sed
# the | is left unescaped: in GNU sed, \| means alternation, not a literal pipe
echo "A|B|C" | sed "s/|/$REAL_TAB/g"
=> "A       B       C"

You must use double quotes so that REAL_TAB is expanded, and the "-e" flag so that echo expands "\t". I think the behaviour of the -e flag may differ between environments/shells.
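
A possibly more portable variant of the same idea, as a sketch only: printf is specified by POSIX to expand \t, so it avoids depending on how a given echo treats -e. The variable names TAB and result and the sample input are illustrative, not from the posts above.

```shell
# Build a literal tab with printf instead of echo -e.
# Command substitution strips only trailing newlines, so the tab survives.
TAB=$(printf '\t')

# Double quotes let the shell expand $TAB before sed sees the script.
result=$(printf '%s\n' 'A|B|C' | sed "s/|/$TAB/g")
printf '%s\n' "$result"
```

Because the shell has already substituted a real tab character, sed never has to interpret an escape sequence, which is the part that differs between implementations.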

Corona688 · July 18, 2012, 5:03pm

It may also be convenient to use awk for this. It has two special variables, FS (input field separator) and OFS (output field separator). Set them appropriately and make any modification to a field (even something pointless like $1=$1), and awk rebuilds the line with the new separator.

awk '{$1=$1} 1' FS="|" OFS="\t" inputfile  > outputfile

You could even use tr to do this.

tr '|' '\t' < inputfile > outputfile

Both tr and awk should always accept \t. Old crusty versions of sed sometimes don't.
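
As a quick check that the two commands above agree, here is a small sketch run against the sample data from the first post. The temporary file and the variable names awk_out and tr_out are only for illustration.

```shell
# Recreate the sample input from the original question in a temp file.
infile=$(mktemp)
printf '%s\n' '1|2|3|4|' 'this|that|which|what|' > "$infile"

# awk: assigning $1 to itself forces the record to be rebuilt with OFS.
awk_out=$(awk '{$1=$1} 1' FS='|' OFS='\t' "$infile")

# tr: translate every | into a tab, character by character.
tr_out=$(tr '|' '\t' < "$infile")

printf '%s\n' "$awk_out"
rm -f "$infile"
```

Both results keep the trailing separator of each line as a trailing tab, since the input lines end with "|".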

244an · July 18, 2012, 5:33pm

Funny, I was just posting this http://www.unix.com/shell-programming-scripting/194163-awk-must-touch-n-variable-get-ofs-used.html#post302673979 about having to use $1=$1. My first thought for this thread was to answer "use awk with FS and OFS set to "|" and "\t"", like your post, but when I first tested it I ran into this problem.

So this is normal behavior for awk, that you have to touch one field to enable OFS for the whole line?

Corona688 · July 20, 2012, 11:13am

Yes, very normal. It's not that OFS is "disabled" -- it's that awk's still got the original line in memory, which it will print raw, not having been told to do otherwise. To recalculate the line, you must do so explicitly by doing an operation on one of the tokens.
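
A minimal sketch of that difference, using "-" as OFS so the effect is easy to see on screen. The variable names line, untouched and rebuilt are made up for the example.

```shell
line='A|B|C'

# No field is modified, so awk prints the original record unchanged.
untouched=$(printf '%s\n' "$line" | awk -F'|' -v OFS='-' '1')

# Assigning to a field makes awk rebuild $0 by joining the fields with OFS.
rebuilt=$(printf '%s\n' "$line" | awk -F'|' -v OFS='-' '{$1=$1} 1')

printf '%s\n%s\n' "$untouched" "$rebuilt"
```

The first line printed is still A|B|C; only the second, where a field was assigned, comes out as A-B-C.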

alister · July 20, 2012, 11:36am

As far as I know, only GNU sed interprets that undefined sequence as a tab character. If that is incorrect, I'll update this recent post which discusses this in more detail: awk - Must "touch" a $n-variable to get OFS used? Post: 302674253

Regards,
Alister

Corona688 · July 23, 2012, 10:46am

I've found more sed implementations that accept \t than ones that don't.

Ones that don't, often don't accept \n either.

Scrutinizer · July 23, 2012, 11:12am

AFAIK POSIX sed does not accept '\t' as a representation of the TAB character. I have only encountered support for \t with GNU sed (and derivatives).

alister · July 23, 2012, 1:33pm

Your findings are highly atypical (with respect to the usual workstation/server unix experience).

None of the native sed implementations on the following systems support \t (yet all support \n): Solaris, HP-UX, AIX, (Free|Net|Open)BSD, OS X.

Before reading your post, I had never even heard of a sed implementation whose regular expression grammar did not include \n as a sequence for a newline. For whatever it's worth, the \n sequence has been part of sed since the very beginning (1979); it is mentioned in the UNIX Seventh Edition manual (vol 2b), http://www.cs.bell-labs.com/7thEdMan/ (\t is not).

Regards,
Alister

Scrutinizer · July 23, 2012, 1:41pm

The idea that \n would not be supported by some sed implementations probably stems from the fact that, with the s-command, \n is only supported in the regex part, not in the replacement part (again with the exception of GNU sed).

--
The y-command in sed supports \n on both sides...
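
A small sketch of the y-command with \n on the replacement side, assuming a sed that follows POSIX here. The variable name split and the sample input are illustrative.

```shell
# y transliterates characters; \n in the second set stands for a newline.
split=$(printf '%s\n' 'A|B|C' | sed 'y/|/\n/')
printf '%s\n' "$split"
```

Each "|" becomes a line break, so the three letters end up on separate lines.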

gary_w · July 23, 2012, 1:45pm

For the sake of argument, another way:

$ sed 's/|/<Ctrl-V><TAB>/g' infile

Ctrl-V allows for the entry of a control character. Follow it by pressing the tab key.

To prove it worked, redirect the output to a file, open it in vi, type the :set list command and you will see the tabs represented with "^I".
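
A non-interactive way to do the same check, as a sketch: sed's own l command prints a line in an unambiguous form, showing a tab as \t and marking the end of line with $. The variable names TAB and visible are illustrative.

```shell
# Insert real tabs, then have a second sed display them visibly.
TAB=$(printf '\t')
visible=$(printf '%s\n' 'A|B|C' | sed "s/|/$TAB/g" | sed -n 'l')
printf '%s\n' "$visible"
```

If the substitution had produced a literal "t" instead of a tab, as in the first post, the line shown would read AtBtC$ rather than A\tB\tC$.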

Scrutinizer · July 23, 2012, 2:05pm

Another way to accomplish entering an actual TAB-character ( in bash / ksh93 ):

sed s/\|/$'\t'/g file