sed multiline problem

I'm trying to replicate the sed output on p.108 of sed & awk, by Dougherty & Robbins, 2nd edition.

I'm on a Windows 10 Surface Pro, running Cygwin for 64-bit versions of Windows.

Input text saved in text file called data_p108.txt:

Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.

sed script saved in file called multiline_scr:

/Operator$/{
N
s/Owner and Operator\nGuide /Installation Guide\
/
}

command I ran:

sed -f multiline_scr data_p108.txt

output:

Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.

I was expecting the script to produce the output on p.108:

Consult Section 3.1 in the Installation Guide
for a description of the tape drives
available on your system.

I'm using the Atom editor, which shows carriage return and line feed
characters at the end of each line in both the script and the input data.

Any ideas why the script seems to just echo back the input?

I can get sed to work on single-line input but not on multi-line input as in
the above.

Without digging deeper: try \r\n (in lieu of \n alone) in the substitute command.

Interesting suggestion, to use \r\n (carriage return and newline) instead of just \n in the command:

/Operator$/{
N
s/Owner and Operator\r\nGuide /Installation Guide\
/
}

Unfortunately, I got the same result, where sed just echoes back the input:

ZR6O@W-018974644353 /cygdrive/c/users/zr6o/Documents/sed_and_awk
$ sed -f multiline_scr data_p108.txt
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.

The reason is that Unix and DOS systems (and their descendants) use different line-ending sequences: Unix has always used a single character (newline) as the line separator, whereas DOS had no real print-processor program. Its inventors therefore made "carriage return" + "line feed" the line separator, so that the printer (we are talking about typewriter-like printers here) could use that sequence directly.

Maybe you have some (unwanted) whitespace at the end of the line which prevents your regexp from matching the text? You could analyse your input data with the od command: use od -ax /your/file to get a hex dump.
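
If the full dump is more than you want to read, counting the carriage returns is a quicker check. This is only a sketch: crlf_check.txt is a made-up scratch file standing in for your data, and the tr/wc pipeline is just one way of doing it.

```shell
# Hypothetical scratch file with two CRLF-terminated lines (a stand-in for the real data).
printf 'first line\r\nsecond line\r\n' > crlf_check.txt

# Delete every byte except carriage returns, then count what is left.
# The $(( ... )) wrapper strips any padding some wc implementations add.
cr_count=$(( $(tr -cd '\r' < crlf_check.txt | wc -c) ))

printf 'carriage returns found: %s\n' "$cr_count"
```

A non-zero count means the file has DOS-style line endings; run the same pipeline against your own file to check it.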

I hope this helps.

bakunin

Thank you for the suggestion to check spaces (hex 20) at the end of the lines. The command

od -ax data_p108.txt

yields

0000000   C   o   n   s   u   l   t  sp   S   e   c   t   i   o   n  sp
           6f43    736e    6c75    2074    6553    7463    6f69    206e
0000020   3   .   1  sp   i   n  sp   t   h   e  sp   O   w   n   e   r
           2e33    2031    6e69    7420    6568    4f20    6e77    7265
0000040  sp   a   n   d  sp   O   p   e   r   a   t   o   r  cr  nl   G
           6120    646e    4f20    6570    6172    6f74    0d72    470a
0000060   u   i   d   e  sp   f   o   r  sp   a  sp   d   e   s   c   r
           6975    6564    6620    726f    6120    6420    7365    7263
0000100   i   p   t   i   o   n  sp   o   f  sp   t   h   e  sp   t   a
           7069    6974    6e6f    6f20    2066    6874    2065    6174
0000120   p   e  sp   d   r   i   v   e   s  cr  nl   a   v   a   i   l
           6570    6420    6972    6576    0d73    610a    6176    6c69
0000140   a   b   l   e  sp   o   n  sp   y   o   u   r  sp   s   y   s
           6261    656c    6f20    206e    6f79    7275    7320    7379
0000160   t   e   m   .  cr  nl
           6574    2e6d    0a0d
0000166

I don't see any spaces at the end of any of the three lines.

I also tried the single line sed command with the multi-line flag m:

sed 'N; s/Owner and Operator\nGuide/Installation Guide/m' data_p108.txt

but it still just echoed back the input file:

Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.

As you can see, each line ends with
cr lf
so
/Operator$/
never matches: the carriage return sits between "Operator" and the end of the line.
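
To see the effect in isolation, here is a small sketch. It assumes GNU sed (which is what Cygwin ships, and which accepts \r in a regex); crlf_demo.txt is a hypothetical scratch file, not your data.

```shell
# Hypothetical scratch file with Windows-style (CRLF) line endings.
printf 'Owner and Operator\r\nGuide for a description\r\n' > crlf_demo.txt

# $ anchors at the end of the line, but the last character there is \r,
# not "r", so this address selects nothing.
plain=$(sed -n '/Operator$/p' crlf_demo.txt)

# With \r in the address the line is selected; tr strips the CR for display.
with_cr=$(sed -n '/Operator\r$/p' crlf_demo.txt | tr -d '\r')

printf 'without CR in address: [%s]\n' "$plain"
printf 'with CR in address:    [%s]\n' "$with_cr"
```

The first pair of brackets comes out empty, the second contains the matched line.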

As mentioned above, your MS Windows style text file is full of carriage returns:

0000040  sp   a   n   d  sp   O   p   e   r   a   t   o   r  cr  nl   G
           6120    646e    4f20    6570    6172    6f74    0d72    470a

These are not spaces. You may be able to match them with \r or \x0d.

This updated script should match your file:

/Operator\r$/{
N
s/Owner and Operator\r\nGuide /Installation Guide\
/
}

You need the \r in the address regex as well. Try: /Operator\r$/
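
If you would rather keep the book's script unchanged, another option is to strip the carriage returns before sed sees the data. This is a different approach from the one above, sketched under assumptions: the file names are hypothetical, and the script is written out with plain LF line endings.

```shell
# Hypothetical scratch copy of the input, with CRLF line endings as in the od dump.
printf 'Consult Section 3.1 in the Owner and Operator\r\nGuide for a description of the tape drives\r\navailable on your system.\r\n' > data_crlf.txt

# The script from the book, unchanged, saved with plain LF line endings.
cat > multiline_lf.sed <<'EOF'
/Operator$/{
N
s/Owner and Operator\nGuide /Installation Guide\
/
}
EOF

# Remove every carriage return, then run the unmodified script on the result.
result=$(tr -d '\r' < data_crlf.txt | sed -f multiline_lf.sed)

printf '%s\n' "$result"
```

Note that the output then has Unix line endings, so convert it back if a Windows program needs to read it.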

Thanks, all, just wanted to close the loop on this as Chubler_XL's updated script now works:

/Operator\r/{
N
s/Owner and Operator\r\nGuide /Installation Guide\
/
}

The hex version of the carriage return also works:

/Operator\x0d/{
N
s/Owner and Operator\x0d\nGuide /Installation Guide\
/
}

Both produce the output text:

Consult Section 3.1 in the Installation Guide
for a description of the tape drives
available on your system.

Thank you all!