Running sed from a script query

Hello!

I'm trying to run this code from a script file to print the body of an HTML document (all the text between <body> and </body>), but I'm unsure how to call it from the command line.

/<body>/,/<\/body>/
1s/.*<body>//
$s/<\/body>.*//p

I have tried to call it using this:

sed -n -f sedscript1.sed test.txt

Here test.txt is the file containing the HTML.
When I try to run it, though, I get this error message:

sed: file sedscript1.sed line 2: unknown command: `
'

What am I doing wrong!?

Thanks for your help!

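1) Your first line has an address range but no command. It says "for all lines between <body> and </body>, do something", but the "something" is missing, so sed ends up reading the line break itself as the command, hence "unknown command" on line 2. /<body>/,/<\/body>/p is more complete: p means print.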

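2) Even with p, it probably won't do what you want. sed works on whole lines, so it prints every line from the one containing <body> to the one containing </body>, including whatever comes before <body> and after </body> on those lines. If the HTML is one giant line, it will print everything.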
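To see point 2 in action, here is a small sketch that adds the missing p and runs the script the same way as in the question. It assumes test.txt holds a shortened copy of the HTML posted later in this thread (first two lines, one <li> line, last line):

```shell
# Sample input: a cut-down copy of the HTML posted in this thread.
cat > test.txt <<'EOF'
<!DOCTYPE html><html lang="en">
<head><title>Images</title></head><body><ul>
<li><a href="IMG_1389.JPG">IMG_1389.JPG<\a> (1.7M)<\li>
</ul></body></html>
EOF

# The original first line, now with a command (p) after the address range.
printf '%s\n' '/<body>/,/<\/body>/p' > sedscript1.sed

# sed prints whole lines, so the <head> part before <body>
# and the </html> after </body> are printed as well.
sed -n -f sedscript1.sed test.txt
```

Note also that inside one script, the addresses 1 and $ refer to the first and last lines of the input file, not of the range, so the 1s and $s lines of the original script would not trim the range's first and last lines.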

You can do similar things in awk, where you get to tell it what a 'line' is (the record separator, RS). That is useful for getting one tag per 'line'.

This matches tags more reliably (the tag name right after "<", in any case) and also puts each tag on its own line:

awk -v RS="<" '/^[bB][oO][dD][yY]/,/^\/[bB][oO][dD][yY]/ { $1="<"$1 ; print }' file.html
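A quick way to check what that awk command does, sketched on a hypothetical file.html (a cut-down version of the HTML in this thread, with ordinary closing tags):

```shell
# Hypothetical sample file.
cat > file.html <<'EOF'
<html><head><title>Images</title></head><body><ul>
<li><a href="IMG_1389.JPG">IMG_1389.JPG</a> (1.7M)</li>
</ul></body></html>
EOF

# RS="<" starts a new record at every "<", so each record begins with a tag name.
# The range runs from the record starting with "body" to the one starting with "/body".
# Assigning $1 rebuilds the record (collapsing whitespace), and "<" is put back in front.
awk -v RS="<" '/^[bB][oO][dD][yY]/,/^\/[bB][oO][dD][yY]/ { $1="<"$1 ; print }' file.html
```

Each output line now begins with a tag, although text that follows a tag (such as IMG_1389.JPG) stays on that tag's line.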

Thanks for the help! When I run it like this from the command line:

sed -n '/<body>/,/<\/body>/p' test.txt | sed -e '1s/.*<body>//' -e '$s/<\/body>.*//' 

it will take the input:

<!DOCTYPE html><html lang="en">
<head><title>Images</title></head><body><ul>
<li><a href="IMG_1389.JPG">IMG_1389.JPG<\a> (1.7M)<\li>
<li><a href="IMG_1390.JPG">IMG_1390.JPG<\a> (1.5M)<\li>
<li><a href="IMG_1391.JPG">IMG_1391.JPG<\a> (1.4M)<\li>
</ul></body></html>

and output exactly what I need:

<ul>
<li><a href="IMG_1389.JPG">IMG_1389.JPG<\a> (1.7M)<\li>
<li><a href="IMG_1390.JPG">IMG_1390.JPG<\a> (1.5M)<\li>
<li><a href="IMG_1391.JPG">IMG_1391.JPG<\a> (1.4M)<\li>
</ul>

But I need to be able to call it from a script...

Well, you could paste that command into a script?

The trouble is, it doesn't accept the commas, and I thought each expression had to be written on a new line?

No, I mean paste the whole command line you gave into a shell script file. Otherwise, if you stick with sed -f, you'll need a separate sed script file for each sed in that sed | sed pipeline.
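A minimal sketch of that, assuming a script name of body.sh (hypothetical) that takes the HTML file name as its first argument; the pipeline inside is exactly the one from the question, and test.txt is assumed to hold the HTML posted above:

```shell
# Sample input: the HTML posted in this thread.
cat > test.txt <<'EOF'
<!DOCTYPE html><html lang="en">
<head><title>Images</title></head><body><ul>
<li><a href="IMG_1389.JPG">IMG_1389.JPG<\a> (1.7M)<\li>
<li><a href="IMG_1390.JPG">IMG_1390.JPG<\a> (1.5M)<\li>
<li><a href="IMG_1391.JPG">IMG_1391.JPG<\a> (1.4M)<\li>
</ul></body></html>
EOF

# body.sh is a hypothetical name; the quoted 'EOF' keeps "$1" literal in the file.
cat > body.sh <<'EOF'
#!/bin/sh
# Print what lies between <body> and </body> in the file named by $1.
sed -n '/<body>/,/<\/body>/p' "$1" | sed -e '1s/.*<body>//' -e '$s/<\/body>.*//'
EOF

sh body.sh test.txt
```

After chmod +x body.sh, it can also be run as ./body.sh test.txt.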

Or, in a single awk command that keeps the original line breaks:

awk '/^[uU][lL]/,/^\/[uU][lL]/ { $1="<"$1 ; print }; END { printf("\n"); }' RS="<" ORS="" FS="" OFS="" inputfile
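The awk command above relies on FS="" splitting each record into single characters, which POSIX leaves unspecified (gawk supports it). Here is a sketch of a variant that should behave the same without it: instead of rebuilding $1, it prints "<" in front of each record. It assumes test.txt holds the HTML from this thread:

```shell
# Sample input: the HTML posted in this thread.
cat > test.txt <<'EOF'
<!DOCTYPE html><html lang="en">
<head><title>Images</title></head><body><ul>
<li><a href="IMG_1389.JPG">IMG_1389.JPG<\a> (1.7M)<\li>
<li><a href="IMG_1390.JPG">IMG_1390.JPG<\a> (1.5M)<\li>
<li><a href="IMG_1391.JPG">IMG_1391.JPG<\a> (1.4M)<\li>
</ul></body></html>
EOF

# RS="<" splits the input at every "<"; ORS="" stops print from adding newlines,
# so the text between tags (including its own line breaks) comes out unchanged.
# The range runs from the record starting with "ul" to the one starting with "/ul".
awk 'BEGIN { RS = "<"; ORS = "" }
     /^[uU][lL]/, /^\/[uU][lL]/ { print "<" $0 }
     END { print "\n" }' test.txt
```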

Try this in your sed file, and run it with sed -n -f as before:

s/.*<body>//
s/<\/body>.*//
/<ul>/,/<\/ul>/p

Beware that if your HTML changes even slightly (for example, attributes on <body> or <ul>, upper-case tags, or tags moving to different lines), this will break.
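If you want the whole body in one sed file rather than just the <ul> list, one option is to group the original steps under the range with braces, so the substitutions only run inside it. This is a sketch with a hypothetical file name, body.sed, and it assumes <body> and </body> are on different lines (sed does not check the end of a range against the line that starts it):

```shell
# Sample input: the HTML posted in this thread.
cat > test.txt <<'EOF'
<!DOCTYPE html><html lang="en">
<head><title>Images</title></head><body><ul>
<li><a href="IMG_1389.JPG">IMG_1389.JPG<\a> (1.7M)<\li>
<li><a href="IMG_1390.JPG">IMG_1390.JPG<\a> (1.5M)<\li>
<li><a href="IMG_1391.JPG">IMG_1391.JPG<\a> (1.4M)<\li>
</ul></body></html>
EOF

# For every line in the <body>...</body> range: strip up to <body>,
# strip from </body> on, then print what is left.
cat > body.sed <<'EOF'
/<body>/,/<\/body>/{
s/.*<body>//
s/<\/body>.*//
p
}
EOF

sed -n -f body.sed test.txt
```

On the sample HTML this prints the same five lines as the sed | sed pipeline, from <ul> to </ul>.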