sed multiple line trouble

drareeg · November 2, 2010, 10:00am

Hello, I'm writing a script that validates a URL (the parameter) using http://validator.w3.org
First it downloads the validator's result page (the output line I want is stored in the h2 element of the page's HTML).

 
wget "http://validator.w3.org/check?uri=$1" 2> /dev/null
sed -n '/<h2/p' "check?uri=$1" | sed 's/ *<[^>]*>//g'
rm "check?uri=$1"

But I fear that if the h2 element spans more than one line, the sed command won't find it.
Is there any option in sed (or another tool) so that it doesn't work line by line?
Thanks in advance!

This is another version I tried, but I fear it also wouldn't work if the h2 element spans more than one line:

wget "http://validator.w3.org/check?uri=$1" 2> /dev/null
sed -n 's/.*<h2[^>]*>\(.*\)<\/h2>/\1/p' "check?uri=$1"
rm "check?uri=$1"

ctsgnb · November 2, 2010, 10:06am

Could you please post a sample of input file and expected output ?

drareeg · November 2, 2010, 12:08pm

input: ./validate_html www.google.com
would give:
Errors found while checking this document as HTML5!

---------- Post updated at 11:08 AM ---------- Previous update was at 10:32 AM ----------

For example, this command works if the <h2 and </h2> aren't on the same line, but when they are on the same line, it outputs everything from that line until the end of the file:
sed -n '/<h2/,/<\/h2>/p' check?uri=$1 | sed 's/ *<[^>]*>//g'
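
A small reproduction of that behaviour, as a sketch only: the sample markup below is made up for illustration and is not the validator's real output. In a sed address range, the end pattern is only tested on lines after the one that opened the range, so an h2 that opens and closes on the same line never closes the range.

```shell
# Hypothetical sample: the h2 opens and closes on one line, more markup follows.
sample='<h2 class="invalid">Errors found!</h2>
<p>first paragraph</p>
<p>second paragraph</p>'

# The range starts on line 1; </h2> is then only looked for from line 2 on,
# never found, so every remaining line is printed as well.
range_out=$(printf '%s\n' "$sample" | sed -n '/<h2/,/<\/h2>/p' | sed 's/ *<[^>]*>//g')
printf '%s\n' "$range_out"
```

With this sample, all three lines come out with their tags stripped, not just the heading text.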

ctsgnb · November 2, 2010, 3:42pm

After a quick look at Unixdaemon: Internet Explorer Plugins, it seems that this plugin is meant to be invoked through a GUI (mouse right-click).

drareeg · November 2, 2010, 5:26pm

huh??

binlib · November 3, 2010, 11:53am

wget -O - "http://validator.w3.org/check?uri=$1" 2> /dev/null |
 sed '/<h2/!d
:y
/<\/h2>/{
s/ *<[^>]*>//g
s/\n/ /g
q
}
N
by
'
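
To see what the sed program above does without hitting the network, here is a minimal sketch that feeds it made-up input. The function name extract_h2 and the sample markup are assumptions for illustration only. The program deletes every line until one contains <h2, then keeps appending the next line with N until the pattern space contains </h2>, strips the tags, turns the embedded newlines into spaces, and quits.

```shell
# Hypothetical wrapper; the sed program itself is unchanged from the post above.
extract_h2() {
  sed '/<h2/!d
:y
/<\/h2>/{
s/ *<[^>]*>//g
s/\n/ /g
q
}
N
by
'
}

# Made-up input where the h2 spans two lines.
multi=$(printf '%s\n' '<h1>Title</h1>' '<h2 class="invalid">Errors found' 'while checking!</h2>' '<p>rest</p>' | extract_h2)

# Made-up input where the h2 opens and closes on one line.
single=$(printf '%s\n' '<h2>Passed</h2>' '<p>rest</p>' | extract_h2)

printf '%s\n%s\n' "$multi" "$single"
```

Both cases yield just the heading text on one line, because the check for </h2> runs before each N, so a one-line h2 is handled on the first pass through the loop.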