Hello, I'm writing a script that validates a URL (passed as the parameter) using http://validator.w3.org.
First it downloads the validator's result page; the output line I want is stored in the h2 element of that page's HTML.
wget "http://validator.w3.org/check?uri=$1" 2> /dev/null
sed -n '/<h2/p' "check?uri=$1" | sed 's/ *<[^>]*>//g'
rm "check?uri=$1"
But I fear that if the h2 element spans more than one line, the sed command won't find all of it.
Is there any option in sed (or some other tool) so that it doesn't work line by line?
Thanks in advance!
This is another version I tried, but I also fear it wouldn't work if the h2 element spans more than one line:
wget "http://validator.w3.org/check?uri=$1" 2> /dev/null
sed -n '1,$s/.*<h2[^>]*>\(.*\)<\/h2>/\1/p' "check?uri=$1"
rm "check?uri=$1"
input: ./validate_html www.google.com
would give:
Errors found while checking this document as HTML5!
---------- Post updated at 11:08 AM ---------- Previous update was at 10:32 AM ----------
For example, this command works if the <h2 and </h2> aren't on the same line, but when they are on the same line, it outputs everything from that line until the end of the file:
sed -n '/<h2/,/<\/h2>/p' check?uri=$1 | sed 's/ *<[^>]*>//g'
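The run-on output happens because sed only starts looking for the end of a range on the line after the one that matched the start, so a </h2> on the same line as the <h2 is never seen. One possible way around this, as a rough sketch only: join all lines into one first and then cut out each h2 element with awk. The function name extract_h2 is made up for illustration, and this assumes the page has no nested h2 elements and is small enough to handle as one long line.

```shell
#!/bin/sh
# extract_h2 (hypothetical helper): read HTML on stdin and print the text
# of each <h2>...</h2> element, whether it sits on one line or several.
extract_h2() {
    tr '\n' ' ' | awk '{
        # Find the next opening <h2 ...> tag in what is left of the input.
        while (match($0, /<h2[^>]*>/)) {
            $0 = substr($0, RSTART + RLENGTH)   # drop everything up to and including the tag
            end = index($0, "</h2>")            # first closing tag after it
            if (end == 0) break                 # unterminated element: give up
            text = substr($0, 1, end - 1)
            gsub(/<[^>]*>/, "", text)           # strip any tags inside the h2
            gsub(/[ \t]+/, " ", text)           # collapse runs of whitespace
            gsub(/^ | $/, "", text)             # trim the ends
            print text
            $0 = substr($0, end + 5)            # continue after </h2>
        }
    }'
}

# Possible use in the script (not run here, needs network access):
# wget -q -O - "http://validator.w3.org/check?uri=$1" | extract_h2
```

With wget -q -O - the page goes straight into the pipe, so there would be no temporary file to rm afterwards.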