sed newline

azertyazerty · November 7, 2010, 6:39am

Hi everyone,

I'd like to use the script validatehtml, which reports whether or not the given URL is valid HTML Strict, using http://validator.w3.org.

sh validatehtml

#!/bin/bash
wget -q "http://validator.w3.org/check?uri=$1"
cat check\?uri\=$1 | sed -n '/h2/ p' | sed 's/  */ /g' | sed 's/^ //g' | sed 's/\n//' | sed 's/\(<.*>\)*\(.*\)\(<.*>\)*/\2/g'

But it doesn't remove the newline. Can somebody help me? The sed commands need to be in this order because the message inside the h2 is sometimes only one line, not two.

sh genscript http://www.w3c.org      : 2 lines
sh genscript http://www.google.com   : 1 line

thanks guys!

DGPickett · November 7, 2010, 9:46am

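You might h (copy to the hold space), remove the newline, validate, then g (copy it back)? Or validate a copy without the newline? I'm not sure what the problem is. sed reads one line at a time and strips the trailing newline, so s/\n// on its own never matches. To get a \n into the pattern space in the first place, you need a loop: if there is no </h2> yet, then N.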

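sed '
  # drop every line that does not mention h2
  /h2[ >]/!d
  :loop
  # keep appending lines until the closing tag is in the pattern space
  /<\/h2/!{
    N
    b loop
  }
  # keep only the text between the h2 tags
  s/.*<h2\>[^>]*>\(.*\)<\/h2>.*/\1/
 ' "check?uri=$1"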

Is the newline still a problem? Add s/\n/ /g on a line just above the final s command.
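
Put together, that could look something like the sketch below. It is only a rough illustration, assuming GNU sed: the function name extract_h2 and the sample input are made up for the demo and are not real validator output. It also folds in the space-squeezing steps from the original pipeline.

```shell
#!/bin/sh
# Hypothetical helper: reads HTML on stdin and prints the text of each h2,
# joined onto one line even when the element spans several lines.
extract_h2() {
  sed '
    # drop lines without an opening h2 tag
    /<h2[ >]/!d
    :loop
    # append following lines until the closing tag has been read
    /<\/h2>/!{
      N
      b loop
    }
    # the pattern space may now hold embedded newlines; turn them into spaces
    s/\n/ /g
    # keep only the text between the tags
    s/.*<h2[^>]*>\(.*\)<\/h2>.*/\1/
    # squeeze runs of spaces and trim a leading one
    s/  */ /g
    s/^ //
  '
}

# Made-up sample input with an h2 split across two lines.
printf '%s\n' '<p>ignored</p>' '<h2 class="invalid">Errors found' '   while checking!</h2>' | extract_h2
```

In the real script you would feed it the downloaded file instead, for example extract_h2 < "check?uri=$1".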

azertyazerty · November 7, 2010, 4:10pm

thanks!