Hiya,
I am trying to extract a news article from a web page. The sed command I have written brings back a lot of JavaScript code and sometimes advertisements too. Can anyone please help with this one? I need to fix this sed command so it picks up the article ONLY (don't worry about the title or date; I got those using a separate sed command).
The sed I am running is:
tr -d '\n' <03climate.html | sed -e 's/�//g' -e 's/.*nyt_text[^;];//' -e 's/<\/p>.//g' -e 's/<[^>]>//g' -e s'/[&][#]//g' -e 's/<[^>]>//g' >> articletest
The file I am trying to extract from (03climate.html) and the result (articletest.txt) are both attached to this post.
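For what it is worth, here is a rough sketch of the direction I suspect this needs to go: remove whole script and style blocks first, and only then strip the remaining tags, so the JavaScript between the tags does not survive. The function name strip_html is just something I made up for illustration. It assumes the tag names are lowercase and that the byte \001 never appears in the page, and it does not try to isolate the article body itself.

```shell
#!/bin/sh
# Sketch only: strip <script>/<style> blocks, then every remaining tag.
# Assumptions: lowercase tag names; the byte \001 never occurs in the input.
# strip_html is a hypothetical helper name, not an existing tool.

strip_html() {
    # Single-byte placeholder that stands in for a closing tag, so that
    # "[^placeholder]*" can stop at the FIRST closing tag (sed has no
    # non-greedy match).
    soh=$(printf '\001')

    # Join all lines so that blocks spanning several lines can be matched.
    tr '\n' ' ' |
    sed -e "s/<\/script>/$soh/g" \
        -e "s/<script[^$soh]*$soh//g" \
        -e "s/<\/style>/$soh/g" \
        -e "s/<style[^$soh]*$soh//g" \
        -e 's/<[^>]*>//g'
}
```

I would then call it as: strip_html < 03climate.html > articletest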
Thanks.
SG