saw7
1
Hi,
I have a file p1.htm:
<div class="colorID2">
aaaa aaaa aa <br/>
bbbbbbbb bbb<br/>
<br/>cccc ccc ccc
</div><div class="colorID1">
dddd d ddddd<br/>
eeee eeee eeeeeeeeee<br/>
fffff
<br/>g gg<br/>
</div>
<div ...
Desired output:
aaaa aaaa aa.bbbbbbbb bbb.cccc ccc ccc.dddd d ddddd.eeee eeee eeeeeeeeee.fffff.g gg
My code:
awk -vRS="" '{gsub(/<br/>/,".",$0)}1' p1.htm
but it doesn't work.
Thanks
Try:
awk '{$1=$1;gsub(/<\/*div[^>]*>/,"");gsub(/ *(<br\/>)+ */,".")}1' RS= ORS= infile
saw7
3
Thanks, Scrutinizer.
---------- Post updated at 04:17 AM ---------- Previous update was at 03:40 AM ----------
Scrutinizer, sorry, can you explain this to me:
/ *(<br\/>)+ */
What is the difference from:
/<br\/>/
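It means zero or more spaces, followed by one or more occurrences of the string <br/>, followed by zero or more spaces. Unlike /<br\/>/, it also swallows the spaces around the tag and treats a run of consecutive <br/> tags as a single match.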
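To see the difference side by side, here is a small sketch that runs both patterns over the same line. The sample string is made up for illustration (it is not taken from p1.htm); it just has spaces around one tag and two tags in a row.

```shell
# Hypothetical sample line: spaces around one tag and a doubled <br/>.
sample='aaaa aa <br/> bbb<br/><br/>ccc'

# Plain pattern: every <br/> becomes a dot, surrounding spaces are kept.
plain=$(printf '%s\n' "$sample" | awk '{gsub(/<br\/>/,".")}1')

# Extended pattern: optional spaces plus a run of <br/> tags become one dot.
collapsed=$(printf '%s\n' "$sample" | awk '{gsub(/ *(<br\/>)+ */,".")}1')

printf '%s\n%s\n' "$plain" "$collapsed"
# prints:
# aaaa aa . bbb..ccc
# aaaa aa.bbb.ccc
```

Note that the run of tags only collapses when the tags are directly adjacent; a space between two tags ends the first match and starts a second one.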
frans
5
sed 's/<[^<]*>//g' infile | tr '\n' ' '
This doesn't convert tabs or multiple spaces, but the output can be read by the shell, awk...
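If the leftover tabs and repeated spaces are a problem, one possible variation is to let tr squeeze all whitespace into single spaces. This is only a sketch: the input below is a made-up stand-in for the file, and the use of tr -s with the [:space:] class is my assumption about what "convert" should mean here.

```shell
# Hypothetical input with a tab and repeated spaces (stand-in for p1.htm).
squeezed=$(printf '<div>aaaa \t aa<br/>\nbbb   bbb<br/>\n</div>\n' |
  sed 's/<[^<]*>//g' |      # strip every tag, as in the command above
  tr -s '[:space:]' ' ')    # map tabs/newlines to spaces and squeeze runs

printf '[%s]\n' "$squeezed"
# prints: [aaaa aa bbb bbb ]
```

The trailing space comes from the final newlines being translated too; it can be trimmed afterwards if it matters.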