I only need to extract the text between <table name = "hi"> and its corresponding </table>, and delete everything else. How do I do that?
I can find the <table name = "hi"> line and delete the lines before it, but I can't find the corresponding </table>, because the file can contain several </table> tags.
#!/usr/bin/env ruby -Ku
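require 'hpricot'

file = ARGV[0]
# Hpricot parses the file into a tree, so each <table> is paired with its own </table>.
doc = File.open(file) { |f| Hpricot(f) }

# doc/"table" selects every <table> element; print only those with name="hi".
# Each element's string form runs from its opening tag to its matching </table>.
(doc/"table").each do |x|
  puts "====> #{x}" if x.get_attribute("name") == "hi"
end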
$ cat file
<html>
...
...
...
<table>
.......
......
</table>
<table name = "hi">
text inside hi
</table>
<h1> Welcome </h1>
.......
......
</html>
<table name = "hi">
some more text inside hi
</table>
$ ruby test.rb file
====> <table name="hi">
text inside hi
</table>
====> <table name="hi">
some more text inside hi
</table>
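If Hpricot isn't available, the "which </table> is the corresponding one" problem from the question can also be handled by counting nesting depth: after the "hi" opening tag, every further <table> raises the depth, every </table> lowers it, and the </table> that brings the depth back to zero is the matching close. Below is a minimal plain-Ruby sketch of that idea. The method name extract_hi_tables, the two regexes, and the sample input are illustrative assumptions, and the sketch assumes simple markup (no table tags inside comments, scripts, or attribute values), so a real HTML parser such as the one above remains the safer choice.

```ruby
# Sketch only: extract each <table name="hi"> ... matching </table> block
# by tracking <table> nesting depth instead of using an HTML parser.
# Assumption: simple markup, with no table tags hidden in comments,
# <script> blocks, CDATA sections or attribute values.

# Matches an opening or closing table tag; group 1 is "/" for a closing tag.
TABLE_TAG = /<(\/?)table\b[^>]*>/i
# Matches name = "hi" (or 'hi'), tolerating spaces around "=".
HI_NAME = /\bname\s*=\s*["']hi["']/i

def extract_hi_tables(html)
  results = []
  depth = 0   # nesting depth of <table> tags inside the current "hi" table
  start = nil # character index where the current "hi" table begins

  html.scan(TABLE_TAG) do
    m = Regexp.last_match # MatchData for the tag just found by scan
    closing = !m[1].empty?

    if start.nil?
      # Not inside a "hi" table yet: wait for its opening tag.
      if !closing && HI_NAME.match?(m[0])
        start = m.begin(0)
        depth = 1
      end
    elsif closing
      depth -= 1
      if depth.zero?
        # This </table> closes the "hi" table itself, not a nested one.
        results << html[start...m.end(0)]
        start = nil
      end
    else
      depth += 1 # a nested <table> opened inside the "hi" table
    end
  end

  results
end

# Hypothetical sample input: an unnamed table, then two "hi" tables,
# the second one containing a nested table.
sample = <<~HTML
  <table>
  x
  </table>
  <table name = "hi">
  text inside hi
  </table>
  <h1> Welcome </h1>
  <table name = "hi">
  outer <table><tr><td>inner</td></tr></table> after
  </table>
HTML

extract_hi_tables(sample).each { |t| puts "====> #{t}" }
```

Unlike deleting lines until the next </table>, this keeps the whole second "hi" table: the inner </table> only brings the depth back to 1, so the block ends at the outer close.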