sed to extract only floating point numbers from HTML

pondlife · September 8, 2009, 11:49am

Hi All,

I'm trying to extract the floating point numbers from some HTML like this:

<TR><TD class='awrc'>Parse CPU to Parse Elapsd %:</TD><TD ALIGN='right' class='awrc'>            64.50</TD><TD class='awrc'>% Non-Parse CPU:</TD><TD ALIGN='right' class='awrc'>            99.42</TD></TR>

I've got the following sed command, but it strips out the spaces and the decimal points as well. How can I modify it so that I get a space (or any other delimiter) between the numbers and keep the decimal points?

sed 's/[^0-9]*//g' test.html
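A minimal reproduction of why this happens, assuming the sample row above is the only content of test.html. The helper name `sample_row` is just for illustration. Because `[^0-9]` matches anything that is not a digit, the dots and spaces are deleted along with the tags and text, so the two numbers run together.

```shell
#!/bin/sh
# Hypothetical helper: emit the sample row from the question on stdout.
# The quoted heredoc ('EOF') keeps the single quotes and % signs literal.
sample_row() {
  cat <<'EOF'
<TR><TD class='awrc'>Parse CPU to Parse Elapsd %:</TD><TD ALIGN='right' class='awrc'>            64.50</TD><TD class='awrc'>% Non-Parse CPU:</TD><TD ALIGN='right' class='awrc'>            99.42</TD></TR>
EOF
}

# Original command: [^0-9]* also matches '.' and ' ', so both are removed.
sample_row | sed 's/[^0-9]*//g'   # prints 64509942
```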

Many thanks,

p.

---------- Post updated at 04:49 PM ---------- Previous update was at 04:25 PM ----------

Hmm, because these numbers appear more than once per line, I'm thinking sed isn't up to the job. I'll have a look for another method...
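Several numbers per line are not a problem in themselves, since the `g` flag applies a substitution to every match on the line. If the goal is strictly the decimal numbers and nothing else, one alternative is to swap sed for `grep -o`, which prints each match on its own line, and then join the matches with `paste`. This is a sketch under a few assumptions: `grep -o` is supported by GNU and BSD grep but is not required by POSIX, the pattern only matches numbers with digits on both sides of the dot (so `.5` or `64` would be skipped), and `paste -s` joins the matches from the whole input onto one line. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit the sample row from the question on stdout.
sample_row() {
  cat <<'EOF'
<TR><TD class='awrc'>Parse CPU to Parse Elapsd %:</TD><TD ALIGN='right' class='awrc'>            64.50</TD><TD class='awrc'>% Non-Parse CPU:</TD><TD ALIGN='right' class='awrc'>            99.42</TD></TR>
EOF
}

# Hypothetical helper: print every "digits.digits" number from stdin,
# one per line via grep -o, then join them with single spaces.
extract_floats() {
  grep -oE '[0-9]+\.[0-9]+' | paste -s -d ' ' -
}

sample_row | extract_floats   # prints 64.50 99.42
```

Against the real file this would be `extract_floats < test.html`.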

DeepakS · September 8, 2009, 11:53am

Try this:

sed '
s/[^0-9. ]*//g
s/ \+/ /g
' test.html
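Roughly what the two substitutions do, as a sketch run against the sample row: the first deletes every character that is not a digit, a dot or a space, and the second collapses each run of spaces into one. `\+` in a basic regex is a GNU sed extension, so this version assumes the portable equivalent `  *` (two spaces followed by `*`, i.e. one or more spaces). The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit the sample row from the question on stdout.
sample_row() {
  cat <<'EOF'
<TR><TD class='awrc'>Parse CPU to Parse Elapsd %:</TD><TD ALIGN='right' class='awrc'>            64.50</TD><TD class='awrc'>% Non-Parse CPU:</TD><TD ALIGN='right' class='awrc'>            99.42</TD></TR>
EOF
}

# Same two substitutions as the answer:
#   1. drop everything except digits, dots and spaces
#   2. squeeze runs of spaces ("  *" is the portable form of " \+")
keep_digits_dots_spaces() {
  sed -e 's/[^0-9. ]*//g' -e 's/  */ /g'
}

sample_row | keep_digits_dots_spaces   # prints " 64.50 99.42" (note the leading space)
```

The leading space comes from the space after the first `<TD`; adding `-e 's/^ //'` would drop it. Because this keeps characters rather than matching whole numbers, any digits or dots inside the tags or label text would survive too.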

pondlife · September 9, 2009, 4:10am

Thanks Deepak! Works a treat.

I hadn't used a multi-line sed script before, so I messed about a bit and got this one-line version, with one -e per command, to work:

sed -e 's/[^0-9. ]*//g' -e 's/ \+/ /g' test.html