Removing HTML tags

SkySmart · May 17, 2012, 10:20am

I store different variants of the text below in an XML file. Apparently, XML has trouble loading data like this because it contains HTML tags. I would like to preserve this data as it is, but unfortunately XML won't allow it.

So I have to strip out all the HTML tags.

The examples I found online tend to strip out the tags AND the content, which I don't want. I need the content; I just want to get rid of the HTML tags.

Below is the code:

echo "OK: strings of [ <bye xmlns=http://check.google.com/schema/2 serial-number= ] was found in the response
from the URL [ http://sky.net:80/schock/google/pinging ]. 
Actual response received = [ <bye xmlns=http://check.google.com/schema/2 serial-number=fafafafafaf /> ]." | sed 's/<[^>]*>//g'

Response:

 OK: strings of [  ].
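A likely reason the content disappears, assuming the sample really is a single line (as it appears in the later posts): the pattern <[^>]*> matches from a "<" up to the next ">", whatever lies in between. The first "<bye" is never closed, so the match runs all the way to the "/>" near the end. The sketch below reproduces that and compares it with deleting only the bracket characters; the variable names are made up for illustration.

```shell
# Sample text from the post above, joined onto one line (an assumption).
input='OK: strings of [ <bye xmlns=http://check.google.com/schema/2 serial-number= ] was found in the response from the URL [ http://sky.net:80/schock/google/pinging ]. Actual response received = [ <bye xmlns=http://check.google.com/schema/2 serial-number=fafafafafaf /> ].'

# <[^>]*> spans from the first "<" to the first ">", which here is the
# "/>" near the end, so everything in between is deleted as well.
whole_tags=$(printf '%s\n' "$input" | sed 's/<[^>]*>//g')

# Deleting only the two bracket characters leaves the content in place.
brackets_only=$(printf '%s\n' "$input" | sed 's/[<>]//g')

printf '%s\n' "$whole_tags"
printf '%s\n' "$brackets_only"
```

The first printf shows the same truncated "OK: strings of [  ]." as the response above; the second keeps the text and drops only "<" and ">".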

in2nix4life · May 17, 2012, 10:35am

Not sure what you want for your desired output, but this will get rid of the "<>" tags:

echo "OK: strings of [ <bye xmlns=http://check.google.com/schema/2 serial-number= ] was found in the response from the URL [ http://sky.net:80/schock/google/pinging ]. Actual response received = [ <bye xmlns=http://check.google.com/schema/2 serial-number=fafafafafaf /> ]." | sed "s|[<|>]||g"

OK: strings of [ bye xmlns=http://check.google.com/schema/2 serial-number= ] was found in the response from the URL [ http://sky.net:80/schock/google/pinging ]. Actual response received = [ bye xmlns=http://check.google.com/schema/2 serial-number=fafafafafaf / ].

Could you post your expected result?

Scrutinizer · May 17, 2012, 10:37am

Could you show what the output would need to look like?

SkySmart · May 17, 2012, 10:48am

I'm going to try this to see if it works. I can't post the output because it won't be the same each time: I have hundreds of entries similar to this, and they're not all alike. I just want to keep the HTML tags from being a nuisance.

Microsoft Excel can't load an XML file if it has these HTML tags in it.

Scrutinizer · May 17, 2012, 10:55am

We meant what the output would need to look like for your input sample.

SkySmart · May 17, 2012, 11:02am

I would like the output to look similar to this:

OK: strings of [ bye xmlns=http://check.google.com/schema/2 serial-number= ] was found in the response from the URL [ http://sky.net:80/schock/google/pinging ]. Actual response received = [ bye xmlns=http://check.google.com/schema/2 serial-number=fafafafafaf / ].

Corona688 · May 17, 2012, 12:41pm

So you just want to delete the <> characters?

tr -d '<>' < inputfile > outputfile

SkySmart · May 17, 2012, 3:25pm

I guess it would help if I stated what my purpose is.

OK, the output above needs to be placed in an XML file. Below is the field of the XML file into which I place each output:

  <Cell ss:StyleID="s63"><Data ss:Type="String">$Output</Data></Cell>

So the variable called "$Output" contains the information that I'm trying to strip HTML tags from.

Since there are so many characters that can be in an HTML tag, I don't know what characters will appear in all the information that will be placed in the "$Output" variable.

So I'm hoping there's a sed/awk one-liner that can take into account all HTML tags/characters and remove them from my output. I'm presuming someone here would know about HTML and all the tags it has.

Corona688 · May 17, 2012, 3:27pm

Always.

<Cell ss:StyleID="s63"><Data ss:Type="String">${Output//[<>]/}</Data></Cell>

perhaps?
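A minimal sketch of how that expansion might sit in a script, assuming the shell is bash (ksh93 and zsh support the same ${var//pattern/} form; plain sh does not). The sample value of Output and the cell variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample value; in the real script $Output comes from elsewhere.
Output='Actual response received = [ <bye serial-number=fafafafafaf /> ].'

# ${Output//[<>]/} replaces every "<" or ">" in $Output with nothing.
cell="<Cell ss:StyleID=\"s63\"><Data ss:Type=\"String\">${Output//[<>]/}</Data></Cell>"

printf '%s\n' "$cell"
```

Because the expansion happens inside the shell, no extra sed or tr process is needed for each value.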

michaelrozar17 · May 18, 2012, 2:35am

Also try:

echo "OK: ...." | sed 's/[<>]//g'
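Since the original goal was to keep the data as it is, another option, not suggested by anyone in the thread, is to escape the markup characters instead of deleting them. XML text content cannot contain a raw "<" or "&", so a stray "&" in the data would still break the file even after "<" and ">" are removed. The sketch below uses a hypothetical helper named xml_escape; whether Excel then displays the cell as wanted is an assumption that would need checking.

```shell
# Hypothetical helper: escape the three characters that matter in XML text.
# "&" is handled first, otherwise the "&" in "&lt;" would be escaped again.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Made-up sample input for illustration.
escaped=$(printf '%s\n' 'received = [ <bye serial-number=fafafafafaf /> ] & more' | xml_escape)
printf '%s\n' "$escaped"
```

An XML parser turns "&lt;", "&gt;" and "&amp;" back into the original characters when it reads the file, so the stored text is unchanged.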