Reading only a particular tag from XML

dsrookie · September 10, 2009, 7:31pm

Hi,

I have an XML file with the structure shown below. Between the following tags I have pipe-delimited records separated by newline characters (for example, Data1|1|2|3):

<![CDATA[
and
]]>

I need to read the data between the above tags so that my output is a flat file of pipe-delimited records.

Thanks
DSR

malcomex999 · September 12, 2009, 6:50am

 
awk '/<!\[CDATA\[/,/\]\]>/' infile | sed -e 's/<!\[CDATA\[ //' -e 's/\]\]>//'

protocomm · September 12, 2009, 7:52am

or....

awk '/<!\[CDATA\[/,/\]\]>/ {if (!/<!\[CDATA\[/ && !/\]\]>/) print}' file

malcomex999 · September 12, 2009, 8:05am

That is a good one, but it drops the whole opening line.
On that line, right after <![CDATA[, there is Data1|1|2|3, which needs to be displayed.

protocomm · September 12, 2009, 8:12am

The awk command line printed this:

Data2|4|5|6 \n
--
--
Data3|3|5|6|7 \n

Isn't that the aim?

malcomex999 · September 12, 2009, 8:19am

>cat file
<BOS>
<Header>
<TTC>ABC</TTC> 
</Header>
<Payload>
<Data>
<![CDATA[ Data1|1|2|3 \n
Data2|4|5|6 \n
--
--
Data3|3|5|6|7 \n
]]> 
</Data>
</Payload>
</BOS>

If you look at the OP's input file, your result is missing the first record (Data1|1|2|3), which sits on the same line as <![CDATA[.
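
One way to keep that first record is to strip the markers instead of skipping the lines that contain them. This is only a minimal sketch. It assumes the \n shown in the sample stands for real line breaks and that nothing useful follows ]]> on its line. The function name extract_cdata is made up for illustration.

```shell
# Hypothetical helper: print the text found between <![CDATA[ and ]]>,
# one record per line, dropping lines that end up empty.
extract_cdata() {
  awk '
    # Opening marker: start collecting and cut everything up to and including it.
    /<!\[CDATA\[/ { inside = 1; sub(/.*<!\[CDATA\[[ \t]*/, "") }

    # Closing marker: cut it (and anything after it), print what is left, stop collecting.
    inside && /\]\]>/ {
      sub(/[ \t]*\]\]>.*/, "")
      inside = 0
      if (length($0)) print
      next
    }

    # Ordinary line inside the section: trim trailing blanks and print it.
    inside { sub(/[ \t]+$/, ""); if (length($0)) print }
  ' "$1"
}
```

Calling it as extract_cdata file > records.txt would write the records to a flat file. Any line inside the section, including placeholder lines such as --, is printed as-is.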

protocomm · September 12, 2009, 9:04am

Yes, that's true, I hadn't noticed it...

ripat · September 12, 2009, 9:37am

How about this?

awk -F' ' '/<!\[CDATA/,/\]\]>/ {if(!/^\]/)print $(NF>2?2:1)}' file

fpmurphy · September 12, 2009, 11:51am

An alternative to awk, if the file is a well-formed XML document, is to transform the file using an XSL stylesheet.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/">
      <xsl:apply-templates select="//Data"/>
   </xsl:template>

   <xsl:template match="Data">
      <xsl:copy-of select="."/>
   </xsl:template>

</xsl:stylesheet>

Running "xsltproc stylesheet file" produces:

Data1|1|2|3
Data2|4|5|6
Data3|3|5|6|7

edidataguy · September 12, 2009, 8:40pm

Here you go:

sed '/>[[:space:]]*$/d; s/.*\[ //' myfile.xml

I presume there will always be a space after the [.
If not, remove the space from the pattern.
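
A slightly more defensive variant of the same idea, as a rough sketch only: it limits sed to the CDATA range, so it does not depend on every other line ending in >, and it treats the space after [ as optional. It assumes the opening and closing markers are on different lines, as in the sample. The function name extract_cdata_sed is made up for illustration.

```shell
# Hypothetical helper: sed-only extraction of the lines between
# <![CDATA[ and ]]>, with the markers themselves stripped.
extract_cdata_sed() {
  sed -n '/<!\[CDATA\[/,/\]\]>/{
    s/.*<!\[CDATA\[[[:space:]]*//
    s/[[:space:]]*\]\]>.*//
    s/[[:space:]]*$//
    /./p
  }' "$1"
}
```

Inside the range, the first substitution removes the opening marker and any blanks after it, the second removes the closing marker, the third trims trailing blanks, and /./p prints only lines that still contain something. Usage would be extract_cdata_sed myfile.xml > records.txt.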