Reading only a particular tag from XML

Hi,

I have an XML file with the following structure. Between the following tags I have pipe-delimited records separated by newline characters (e.g. Data1|1|2|3):

<![CDATA[
and
]]>

I need to read the data between the above tags so that my output is a flat file of pipe-delimited records.

<BOS>
<Header>
<TTC>ABC</TTC>
</Header>
<Payload>
<Data>
<![CDATA[ Data1|1|2|3 \n
Data2|4|5|6 \n
--
--
Data3|3|5|6|7 \n
]]>
<Data>
</Payload>
</BOS>

Thanks
DSR

 
awk '/<!\[CDATA\[/,/]]>/' infile | sed -e 's/<!\[CDATA\[ *//' -e 's/]]>//'

or....

awk '/\!\[CDATA\[/ , /]]/ {if (!/\!\[CDATA\[/&&!/]]/)print}' file

That was a good one, but it drops the whole first line.
After <![CDATA[ on that line there is Data1|1|2|3, which needs to be displayed.

The awk command line displayed this:

Data2|4|5|6 \n
--
--
Data3|3|5|6|7 \n

Is that not the aim?

>cat file
<BOS>
<Header>
<TTC>ABC</TTC> 
</Header>
<Payload>
<Data>
<![CDATA[ Data1|1|2|3 \n
Data2|4|5|6 \n
--
--
Data3|3|5|6|7 \n
]]> 
<Data>
</Payload>
</BOS>

If you look at the OP's input file, your result is missing the record that sits on the same line as <![CDATA[ (Data1|1|2|3).

Yes, that's true, I hadn't noticed it.

How about this?

awk -F' ' '/<!\[CDATA/,/\]\]>/ {if(!/^\]/)print $(NF>2?2:1)}' file
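Another way to keep the record that shares a line with <![CDATA[ is to strip the markers in place instead of picking a field. This is only a sketch: the function name extract_cdata is made up for illustration, and it assumes the markers appear literally in the file exactly as in the sample.

```shell
# extract_cdata is a hypothetical helper name, not an existing tool.
# It prints whatever lies between <![CDATA[ and ]]>, one input line per output line.
extract_cdata() {
    awk '
        # Opening marker: start collecting and drop everything up to the marker
        /<!\[CDATA\[/ { inside = 1; sub(/.*<!\[CDATA\[[ \t]*/, "") }
        inside {
            last = ($0 ~ /]]>/)            # does the section close on this line?
            if (last) sub(/]]>.*/, "")     # drop the closing marker and what follows
            if ($0 !~ /^[ \t]*$/) print    # skip lines that are now empty
            if (last) inside = 0
        }
    ' "$@"
}
```

Usage would be something like: extract_cdata infile > flatfile. Filler lines such as -- are still printed, since they are inside the CDATA section.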

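An alternative to awk, if the file is a well-formed XML document, is to transform the file using an XSL stylesheet. (For the sample above to be well-formed, the second <Data> would have to be the closing tag </Data>.)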

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/">
      <xsl:apply-templates select="//Data"/>
   </xsl:template>

   <xsl:template match="Data">
      <xsl:copy-of select="."/>
   </xsl:template>

</xsl:stylesheet>

Running xsltproc stylesheet file produces:

Data1|1|2|3
Data2|4|5|6
Data3|3|5|6|7

Here you go:

sed '/>$/d; s/.*\[ //' myfile.xml

I presume there will always be a space after the [ .
If not, remove the space from the pattern.
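If the space after the [ cannot be relied on, one possible variant anchors on the full <![CDATA[ marker and treats the space as optional. This is a rough sketch, not a tested drop-in: the function name cdata_lines is made up, and it assumes <![CDATA[ and ]]> sit on different lines, as in the sample, because a sed address range only looks for its end pattern starting on the next line.

```shell
# cdata_lines is a hypothetical helper name, not an existing tool.
cdata_lines() {
    sed -n '/<!\[CDATA\[/,/]]>/{
        # remove the opening marker and any whitespace right after it
        s/.*<!\[CDATA\[[[:space:]]*//
        # remove the closing marker and anything after it
        s/]]>.*//
        # drop lines that are now empty, print the rest
        /^[[:space:]]*$/d
        p
    }' "$@"
}
```

Called as cdata_lines myfile.xml, it only touches lines from the opening marker to the closing one, so tags elsewhere in the file do not need to end in > at end of line.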