Reading only a particular tag from XML


I have an XML file with the following structure. Between the following tags I have pipe-delimited records separated by newline characters (Data1|1|2|3).


I need to read the data between the above tags so that my output is a flat file of pipe-delimited records.

<![CDATA[ Data1|1|2|3 \n
Data2|4|5|6 \n
Data3|3|5|6|7 \n


awk '/<!\[CDATA\[/,/\]\]>/' infile | sed -e 's/<!\[CDATA\[ //' -e 's/\]\]>//'


awk '/\!\[CDATA\[/ , /]]/ {if (!/\!\[CDATA\[/&&!/]]/)print}' file

That was a good one, but it drops the whole line that contains <![CDATA[.
The text that follows <![CDATA[ on that line, Data1|1|2|3, also needs to be displayed.

The awk command line displays this:

Data2|4|5|6 \n
Data3|3|5|6|7 \n

Is that not the aim?

>cat file
<![CDATA[ Data1|1|2|3 \n
Data2|4|5|6 \n
Data3|3|5|6|7 \n

If you look at the OP's input file, your result is missing the first record, Data1|1|2|3 (the one marked in red).

Yes, I hadn't noticed that. You're right.

How about this?

awk -F' ' '/<!\[CDATA/,/\]\]>/ {if(!/^\]/)print $(NF>2?2:1)}' file
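For comparison, here is a minimal sketch of a variant that strips the CDATA markers with sub() instead of picking a field, so it does not depend on how many whitespace-separated fields a line has. The sample file is hypothetical: the <Root> and <Data> wrappers and the closing ]]> on a line of its own are assumptions, not taken from the OP's file.

```shell
#!/bin/sh
# Build a small sample file. The <Root>/<Data> wrappers and the "]]>"
# on its own line are assumptions made for this sketch.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<Root>
<Data>
<![CDATA[ Data1|1|2|3
Data2|4|5|6
Data3|3|5|6|7
]]>
</Data>
</Root>
EOF

# Inside the CDATA range, strip the markers and print whatever is left.
result=$(awk '
    /<!\[CDATA\[/, /\]\]>/ {
        sub(/.*<!\[CDATA\[ */, "")   # drop everything up to the opening marker
        sub(/ *\]\]>.*/, "")         # drop the closing marker and what follows
        if ($0 != "") print          # skip lines that held only a marker
    }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the opening marker is removed rather than the whole line being skipped, the record that shares a line with <![CDATA[ is kept.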

An alternative to awk, if the file is a well-formed XML document, is to transform it with an XSL stylesheet.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/">
      <xsl:apply-templates select="//Data"/>
   </xsl:template>

   <xsl:template match="Data">
      <xsl:copy-of select="."/>
   </xsl:template>

</xsl:stylesheet>

xsltproc stylesheet file produces


Here you go:

sed '/>$/d; s/.*\[ //' myfile.xml

This assumes there is always a space after the [.
If there is not, remove the space from the pattern.
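If the space after the [ may or may not be present, the pattern can make it optional with " *" instead. This is a minimal sketch on a hypothetical sample file: the <Data> wrapper and the closing ]]> on a line of its own are assumptions. Like the command above, it relies on every non-data line ending in >.

```shell
#!/bin/sh
# Hypothetical sample: no space after "<![CDATA[" on the first record.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<Data>
<![CDATA[Data1|1|2|3
Data2|4|5|6
]]>
</Data>
EOF

# Delete lines ending in ">", then strip everything up to the last "["
# plus any spaces that follow it (" *" matches zero or more spaces).
result=$(sed '/>$/d; s/.*\[ *//' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

The same pattern also handles the case where the space is present, so the command does not need to be edited per file.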