Hi,
I have an XML file with the following structure. Between the following tags I have pipe-delimited records separated by newline characters (for example, Data1|1|2|3):
<![CDATA[
and
]]>
I need to read the data between the above tags so that my output is a flat file of pipe-delimited records.
<BOS>
<Header>
<TTC>ABC</TTC>
</Header>
<Payload>
<Data>
<![CDATA[ Data1|1|2|3 \n
Data2|4|5|6 \n
--
--
Data3|3|5|6|7 \n
]]>
</Data>
</Payload>
</BOS>
Thanks
DSR
awk '/<!\[CDATA\[/,/]]>/' infile | sed -e 's/<!\[CDATA\[ *//' -e 's/]]>//'
or....
awk '/\!\[CDATA\[/ , /]]/ {if (!/\!\[CDATA\[/&&!/]]/)print}' file
That was a good one, but it drops the whole first line.
The record Data1|1|2|3 follows <![CDATA[ on the same line, and it needs to be displayed too.
The awk command line displayed this:
Data2|4|5|6 \n
--
--
Data3|3|5|6|7 \n
Is that not the aim?
>cat file
<BOS>
<Header>
<TTC>ABC</TTC>
</Header>
<Payload>
<Data>
<![CDATA[ Data1|1|2|3 \n
Data2|4|5|6 \n
--
--
Data3|3|5|6|7 \n
]]>
</Data>
</Payload>
</BOS>
If you look at the OP's input file, your result is missing the first record, Data1|1|2|3, which sits on the same line as <![CDATA[.
Yes, that's true, I hadn't noticed it.
How about this?
awk -F' ' '/<!\[CDATA/,/\]\]>/ {if(!/^\]/)print $(NF>2?2:1)}' file
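The one-liner above relies on the first record being the second whitespace-separated field of the opening line. A minimal sketch of a variant that strips the markers explicitly instead is shown here; it is not from the thread. It assumes that the \n in the sample stands for real line breaks, that there is a single CDATA section per Data element, and the function name extract_cdata is purely illustrative.

```shell
# Hypothetical helper: print the text inside a CDATA section, one record per line.
extract_cdata() {
    awk '
        # Opening marker: start capturing and keep whatever follows it on the same line.
        /<!\[CDATA\[/ { inside = 1; sub(/.*<!\[CDATA\[[ \t]*/, "") }
        # Closing marker: keep whatever precedes it, then stop capturing.
        inside && /\]\]>/ { sub(/[ \t]*\]\]>.*/, ""); if (length($0)) print; inside = 0; next }
        # Ordinary non-empty line inside the section.
        inside && length($0) { print }
    ' "$1"
}
```

Called as extract_cdata file, it should also cope with a section that opens and closes on the same line, which a plain awk range pattern handles less gracefully.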
An alternative to awk, if the file is a well-formed XML document, is to transform the file using an XSL stylesheet.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="//Data"/>
</xsl:template>
<xsl:template match="Data">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
Running xsltproc stylesheet file produces:
Data1|1|2|3
Data2|4|5|6
Data3|3|5|6|7
Here you go:
sed '/>$/d; s/.*\[ //' myfile.xml
I presume there will always be a space after the [.
If not, remove the space from the pattern.
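The sed command above deletes every line ending in > and strips everything up to the last "[ ", so it would also drop a record that happens to end in > or mangle one that contains a [. A hedged sketch that limits the work to the CDATA range and treats the space as optional follows. It assumes the opening and closing markers sit on different lines, as in the sample, and the function name cdata_sed is only illustrative.

```shell
# Hypothetical helper: sed-only extraction restricted to the CDATA range.
cdata_sed() {
    # -n suppresses default output; only lines between the markers are processed.
    # Inside the range: strip the opening marker, strip the closing marker,
    # then print whatever non-empty text is left.
    sed -n '/<!\[CDATA\[/,/]]>/{
        s/.*<!\[CDATA\[ *//
        s/ *]]>.*//
        /./p
    }' "$1"
}
```

Usage is cdata_sed myfile.xml. If a section opens and closes on the same line, sed looks for the end of the range only from the next line onward, so this sketch is not suitable for that layout.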