Finding a specific string in a file and storing it in another file

The text in the input file looks like this:

<title>
	<band height="21"  isSplitAllowed="true" >
	<staticText>
	<reportElement
				x="1"
				y="1"
				width="313"
				height="20"
				key="staticText-1"/>
	    		<box></box>
				<textElement>
				<font fontName="Arial" pdfFontName="Helvetica-Bold" size="14" isBold="true" isUnderline="true"/>
				</textElement>
		    	<text><![CDATA[**4) Computation of Tier I and Tier II Capital :]**]></text>
				</staticText>
			</band>
		</title>

The output file should contain:
4) Computation of Tier I and Tier II Capital :

The file has many <title> tags and CDATA sections, but I only want to copy the text that is inside a CDATA section under a <title> tag and save it to another file.


sed -nr '/\<title/,/\/title/ H; /\/title/{x; s/.*CDATA[^ ]+\s+([^:]+:).*/\1/p}' file >newfile
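A simpler line-based variant is sketched below, assuming the `**` markers in the sample are only emphasis and the real file uses the usual `<![CDATA[...]]>` form, with each CDATA section opening and closing on one line. The file names `sample.xml` and `newfile` and the sample content are made up for illustration.

```shell
# Hypothetical sample: two <title> blocks and one CDATA section outside any title.
cat > sample.xml <<'EOF'
<report>
  <title>
    <band height="21">
      <staticText>
        <text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
      </staticText>
    </band>
  </title>
  <detail>
    <text><![CDATA[not inside a title]]></text>
  </detail>
  <title>
    <band height="21">
      <staticText>
        <text><![CDATA[5) Another heading]]></text>
      </staticText>
    </band>
  </title>
</report>
EOF

# Only look at lines between <title> and </title>; on those lines, keep
# whatever sits between "<![CDATA[" and "]]>" and print it.
sed -n '/<title>/,/<\/title>/ s/.*<!\[CDATA\[\(.*\)]]>.*/\1/p' sample.xml > newfile

cat newfile
```

This prints one line per title and skips the CDATA section in `<detail>`. It still depends on the file's line layout, so it would miss a CDATA section that spans several lines.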

This looks like XML, but the syntax is not valid: the CDATA section seems to be malformed. (Were the asterisks just added manually to point out the data you want?)

The syntax used here is: <![CDATA[**some text]**]>
The normal CDATA syntax is: <![CDATA[some text]]>

If the asterisks really are in the file, you can fix that by writing an intermediate file before using an XML parser:

sed -e 's/\[CDATA\[\*\*/[CDATA[/' -e 's/\]\*\*\]/]]/' data.xml >data.tmp.xml

 

With the syntax fixed, you can extract the wanted data as follows:

xmllint --nocdata --xpath  "//title/band/staticText/text/text()" data.tmp.xml

or, as you likely want each result on a separate line:

xmllint --nocdata --shell  <<<'cat //title/band/staticText/text/text()' data.tmp.xml \
     | grep -vE '^(/ > ?)?( +-+)?$'
   

Note
Parsing XML files with sed/awk throws away the advantages of a robust, clear text file format. It invites errors on any simple change in whitespace or element ordering (reindented? compressed or uncompressed output? ...), and such changes are to be expected at any time given the nature of the format.

--- Post updated at 12:34 PM ---

Hmmmm.... xmlstarlet is more convenient than xmllint:

xmlstarlet sel -t -v "//title/band/staticText/text" data.tmp.xml 

The CDATA syntax is correct in the file, like this: <![CDATA[4) Computation of Tier I and Tier II Capital :]]>
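If the CDATA sections are well-formed, the sed fix-up step should not be needed and the parser can presumably run on the original file directly. Below is a rough sketch that gets one result per line even on an xmllint that prints node sets without separators, by counting the matches and fetching each one by index. The file names `data.xml` and `titles.txt` and the sample content are placeholders, and the XPath assumes the element nesting shown in the question and no XML namespace.

```shell
# Hypothetical stand-in for the real report file.
cat > data.xml <<'EOF'
<report>
  <title><band><staticText><text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text></staticText></band></title>
  <title><band><staticText><text><![CDATA[5) Another heading]]></text></staticText></band></title>
</report>
EOF

path='//title/band/staticText/text'

# count() yields a number, string() yields plain text; xmllint prints each
# of these followed by a newline, so every title lands on its own line.
n=$(xmllint --xpath "count($path)" data.xml)

i=1
while [ "$i" -le "$n" ]; do
    xmllint --xpath "string(($path)[$i])" data.xml
    i=$((i + 1))
done > titles.txt

cat titles.txt
```

This calls xmllint once per match, which is fine for a handful of titles but slow for very large files.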

xmllint finally got newlines to separate node sets 4 months ago (on linux / libxml2).

Add newlines to 'xmllint --xpath' output (da35eeae) · Commits · GNOME / libxml2 · GitLab

It may take some years until this has propagated to the Linux distributions.