Could you please give your input on the issue below?
source.xml
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i"><2></C1>
<V1 type="string"><6.2></V1>
<D1 type="string">
<D2><1.0></D2>
<D2><2.0></D2>
</D1>
......................
......................
many more records.....
</P1>
The problem with the above XML is that the text content is wrapped in < and >, so I am unable to read the file as XML. Could you please guide me on how to remove the < and > around the text?
The issue will be determining what is a valid XML tag and what is data that appears between "<" and ">". Is it always numeric? Are there negative numbers? Character strings? With or without spaces?
But making a guess, should the results be:
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i">2</C1>
<V1 type="string">6.2</V1>
<D1 type="string">
<D2>1.0</D2>
<D2>2.0</D2>
</D1>
......................
......................
many more records.....
</P1>
This was done with:
perl -pe 's{<(\d+(?:\.\d+)?)>}{$1}g;'
or with:
sed -E 's/<([0-9]+(\.[0-9]+)?)>/\1/g'
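The same substitution can be sketched in Python, which also makes it easy to check that the cleaned text parses. This is only an illustration under the same guess as above (the bracketed data is always numeric); the optional leading minus sign is an extra assumption not covered by the one-liners, and the name unwrap_numbers is made up for this example. It relies on the fact that an XML element name cannot start with a digit or "-", so a match can never be a real tag.

```python
import re
import xml.etree.ElementTree as ET

# Matches <2>, <6.2>, <-1.5>: optional sign (assumption), digits, optional fraction.
# XML names cannot begin with a digit or '-', so a match is never a genuine tag.
BRACKETED_NUMBER = re.compile(r"<(-?\d+(?:\.\d+)?)>")


def unwrap_numbers(text: str) -> str:
    """Replace each <number> with the bare number, leaving real tags alone."""
    return BRACKETED_NUMBER.sub(r"\1", text)


# The records shown in the question, without the declaration and the elided lines.
sample = (
    '<P1 >\n'
    '<C1 type="i"><2></C1>\n'
    '<V1 type="string"><6.2></V1>\n'
    '<D1 type="string">\n'
    '<D2><1.0></D2>\n'
    '<D2><2.0></D2>\n'
    '</D1>\n'
    '</P1>\n'
)

cleaned = unwrap_numbers(sample)
root = ET.fromstring(cleaned)  # raises ParseError if the text is still not well-formed

print(root.find("C1").text)               # 2
print([d.text for d in root.iter("D2")])  # ['1.0', '2.0']
```

For the real file, which declares UTF-16, the text would presumably have to be decoded first, for example by reading it with open(path, encoding="utf-16"), before applying the substitution.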
The highlighted token is not needed as any whitespace would have been consumed by ([^>]*) . Also, the two print statements should not have been included. The corrected code is: