Parsing XML in awk: OFS does not work as expected

Hi,

I am trying to process an ordinary XML file in which I have to reduce the number of decimal places in some XML elements. I am using the following awk script to achieve that:

#!/bin/ksh

EDITCMD='BEGIN { FS = "[\<\>]"; OFS=FS }
{
    if ( $3 ~ "[0-9][0-9]*\\.[0-9][0-9]*" && length(substr($3,1+index($3,"."))) == 15 ) {
        PRE=substr($3,1,index($3,".")-1);
        POST=substr($3,1+index($3,"."),5);
        $3 = PRE "." POST
    }
    {
        print $0
    }
}'
nawk "$EDITCMD" /path/file.xml

The problem is that OFS is not printed correctly on the lines where the transformation was applied. The output looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Import xmlns:xsi="">
<INSTRUMENT>
<INSTRUMENT_CD>00036AAB1</INSTRUMENT_CD>
<BUNDLE_ID>48328</BUNDLE_ID>
<ACCRUAL_DT>5/8/2001</ACCRUAL_DT>
[<>]AMT_ISU[<>]125000000.00000[<>]/AMT_ISU[<>]
<ANNOUNCE_DT>5/1/2001</ANNOUNCE_DT>
<CD_INSTMT_TYPE>UNKNOWN</CD_INSTMT_TYPE>
<CHANGE_DT>5/7/2009 21:02:01.370</CHANGE_DT>
..
..

What am I doing wrong? The FS definition seems to be correct, since the transformation is applied to the right fields, but why does OFS not hold the corresponding FS character when the line is printed? It made no difference whether I escaped, double-escaped, or did not escape these characters in FS.

Thanks for your help,

Martin

Try sub($3, PRE "." POST) instead of $3 = PRE "." POST; then you can leave out OFS=FS. sub() edits $0 in place, so the record is never rebuilt with OFS.
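For reference, here is a minimal sketch of what the whole script could look like with that change. The function name fix_decimals and the two sample lines are made up for illustration, and plain awk is used instead of nawk. Note that sub() treats $3 as a regular expression, which is harmless here because the field is already known to contain only digits and a dot.

```shell
# Hypothetical wrapper: trims 15-decimal values in field 3 to 5 decimals.
fix_decimals() {
  awk 'BEGIN { FS = "[<>]" }
  {
    dot = index($3, ".")
    # Only touch fields that are plain decimal numbers with exactly 15 decimals.
    if ($3 ~ /^[0-9]+\.[0-9]+$/ && length($3) - dot == 15)
      sub($3, substr($3, 1, dot + 5))   # edits $0 in place, OFS is never used
    print
  }' "$@"
}

# Sample input modelled on the lines in the question.
out=$(printf '%s\n' \
  '<BUNDLE_ID>48328</BUNDLE_ID>' \
  '<AMT_ISU>125000000.123456789012345</AMT_ISU>' | fix_decimals)
echo "$out"
```

Because no field is assigned, the angle brackets come through exactly as they were read.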

Thanks Scrutinizer, your advice works fine.

However, I would still like to know how to use OFS properly when FS is a regular expression or a set of characters and I do not want to change the separators in the output; I just need to access and modify some of the fields.

Any other ideas?

Thanks & Regards

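Unlike FS, OFS is never interpreted as a regular expression; it is a plain string that awk puts between the fields whenever it rebuilds the record. So IMO that is not possible with OFS alone.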
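To make that concrete, here is a small sketch (the sample line is made up, modelled on the question). The first command reproduces the symptom: any assignment to a field makes awk rejoin the fields with the literal OFS string. The second is one possible workaround that never assigns to a field; it locates the number with match() and rewrites $0 with substr(). It only handles the first such number on a line, which is an assumption that fits the one-element-per-line layout shown above.

```shell
# Hypothetical sample line with 15 decimal places.
line='<AMT_ISU>125000000.123456789012345</AMT_ISU>'

# 1) Assigning to a field rebuilds $0 using OFS as a literal string.
rebuilt=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = "[<>]"; OFS = FS } { $3 = $3; print }')
echo "$rebuilt"

# 2) Editing $0 directly keeps the original separators.
trimmed=$(printf '%s\n' "$line" | awk '
{
  if (match($0, />[0-9]+\.[0-9]+</)) {
    num = substr($0, RSTART + 1, RLENGTH - 2)   # the number between > and <
    dot = index(num, ".")
    if (length(num) - dot == 15)                # exactly 15 decimals
      $0 = substr($0, 1, RSTART) substr(num, 1, dot + 5) substr($0, RSTART + RLENGTH - 1)
  }
  print
}')
echo "$trimmed"
```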

As it does not look like you are validating tags, and you are trimming any number that has 15 decimal places, maybe sed would be a "better" choice:

sed -e 's/>\([0-9][0-9]*\.[0-9][0-9][0-9][0-9][0-9]\)[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]</>\1</g' inputfile

(Yes, I know "-e" is not necessary, but I am one of those boring, make-it-obvious kind of people.)
This way, you don't have to worry whether the file being changed is formatted as shown above, or as:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Import xmlns:xsi="">
<INSTRUMENT><INSTRUMENT_CD>00036AAB1</INSTRUMENT_CD><BUNDLE_ID>48328</BUNDLE_ID><ACCRUAL_DT>5/8/2001</ACCRUAL_DT><AMT_ISU>125000000.123456789012345</AMT_ISU><ANNOUNCE_DT>5/1/2001</ANNOUNCE_DT><CD_INSTMT_TYPE>UNKNOWN</CD_INSTMT_TYPE><CHANGE_DT>5/7/2009 21:02:01.370</CHANGE_DT>...
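As a quick check of that claim, here is a sketch that runs the same substitution on a single line holding several elements. The tag names and values are made up, and the repeated [0-9] groups are written with POSIX interval expressions (\{5\} and \{10\}) only to keep the command short.

```shell
# Hypothetical one-line input: A and C have 15 decimals, B has only one.
input='<A>1.123456789012345</A><B>2.5</B><C>9.987654321098765</C>'

# Keep 5 decimals, drop the following 10; values without 15 decimals do not match.
result=$(printf '%s\n' "$input" |
  sed -e 's/>\([0-9][0-9]*\.[0-9]\{5\}\)[0-9]\{10\}</>\1</g')
echo "$result"
```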

Now, if you need to make sure the opening and closing tags match, you can change the regex to:

s:<\([^>]*\)>\([0-9][0-9]*\.[0-9][0-9][0-9][0-9][0-9]\)[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]</\1>:<\1>\2</\1>:g

Or even list the specific tags you want to change (note that \| in a basic regex is a GNU sed extension):

s:<\(AMT_ISU\|anothertag\)>\([0-9][0-9]*\.[0-9][0-9][0-9][0-9][0-9]\)[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]</\1>:<\1>\2</\1>:g
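If your sed does not understand \|, one possible alternative is a separate -e expression per tag. This is only a sketch: the shell variable num and the OTHER element in the sample input are made up for illustration.

```shell
# Shared number pattern: keep 5 decimals in group 1, drop the next 10.
num='\([0-9][0-9]*\.[0-9]\{5\}\)[0-9]\{10\}'

# Only AMT_ISU and anothertag are listed, so OTHER is left alone.
out=$(printf '%s\n' \
  '<AMT_ISU>1.123456789012345</AMT_ISU><OTHER>2.123456789012345</OTHER>' |
  sed -e "s:<AMT_ISU>$num</AMT_ISU>:<AMT_ISU>\1</AMT_ISU>:g" \
      -e "s:<anothertag>$num</anothertag>:<anothertag>\1</anothertag>:g")
echo "$out"
```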