How to trim the trailing zeros after the decimal point?

Hello all,
I have an XML file with the content below, from which I need to remove the trailing zeros: 123.00 should become 123, and 123.01200 should become 123.012. Below is a sample excerpt from the XML file. My input file could be approximately 5 GB or less.

CURRENT:

<ACCRUED_INTEREST>0.00</ACCRUED_INTEREST>  
  <BOOK_VALUE>0.00</BOOK_VALUE>  
  <COMMIT_CURRENT>29250.00</COMMIT_CURRENT>  
  <COMMIT_UNDISBURSED>29250.00</COMMIT_UNDISBURSED>  
  <COST_OF_FUNDS>0.0049000000</COST_OF_FUNDS>  
  <DATE_ORIGINATED>2013-05-15</DATE_ORIGINATED>  
  <GOVT_GUARANTOR>0</GOVT_GUARANTOR>  
  <INT_RATE>0.0518000000</INT_RATE>  

EXPECTED:

<ACCRUED_INTEREST>0</ACCRUED_INTEREST>  
  <BOOK_VALUE>0</BOOK_VALUE>  
  <COMMIT_CURRENT>29250</COMMIT_CURRENT>  
  <COMMIT_UNDISBURSED>29250</COMMIT_UNDISBURSED>  
  <COST_OF_FUNDS>0.0049</COST_OF_FUNDS>  
  <DATE_ORIGINATED>2013-05-15</DATE_ORIGINATED>  
  <GOVT_GUARANTOR>0</GOVT_GUARANTOR>  
  <INT_RATE>0.0518</INT_RATE>  

Thank you.

Here's my mini-xml parser again:

$ cat minixml.awk

BEGIN {
        FS=">"; OFS=">";
        RS="<"; ORS="<"
}

{ SPEC=0 ; TAG="" }

NR==1 {
        if(ORS == RS) print;
        next
} # The first "line" is blank when RS=<

/^[!?]/ { SPEC=1    }   # XML specification junk

# Handle open-tags
match($0, /^[^\/ \r\n\t>]+/) {
        TAG=substr(toupper($0), RSTART, RLENGTH);
        if(!SPEC)
        {
                TAGS=TAG "%" TAGS;      DEP++;
                LTAGS=TAGS
        }
}

# Handle close-tags
(!SPEC) && /^[\/]/ {
        sub(/^\//, "", $1);
        LTAGS=TAGS
        sub("^.*" toupper($1) "%", "", TAGS);
        $1="/"$1
        DEP=split(TAGS, TA, "%")-1;
        if(DEP < 0) DEP=0;
}

$ awk -f minixml.awk -e '$2 ~ /^[0-9]*[.][0-9]*$/ { sub(/[.]?0*$/ , "", $2) } 1' input.xml

<ACCRUED_INTEREST>0</ACCRUED_INTEREST>
  <BOOK_VALUE>0</BOOK_VALUE>
  <COMMIT_CURRENT>29250</COMMIT_CURRENT>
  <COMMIT_UNDISBURSED>29250</COMMIT_UNDISBURSED>
  <COST_OF_FUNDS>0.0049</COST_OF_FUNDS>
  <DATE_ORIGINATED>2013-05-15</DATE_ORIGINATED>
  <GOVT_GUARANTOR></GOVT_GUARANTOR>
  <INT_RATE>0.0518</INT_RATE>

$
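
For anyone wondering how the RS/FS settings in the script carve up the XML, here is a minimal sketch. The `show_records` helper name is made up for illustration and is not part of minixml.awk. With RS="<" every tag starts a new record, and with FS=">" the tag name lands in $1 and the text that follows it in $2:

```shell
# Hypothetical helper, for illustration only: print how awk sees each record
# when RS="<" and FS=">".
show_records() {
  awk 'BEGIN { RS = "<"; FS = ">" }
       NR > 1 {                           # record 1 is the empty text before the first "<"
           text = $2
           gsub(/[ \t\r\n]+$/, "", text)  # drop trailing whitespace, for display only
           printf "tag=[%s] text=[%s]\n", $1, text
       }'
}

printf '<INT_RATE>0.0518000000</INT_RATE>\n' | show_records
# tag=[INT_RATE] text=[0.0518000000]
# tag=[/INT_RATE] text=[]
```

The closing tag shows up as its own record with a leading slash and no text of interest, which is why the trimming rule only has to look at $2.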

Thank you. I am receiving the error message below.

awk -f minixml.awk -e '$2 ~ /^[0-9]*[.][0-9]*$/ { sub(/[.]?0*$/ , "", $2) } 1' input.xml
awk: minixml.awk:3: fatal: cannot open file `-e' for reading (No such file or directory)

I thought all awk implementations had -e. Oh well. I stripped it down to a faster program which doesn't need -e:

$ ls -lh filter.xml
-rw-r--r-- 1 user user 451M Sep  3 14:10 filter.xml

$ time awk 'NR==1 { next } NF==2 { sub(/[.]?0*$/ , "", $2) } { print RS $0 }' FS=">" RS="<" OFS=">" ORS="" filter.xml > out.xml

real    1m1.632s
user    0m59.813s
sys     0m1.787s

$ head -n 20 out.xml

<ACCRUED_INTEREST>0</ACCRUED_INTEREST>
  <BOOK_VALUE>0</BOOK_VALUE>
  <COMMIT_CURRENT>29250</COMMIT_CURRENT>
  <COMMIT_UNDISBURSED>29250</COMMIT_UNDISBURSED>
  <COST_OF_FUNDS>0.0049</COST_OF_FUNDS>
  <DATE_ORIGINATED>2013-05-15</DATE_ORIGINATED>
  <GOVT_GUARANTOR></GOVT_GUARANTOR>
  <INT_RATE>0.0518</INT_RATE>
<ACCRUED_INTEREST>0</ACCRUED_INTEREST>
  <BOOK_VALUE>0</BOOK_VALUE>
  <COMMIT_CURRENT>29250</COMMIT_CURRENT>
  <COMMIT_UNDISBURSED>29250</COMMIT_UNDISBURSED>
  <COST_OF_FUNDS>0.0049</COST_OF_FUNDS>
  <DATE_ORIGINATED>2013-05-15</DATE_ORIGINATED>
  <GOVT_GUARANTOR></GOVT_GUARANTOR>
  <INT_RATE>0.0518</INT_RATE>
<ACCRUED_INTEREST>0</ACCRUED_INTEREST>
  <BOOK_VALUE>0</BOOK_VALUE>
  <COMMIT_CURRENT>29250</COMMIT_CURRENT>
  <COMMIT_UNDISBURSED>29250</COMMIT_UNDISBURSED>

$
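
Note that in the output above <GOVT_GUARANTOR> comes out empty: the pattern `[.]?0*$` also matches a value that is just `0`, and it would likewise strip the last zero from an integer such as `370`. A possible guard, sketched here under the assumption that only values of the form digits-dot-digits should be touched, is to test $2 before substituting. The `trim_elements` wrapper name is hypothetical:

```shell
# Sketch only: same RS/FS trick as above, but $2 is changed only when it
# looks like a decimal number (digits, a dot, digits).
trim_elements() {
  awk -v FS='>' -v RS='<' -v OFS='>' -v ORS='' '
      NR == 1 { next }                       # empty text before the first "<"
      NF == 2 && $2 ~ /^[0-9]+[.][0-9]+$/ {
          sub(/0+$/, "", $2)                 # 29250.00 -> 29250.   0.0049000000 -> 0.0049
          sub(/[.]$/, "", $2)                # 29250.   -> 29250
      }
      { print RS $0 }' "$@"
}

printf '<A>0.00</A>\n<B>0</B>\n<C>370</C>\n<D>0.0049000000</D>\n' | trim_elements
# <A>0</A>
# <B>0</B>
# <C>370</C>
# <D>0.0049</D>
```

Like the one-liner above, this only looks at element text; attribute values are left alone.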

Thanks very much. It works for the XML sample I gave you, but I see an issue with the excerpt below, which shows the complete set of tags and attributes that can occur in my XML file: when a single tag has multiple attributes, your awk script does not trim their values. Could you please help me tweak it?

Please check these attributes: ACCEPTABLE_VOL_COUNT="357.000" and ACCEPTABLE_VOL_DOLLARS="71829447.08000".

Excerpt:

<Provider>
<Institution ACCEPTABLE_VOL_COUNT="357.000" ACCEPTABLE_VOL_DOLLARS="71829447.08000" ACCRUED_INTEREST_COUNT="344" ACCRUED_INTEREST_DOLLARS="299979.26" BEGINNING_FARMER_FLAG_COUNT="244" BOOK_VALUE_COUNT="370" BOOK_VALUE_DOLLARS="75554816.98" CUSTOMER_ROW_COUNT="330" DOUBTFUL_VOL_COUNT="0" DOUBTFUL_VOL_DOLLARS="0.00" EXTRACT_DATE="2014-06-30" LOAN_ROW_COUNT="389" OAEM_VOL_COUNT="3" OAEM_VOL_DOLLARS="267446.68" PAST_DUE_AMOUNT_COUNT="12" PAST_DUE_AMOUNT_DOLLARS="2625411.64" PD_RATING_COUNT="389" PD_RATING_VALUES="2508" PRINCIPAL_BALANCE_COUNT="369" PRINCIPAL_BALANCE_DOLLARS="75254837.72" SMALL_FARMER_FLAG_COUNT="286" SUBSTANDARD_VOL_COUNT="10" SUBSTANDARD_VOL_DOLLARS="3457923.22" UNINUM="xxxxxx" YOUNG_FARMER_FLAG_COUNT="31">
        <Customer CIF="xxxxx">
        <BORROWER_NAME>xxxxx</BORROWER_NAME>
        <FIPS_CODE>15003</FIPS_CODE>
        <RELATED_PARTY_LOAN_CODE>0</RELATED_PARTY_LOAN_CODE>
        <DEBT_REPAYMENT_COVERAGE_RATIO>3.0000000000</DEBT_REPAYMENT_COVERAGE_RATIO>
        <CURRENT_ASSETS>1112799.00</CURRENT_ASSETS>
        <CURRENT_LIABILITIES>121482.00</CURRENT_LIABILITIES>
        <FARM_OPS_EXP>563390.00</FARM_OPS_EXP>
        <GROSS_AG_INC>593480.00</GROSS_AG_INC>
        <INT_EXP>17590.00</INT_EXP>
        <NON_CURR_ASSET>3285500.00</NON_CURR_ASSET>
        <NON_CURR_LIABILITIES>529347.00</NON_CURR_LIABILITIES>
        <NET_AG_INC>30090.00</NET_AG_INC>
        <NET_INC>194677.00</NET_INC>
        <NET_WORTH>3747470.00</NET_WORTH>
        <NONFARM_INC>164587.00</NONFARM_INC>
        <TOTAL_ASSETS>4398299.00</TOTAL_ASSETS>
        <TOTAL_LIABILITIES>650829.00</TOTAL_LIABILITIES>
        <DEBT_SERVICE_REQUIREMENT>46617.00</DEBT_SERVICE_REQUIREMENT>
        <REPAYMENT_SOURCE>1</REPAYMENT_SOURCE>
        <COST_OF_FUNDS>0.0049000000</COST_OF_FUNDS>
        </customer>
</Institution>
</provider>

Output:

<Provider>
<Institution ACCEPTABLE_VOL_COUNT="357.000" ACCEPTABLE_VOL_DOLLARS="71829447.08000" ACCRUED_INTEREST_COUNT="344" ACCRUED_INTEREST_DOLLARS="299979.26" BEGINNING_FARMER_FLAG_COUNT="244" BOOK_VALUE_COUNT="370" BOOK_VALUE_DOLLARS="75554816.98" CUSTOMER_ROW_COUNT="330" DOUBTFUL_VOL_COUNT="0" DOUBTFUL_VOL_DOLLARS="0.00" EXTRACT_DATE="2014-06-30" LOAN_ROW_COUNT="389" OAEM_VOL_COUNT="3" OAEM_VOL_DOLLARS="267446.68" PAST_DUE_AMOUNT_COUNT="12" PAST_DUE_AMOUNT_DOLLARS="2625411.64" PD_RATING_COUNT="389" PD_RATING_VALUES="2508" PRINCIPAL_BALANCE_COUNT="369" PRINCIPAL_BALANCE_DOLLARS="75254837.72" SMALL_FARMER_FLAG_COUNT="286" SUBSTANDARD_VOL_COUNT="10" SUBSTANDARD_VOL_DOLLARS="3457923.22" UNINUM="xxxxxx" YOUNG_FARMER_FLAG_COUNT="31">
        <Customer CIF="xxxxx">
        <BORROWER_NAME>xxxxx</BORROWER_NAME>
        <FIPS_CODE>15003</FIPS_CODE>
        <RELATED_PARTY_LOAN_CODE></RELATED_PARTY_LOAN_CODE>
        <DEBT_REPAYMENT_COVERAGE_RATIO>3</DEBT_REPAYMENT_COVERAGE_RATIO>
        <CURRENT_ASSETS>1112799</CURRENT_ASSETS>
        <CURRENT_LIABILITIES>121482</CURRENT_LIABILITIES>
        <FARM_OPS_EXP>563390</FARM_OPS_EXP>
        <GROSS_AG_INC>593480</GROSS_AG_INC>
        <INT_EXP>17590</INT_EXP>
        <NON_CURR_ASSET>3285500</NON_CURR_ASSET>
        <NON_CURR_LIABILITIES>529347</NON_CURR_LIABILITIES>
        <NET_AG_INC>30090</NET_AG_INC>
        <NET_INC>194677</NET_INC>
        <NET_WORTH>3747470</NET_WORTH>
        <NONFARM_INC>164587</NONFARM_INC>
        <TOTAL_ASSETS>4398299</TOTAL_ASSETS>
        <TOTAL_LIABILITIES>650829</TOTAL_LIABILITIES>
        <DEBT_SERVICE_REQUIREMENT>46617</DEBT_SERVICE_REQUIREMENT>
        <REPAYMENT_SOURCE>1</REPAYMENT_SOURCE>
        <COST_OF_FUNDS>0.0049</COST_OF_FUNDS>
        </customer>
</Institution>
</provider>

You didn't ask for that, so I didn't think to do so.

This requires a more complicated expression which won't work in awk. Perl can do it though, and turns out to be slightly faster (note that the test file below is twice the size of the earlier one):

$ ls -lh filter.xml
-rw-r--r-- 1 user user 905M Sep  5 13:53 filter.xml

$ time perl -074 -p -e "s/[.]?0*([<'\"])/\\1/g;" filter.xml > output.xml


real    1m57.446s
user    1m52.417s
sys     0m3.447s

$
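
One caveat with the expression above: the dot in `[.]?0*` is optional, so it also strips zeros that end an integer (`370"` becomes `37"`) and empties a value that is just `0`. For readers who would rather stay in awk, one possible route is a match() loop instead of a single substitution. The sketch below is not the method used above; it assumes the file has ordinary line breaks (it reads line by line), and the `trim_decimals` and `trim` names are made up for illustration:

```shell
# Sketch only: trim trailing zeros from every digits-dot-digits run on a line,
# in element text and attribute values alike.
trim_decimals() {
  awk '
    function trim(s,    out, num) {
        out = ""
        while (match(s, /[0-9]+[.][0-9]+/)) {
            num = substr(s, RSTART, RLENGTH)
            sub(/0+$/, "", num)              # 71829447.08000 -> 71829447.08
            sub(/[.]$/, "", num)             # 357. -> 357
            out = out substr(s, 1, RSTART - 1) num
            s = substr(s, RSTART + RLENGTH)  # continue after this number
        }
        return out s
    }
    { print trim($0) }' "$@"
}

printf '%s\n' '<I A="357.000" B="71829447.08000" C="370" D="0.00" E="2014-06-30">' \
              '<X>0</X>' | trim_decimals
# <I A="357" B="71829447.08" C="370" D="0" E="2014-06-30">
# <X>0</X>
```

It treats any digits-dot-digits run as a number, so dotted dates or version strings in the data would need an extra guard.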

Another awk for the original question:

awk '$2+0==$2 {$2+=0} {$0=RS $0}NR>1' RS=\< ORS= FS=\> OFS=\> file
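
One thing to watch with this approach: `$2+=0` turns the field into a number, and awk converts it back to text using CONVFMT, which defaults to `%.6g`. A value with a fractional part and more than six significant digits would therefore be rounded. Below is a sketch of the effect and of one possible workaround, raising CONVFMT. The `numeric_trim` wrapper and the `<V>` tag are made up for the demonstration:

```shell
# Sketch only: the same program as above, with CONVFMT passed in as $1
# and the input read from stdin.
numeric_trim() {
  awk -v CONVFMT="$1" -v RS='<' -v ORS='' -v FS='>' -v OFS='>' \
      '$2+0==$2 {$2+=0} {$0=RS $0} NR>1'
}

printf '<V>12345.67800</V>\n' | numeric_trim '%.6g'    # the default CONVFMT
# <V>12345.7</V>
printf '<V>12345.67800</V>\n' | numeric_trim '%.15g'
# <V>12345.678</V>
```

`%.15g` is used here because a double holds roughly 15 significant decimal digits; values longer than that would still not survive the round trip, so a purely textual substitution may be the safer choice for such data.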

Not sure if I understood the requirement in its entirety, but give this a try:

sed -r 's/(\.[0-9]*[1-9])0*([^0-9])/\1\2/g;s/\.0*([^0-9])/\1/g' file
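
A caveat, in case the data also contains free text: because `0*` can match zero zeros, the second expression `\.0*([^0-9])` also deletes a period that is followed directly by a non-digit, for example the full stop in `Inc.</NAME>`. A hedged variant that requires at least one zero (`0+`) is sketched below. `trim_zeros` and the sample tags are illustrative names only, and `-E` is used as the more widely supported spelling of `-r`:

```shell
# Sketch only: like the sed command above, but a dot is removed only when it is
# followed by at least one zero.
trim_zeros() {
  sed -E 's/(\.[0-9]*[1-9])0+([^0-9])/\1\2/g; s/\.0+([^0-9])/\1/g' "$@"
}

printf '%s\n' '<I A="357.000" B="71829447.08000" C="370" D="0.00">' \
              '<NAME>Smith Inc.</NAME>' | trim_zeros
# <I A="357" B="71829447.08" C="370" D="0">
# <NAME>Smith Inc.</NAME>
```

As with the original command, a number is only trimmed when a non-digit character follows it on the same line, which is always the case for values inside tags or quoted attributes.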

Thanks again. Your code worked, but I still see a problem with one of the tags (RELATED_PARTY_LOAN_CODE below): your code removes the zero when it is the only value in a tag.

<Provider>
<Institution ACCEPTABLE_VOL_COUNT="357" ACCEPTABLE_VOL_DOLLARS="71829447.08" ACCRUED_INTEREST_COUNT="344" ACCRUED_INTEREST_DOLLARS="299979.26" BEGINNING_FARMER_FLAG_COUNT="244" BOOK_VALUE_COUNT="37" BOOK_VALUE_DOLLARS="75554816.98" CUSTOMER_ROW_COUNT="33" DOUBTFUL_VOL_COUNT="" DOUBTFUL_VOL_DOLLARS="0" EXTRACT_DATE="2014-06-3" LOAN_ROW_COUNT="389" OAEM_VOL_COUNT="3" OAEM_VOL_DOLLARS="267446.68" PAST_DUE_AMOUNT_COUNT="12" PAST_DUE_AMOUNT_DOLLARS="2625411.64" PD_RATING_COUNT="389" PD_RATING_VALUES="2508" PRINCIPAL_BALANCE_COUNT="369" PRINCIPAL_BALANCE_DOLLARS="75254837.72" SMALL_FARMER_FLAG_COUNT="286" SUBSTANDARD_VOL_COUNT="1" SUBSTANDARD_VOL_DOLLARS="3457923.22" UNINUM="xxxxxx" YOUNG_FARMER_FLAG_COUNT="31">
        <Customer CIF="xxxxx">
        <BORROWER_NAME>xxxxx</BORROWER_NAME>
        <FIPS_CODE>15003</FIPS_CODE>
        <RELATED_PARTY_LOAN_CODE></RELATED_PARTY_LOAN_CODE>
        <DEBT_REPAYMENT_COVERAGE_RATIO>3</DEBT_REPAYMENT_COVERAGE_RATIO>
        <CURRENT_ASSETS>1112799</CURRENT_ASSETS>
        <CURRENT_LIABILITIES>121482</CURRENT_LIABILITIES>
        <FARM_OPS_EXP>563390</FARM_OPS_EXP>
        <GROSS_AG_INC>593480</GROSS_AG_INC>
        <INT_EXP>17590</INT_EXP>
        <NON_CURR_ASSET>3285500</NON_CURR_ASSET>
        <NON_CURR_LIABILITIES>529347</NON_CURR_LIABILITIES>
        <NET_AG_INC>30090</NET_AG_INC>
        <NET_INC>194677</NET_INC>
        <NET_WORTH>3747470</NET_WORTH>
        <NONFARM_INC>164587</NONFARM_INC>
        <TOTAL_ASSETS>4398299</TOTAL_ASSETS>
        <TOTAL_LIABILITIES>650829</TOTAL_LIABILITIES>
        <DEBT_SERVICE_REQUIREMENT>46617</DEBT_SERVICE_REQUIREMENT>
        <REPAYMENT_SOURCE>1</REPAYMENT_SOURCE>
        <COST_OF_FUNDS>0.0049</COST_OF_FUNDS>
        </customer>
</Institution>
</provider>

Does it? Not for me:

. . .
<RELATED_PARTY_LOAN_CODE>0</RELATED_PARTY_LOAN_CODE>
. . .

Sorry, I was referring to Corona688's message; I have edited my post and quoted that message to avoid confusion. Yes, your code is working fine and gives the expected output. I will let you know if I find anything else. Thank you.