Remove space from numeric value

Hello,

I need help. I have an XML file in which the number in <EpiReference>1 42345</EpiReference> contains one extra space. And of course, the value changes with every new file. I need to remove the space from the value between <EpiReference> and </EpiReference>. How can I do that?

This is an example file:

<EpiDetails><EpiIdentificationDetails><EpiDate Format="CCYYMMDD">20160331</EpiDate><EpiReference>1 59760</EpiReference></EpiIdentificationDetails><EpiPartyDetails><EpiBfiPartyDetails><EpiBfiIdentifier IdentificationSchemeName="BIC">NDEAHH</EpiBfiIdentifier></EpiBfiPartyDetails><EpiBeneficiaryPartyDetails><EpiNameAddressDetails>Sakun Koivo, Oati Ay</EpiNameAddressDetails><EpiBei>212367-2</EpiBei><EpiAccountID IdentificationSchemeName="IBAN">FI741023144210348960</EpiAccountID>

It needs to be:

<EpiDetails><EpiIdentificationDetails><EpiDate Format="CCYYMMDD">20160331</EpiDate><EpiReference>159760</EpiReference></EpiIdentificationDetails><EpiPartyDetails><EpiBfiPartyDetails><EpiBfiIdentifier IdentificationSchemeName="BIC">NDEAHH</EpiBfiIdentifier></EpiBfiPartyDetails><EpiBeneficiaryPartyDetails><EpiNameAddressDetails>Sakun Koivo, Oati Ay</EpiNameAddressDetails><EpiBei>212367-2</EpiBei><EpiAccountID IdentificationSchemeName="IBAN">FI741023144210348960</EpiAccountID>

Thanks!

Any attempts/ideas/thoughts from your side?

Not really. I know how to find the value and how to change it, but my
code outputs only this value. Everything else in the file disappears.

cat $1 | while read EPI
do
echo $EPI | sed -n 's:.*<EpiReference>\(.*\)</EpiReference>.*:\1:p' |awk '{ gsub (" ", "", $0); print}'
done

Sorry, I'm stupid and a newbie at Unix coding :(
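
For anyone following along, here is a small sketch of why the rest of the file disappears with the loop above. The sample lines are shortened from the example in post #1, and the variable names are only for illustration. With -n and the p flag, sed prints nothing except lines where the substitution succeeded, and because the replacement is just \1, all that is left of such a line is the captured value.

```shell
# Sample lines shortened from the example in post #1 (illustration only).
line='<EpiDate Format="CCYYMMDD">20160331</EpiDate><EpiReference>1 59760</EpiReference><EpiBei>212367-2</EpiBei>'
other='<EpiBei>212367-2</EpiBei>'

# The whole line is matched and replaced by the captured group \1,
# so only the value between the tags survives.
only_value=$(printf '%s\n' "$line" | sed -n 's:.*<EpiReference>\(.*\)</EpiReference>.*:\1:p')
printf 'matching line gives: [%s]\n' "$only_value"

# A line without the element does not match; with -n nothing is printed.
nothing=$(printf '%s\n' "$other" | sed -n 's:.*<EpiReference>\(.*\)</EpiReference>.*:\1:p')
printf 'other line gives:    [%s]\n' "$nothing"
```

So the substitution has to put the surrounding text back, and non-matching lines have to be printed too, which is what the replies below do.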

Try this (quick and dirty):

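awk '
# Only lines containing an <EpiReference> element are modified.
match ($0, /<EpiReference>[^<]*<\/EpiReference>/) {
        T1 = T2 = substr ($0, RSTART+14, RLENGTH-29)    # text between the tags
        gsub (" ", "", T1)                              # T1: value without spaces
        sub (T2, T1)                                    # swap the old value for the new one
}
1       # print every line, changed or not
' file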

Thanks, but unfortunately this did not work. The output is the same as the input. Punctuation is still there :(

Not for me. Before

<EpiReference>1 59760</EpiReference>

After

<EpiReference>159760</EpiReference>

And, punctuation was not mentioned in post#1 when describing the problem.

Or how about the other way around: give us an example of your expected input and the result. Rudi's answer was based on your code, not on what you want your code to do.

sed can do it all without the overhead of catenating the input file and looping through it a line at a time...

sed '1,$ s;\(<EpiReference>[0-9][0-9]*\)  *\([0-9][0-9]*</EpiReference>\);\1\2;g' file
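
Note that the command above handles one run of spaces between two groups of digits. If a value could contain several separate spaces (say 1 59 760, which is an assumption, since the thread only shows one), a loop inside sed is one possible approach. This is a rough sketch, not a tested production script; strip_epi_spaces is a made-up helper name and the sample input is shortened from post #1.

```shell
# strip_epi_spaces is a hypothetical helper name, not from the thread.
# It reads XML on standard input and removes every space inside
# <EpiReference>...</EpiReference> values, one space per loop pass.
strip_epi_spaces() {
    sed -e ':again' \
        -e 's;\(<EpiReference>[^<]*\) \([^<]*</EpiReference>\);\1\2;' \
        -e 't again'
}

# Spaces outside the element (here in the name) are left alone.
printf '%s\n' '<EpiReference>1 59 760</EpiReference><EpiNameAddressDetails>Sakun Koivo, Oati Ay</EpiNameAddressDetails>' | strip_epi_spaces
```

The t command jumps back to the label for as long as the previous substitution changed something, so the loop ends once no space is left between the tags. The [^<]* parts keep the match inside a single element.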

try also:

awk '
{
   l=$0;
   sub(".*<EpiReference>", "", l);
   sub("</EpiReference>.*", "", l);
   n=l;
   gsub(" ", "", n);
   sub("<EpiReference>" l "</EpiReference>", "<EpiReference>" n "</EpiReference>");
   print
}' infile

Thank you very much. This works great. I got some good tips from everyone, thank you!