Remove space from numeric value

Jopsulainen · April 7, 2016, 6:30am

Hello,

I need help. I have an XML file with an extra space inside a number: <EpiReference>1 42345</EpiReference>. Of course, the value changes in every new file. I need to remove the space from the value between <EpiReference> and </EpiReference>. How can I do that?

Here is an example file:

<EpiDetails><EpiIdentificationDetails><EpiDate Format="CCYYMMDD">20160331</EpiDate><EpiReference>1 59760</EpiReference></EpiIdentificationDetails><EpiPartyDetails><EpiBfiPartyDetails><EpiBfiIdentifier IdentificationSchemeName="BIC">NDEAHH</EpiBfiIdentifier></EpiBfiPartyDetails><EpiBeneficiaryPartyDetails><EpiNameAddressDetails>Sakun Koivo, Oati Ay</EpiNameAddressDetails><EpiBei>212367-2</EpiBei><EpiAccountID IdentificationSchemeName="IBAN">FI741023144210348960</EpiAccountID>

It needs to become:

<EpiDetails><EpiIdentificationDetails><EpiDate Format="CCYYMMDD">20160331</EpiDate><EpiReference>159760</EpiReference></EpiIdentificationDetails><EpiPartyDetails><EpiBfiPartyDetails><EpiBfiIdentifier IdentificationSchemeName="BIC">NDEAHH</EpiBfiIdentifier></EpiBfiPartyDetails><EpiBeneficiaryPartyDetails><EpiNameAddressDetails>Sakun Koivo, Oati Ay</EpiNameAddressDetails><EpiBei>212367-2</EpiBei><EpiAccountID IdentificationSchemeName="IBAN">FI741023144210348960</EpiAccountID>

Thanks!

RudiC · April 7, 2016, 7:18am

Any attempts/ideas/thoughts from your side?

Jopsulainen · April 7, 2016, 7:40am

No, I don't. I know how to find the value and how to change it, but my code
outputs only this value. Everything else from the file disappears.

cat $1 | while read EPI
do
echo $EPI | sed -n 's:.*<EpiReference>\(.*\)</EpiReference>.*:\1:p' |awk '{ gsub (" ", "", $0); print}'
done

Sorry, I'm stupid and a newbie at Unix coding.

RudiC · April 7, 2016, 8:01am

Try this (quick and dirty):

awk '
match ($0, /<EpiReference>[^<]*<\/EpiReference>/) {
        T1 = T2 = substr ($0, RSTART+14, RLENGTH-29)
        gsub (" ", "", T1)
        sub (T2, T1)
}
1
' file
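
A variant of the same idea, as a rough sketch: instead of hard-coding the offsets 14 and 29 and passing the old value to sub() as a regular expression, it rebuilds the line from the pieces around the match. The function name fix_epi_reference and the variable names are made up for illustration, and like the command above it only touches the first <EpiReference> element on each line.

```shell
# Hypothetical helper: strip spaces from the first <EpiReference> value per line.
fix_epi_reference() {
    awk '
    {
        open_tag  = "<EpiReference>"
        close_tag = "</EpiReference>"
        if (match($0, /<EpiReference>[^<]*<\/EpiReference>/)) {
            # Split the line into: text before, the value itself, text after.
            head  = substr($0, 1, RSTART - 1)
            value = substr($0, RSTART + length(open_tag),
                           RLENGTH - length(open_tag) - length(close_tag))
            tail  = substr($0, RSTART + RLENGTH)
            gsub(/ /, "", value)          # remove every plain space in the value
            $0 = head open_tag value close_tag tail
        }
        print                             # print every line, changed or not
    }' "$@"
}

# Demo on a one-line sample; with a real file, call: fix_epi_reference file
printf '%s\n' '<x><EpiReference>1 59760</EpiReference><y>a b</y>' | fix_epi_reference
```

Because every line is printed, changed or not, the rest of the file is kept as it was.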

Jopsulainen · April 7, 2016, 8:17am

rudic:

Try this (quick and dirty):

awk '
match ($0, /<EpiReference>[^<]*<\/EpiReference>/) {
        T1 = T2 = substr ($0, RSTART+14, RLENGTH-29)
        gsub (" ", "", T1)
        sub (T2, T1)
}
1
' file

Thanks, but unfortunately this did not work. The output is the same as the input. The punctuation is still there.

RudiC · April 7, 2016, 8:29am

Not for me. Before

<EpiReference>1 59760</EpiReference>

After

<EpiReference>159760</EpiReference>

And, punctuation was not mentioned in post#1 when describing the problem.
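
If the command works on a pasted sample but not on the real file, one possible explanation is that the character inside the value is not a plain space (a tab or a non-breaking space, for example). That is only a guess, but it can be checked by dumping the value with sed's l command, which prints non-printing characters as escapes and marks the end of each line with $. The function name show_epi_reference is made up for illustration, and it shows only the last <EpiReference> element on each line.

```shell
# Hypothetical helper: print each EpiReference element in an unambiguous form.
show_epi_reference() {
    # First sed keeps only the element; second sed lists it with escapes visible.
    sed -n 's:.*\(<EpiReference>[^<]*</EpiReference>\).*:\1:p' "$@" | sed -n 'l'
}

# Demo with a tab instead of a space inside the value.
printf '<EpiReference>1\t59760</EpiReference>\n' | show_epi_reference
```

A plain space is printed as a space; anything else shows up as an escape such as \t or an octal code.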

jim_mcnamara · April 7, 2016, 11:35am

Or how about the other way around: give us an example of your expected input and the result. Rudi's answer was based on your code, not on what you want your code to do.

shamrock · April 7, 2016, 2:07pm

jopsulainen:

No, I don't. I know how to find the value and how to change it, but my code
outputs only this value. Everything else from the file disappears.
cat $1 | while read EPI
do
echo $EPI | sed -n 's:.*<EpiReference>\(.*\)</EpiReference>.*:\1:p' |awk '{ gsub (" ", "", $0); print}'
done

sed can do it all without the overhead of catenating the input file and looping through it a line at a time...

sed '1,$ s;\(<EpiReference>[0-9][0-9]*\)  *\([0-9][0-9]*</EpiReference>\);\1\2;g' file
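
The command above removes one run of spaces between two groups of digits. If a value could contain spaces in more than one place, a loop in sed is one way to handle it. This is a sketch that has not been run against the real files, and the function name strip_epi_spaces is made up for illustration.

```shell
# Hypothetical helper: remove every plain space inside <EpiReference>...</EpiReference>.
strip_epi_spaces() {
    # :a  defines a label; the s command deletes one space inside the element;
    # ta  jumps back to the label for as long as a substitution was made.
    sed -e ':a' \
        -e 's;\(<EpiReference>[^<]*\) \([^<]*</EpiReference>\);\1\2;' \
        -e 'ta' "$@"
}

# Demo on a one-line sample; with a real file, call: strip_epi_spaces file
printf '%s\n' '<a><EpiReference>1 59 760</EpiReference><b>x y</b>' | strip_epi_spaces
```

Spaces outside the element, such as the one in a name or address field, are left alone because [^<]* cannot cross a tag boundary.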

rdrtx1 · April 7, 2016, 2:38pm

try also:

awk '
{
   l=$0;
   sub(".*<EpiReference>", "", l);
   sub("</EpiReference>.*", "", l);
   n=l;
   gsub(" ", "", n);
   sub("<EpiReference>" l "</EpiReference>", "<EpiReference>" n "</EpiReference>");
   print
}' infile

Jopsulainen · April 8, 2016, 5:07am

rdrtx1:

try also:

awk '
{
   l=$0;
   sub(".*<EpiReference>", "", l);
   sub("</EpiReference>.*", "", l);
   n=l;
   gsub(" ", "", n);
   sub("<EpiReference>" l "</EpiReference>", "<EpiReference>" n "</EpiReference>");
   print
}' infile

Thank you very much. This works great. I got some good tips from everyone, thank you!