Hello,
I need help. I have an XML file in which a number contains one extra space: <EpiReference>1 42345</EpiReference>. Of course, the value changes with every new file. I need to remove the space from whatever value is between <EpiReference> and </EpiReference>. How can I do that?
Here is an example file:
<EpiDetails><EpiIdentificationDetails><EpiDate Format="CCYYMMDD">20160331</EpiDate><EpiReference>1 59760</EpiReference></EpiIdentificationDetails><EpiPartyDetails><EpiBfiPartyDetails><EpiBfiIdentifier IdentificationSchemeName="BIC">NDEAHH</EpiBfiIdentifier></EpiBfiPartyDetails><EpiBeneficiaryPartyDetails><EpiNameAddressDetails>Sakun Koivo, Oati Ay</EpiNameAddressDetails><EpiBei>212367-2</EpiBei><EpiAccountID IdentificationSchemeName="IBAN">FI741023144210348960</EpiAccountID>
It needs to become:
<EpiDetails><EpiIdentificationDetails><EpiDate Format="CCYYMMDD">20160331</EpiDate><EpiReference>159760</EpiReference></EpiIdentificationDetails><EpiPartyDetails><EpiBfiPartyDetails><EpiBfiIdentifier IdentificationSchemeName="BIC">NDEAHH</EpiBfiIdentifier></EpiBfiPartyDetails><EpiBeneficiaryPartyDetails><EpiNameAddressDetails>Sakun Koivo, Oati Ay</EpiNameAddressDetails><EpiBei>212367-2</EpiBei><EpiAccountID IdentificationSchemeName="IBAN">FI741023144210348960</EpiAccountID>
Thanks!
RudiC
April 7, 2016, 7:18am
2
Any attempts/ideas/thoughts from your side?
Not really. I know how to find the value and how to change it, but my attempt prints only that value. Everything else in the file disappears.
cat $1 | while read EPI
do
echo $EPI | sed -n 's:.*<EpiReference>\(.*\)</EpiReference>.*:\1:p' |awk '{ gsub (" ", "", $0); print}'
done
Sorry, I'm a newbie at Unix scripting.
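A note on why the rest of the file vanishes: sed -n suppresses the normal output, and the s command replaces the whole line (both .* parts included) with just the captured value before p prints it. One possible fix, as a minimal sketch that assumes each <EpiReference> element sits on a single line, is to loop inside sed and delete one space per pass while leaving everything outside the element alone. The function name strip_ref_spaces is made up for this illustration.

```shell
# Hypothetical helper: reads XML on stdin and writes it to stdout with all
# spaces removed from the text inside <EpiReference>...</EpiReference>.
# Text outside that element is passed through unchanged.
strip_ref_spaces() {
    sed -e ':again' \
        -e 's;\(<EpiReference>[^<]*\) \([^<]*</EpiReference>\);\1\2;' \
        -e 't again'   # repeat while a space was still removed
}

# Usage:
printf '%s\n' '<EpiReference>1 59760</EpiReference>' | strip_ref_spaces
# prints <EpiReference>159760</EpiReference>
```

Because the pattern is anchored to the opening and closing tags, spaces elsewhere on the line (for example in names or attributes) are not touched.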
RudiC
April 7, 2016, 8:01am
4
Try this (quick and dirty):
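awk '
# 14 = length("<EpiReference>"), 29 = 14 + length("</EpiReference>")
match ($0, /<EpiReference>[^<]*<\/EpiReference>/) {T1 = T2 = substr ($0, RSTART+14, RLENGTH-29)
gsub (" ", "", T1)    # strip the spaces from one copy of the value
sub (T2, T1)          # swap the original value in $0 for the cleaned one
}
1                     # print every line, changed or not
' file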
Thanks, but unfortunately this did not work. The output is the same as the input. The punctuation is still there.
RudiC
April 7, 2016, 8:29am
6
Not for me. Before:
<EpiReference>1 59760</EpiReference>
After:
<EpiReference>159760</EpiReference>
Also, punctuation was not mentioned in post #1 when you described the problem.
Or how about the other way around: give us an example of your expected input and the result. Rudi's answer was based on your code, not on what you want your code to do.
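For readers puzzling over the numbers in the awk suggestion earlier in the thread: 14 is the length of <EpiReference>, and 29 is that plus the 15 characters of </EpiReference>. Below is a sketch of the same idea that computes the offsets with length() and splices the line back together directly, so the value is never interpreted as a regular expression. The function and variable names are illustrative only, and it assumes at most one such element per line.

```shell
# Hypothetical helper: stdin -> stdout, removes spaces inside the first
# <EpiReference>...</EpiReference> element found on each line.
strip_ref_spaces_awk() {
    awk '
    BEGIN { otag = "<EpiReference>"; ctag = "</EpiReference>" }
    {
        start = index($0, otag)                  # position of the opening tag, 0 if absent
        if (start) {
            rest = substr($0, start + length(otag))   # everything after the opening tag
            stop = index(rest, ctag)             # position of the closing tag within rest
            if (stop) {
                value = substr(rest, 1, stop - 1)
                gsub(/ /, "", value)             # drop the spaces from the value only
                $0 = substr($0, 1, start - 1 + length(otag)) value substr(rest, stop)
            }
        }
        print                                    # print every line, changed or not
    }'
}

# Usage:
printf '%s\n' '<EpiReference>1 59760</EpiReference>' | strip_ref_spaces_awk
# prints <EpiReference>159760</EpiReference>
```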
jopsulainen:
Not really. I know how to find the value and how to change it, but my attempt prints only that value. Everything else in the file disappears.
cat $1 | while read EPI
do
echo $EPI | sed -n 's:.*<EpiReference>\(.*\)</EpiReference>.*:\1:p' |awk '{ gsub (" ", "", $0); print}'
done
sed can do it all without the overhead of catenating the input file and looping through it a line at a time...
sed '1,$ s;\(<EpiReference>[0-9][0-9]*\) *\([0-9][0-9]*</EpiReference>\);\1\2;g' file
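Since the original attempt takes the file name as $1, the sed command above could be applied to a file in place roughly as follows. This is only a sketch: fix_epi_file is a made-up name, mktemp is widely available but not part of POSIX, and the original file is overwritten only if sed succeeds.

```shell
# Hypothetical wrapper: rewrites the file named in $1, removing the space
# between two digit groups inside <EpiReference>...</EpiReference>.
fix_epi_file() {
    tmp=$(mktemp) || return 1
    status=1
    # Write the corrected text to a temporary file first, then copy it back
    # with cat so the original file keeps its permissions.
    if sed 's;\(<EpiReference>[0-9][0-9]*\) *\([0-9][0-9]*</EpiReference>\);\1\2;g' "$1" > "$tmp" &&
       cat "$tmp" > "$1"
    then
        status=0
    fi
    rm -f "$tmp"
    return $status
}

# Usage (file name is an example):
# fix_epi_file payments.xml
```

Note that this sed expression removes a single run of spaces between two digit groups, which matches the examples posted in this thread; values with several separate spaces would need a looping variant.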
rdrtx1
April 7, 2016, 2:38pm
9
try also:
awk '
{
l=$0;
sub(".*<EpiReference>", "", l);
sub("</EpiReference>.*", "", l);
n=l;
gsub(" ", "", n);
sub("<EpiReference>" l "</EpiReference>", "<EpiReference>" n "</EpiReference>");
print
}' infile
Thank you very much. This works great. I got some good tips from everyone, thank you!