Hi All,
I have an XML file in the format below.
<a>111</a><b>222</b><c>333<c><d><e>123</e><f>234</f><d><e>456</e><f>789</f>
The output needed is
111,222,333,123,234
111,222,333,456,789
nawk 'BEGIN{FS="<|>"}
{print a,b,c,e,f
a=""
b=""
e=""
f=""
}
{for(i=1;i<=NF;i++) {if($i=="a"){a=$(i+1);continue}}}
{for(i=1;i<=NF;i++) {if($i=="b"){b=$(i+1); continue}}}
{for(i=1;i<=NF;i++) {if($i=="c"){d=$(i+1); continue}}}
{for(i=1;i<=NF;i++) {if($i=="e"){d=$(i+1); continue}}}
{for(i=1;i<=NF;i++) {if($i=="f"){d=$(i+1); continue}}}
END {print a,b,c,e,f}' file
However, the output that I get is
111,222,333,456,789
Does anyone have any idea?
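For what it's worth, a likely reason only the last line shows up: the whole sample is a single input record, so each later e/f value overwrites the earlier one, and the only print that sees any data is the one in END. The posted loops also store the c, e and f values in d. Below is a minimal sketch of one way around that, assuming every <f> value closes a row. The function name parse_records and the file name sample.xml are made up for illustration.

```shell
# Sample record from the question, written to a hypothetical file name.
printf '%s\n' '<a>111</a><b>222</b><c>333<c><d><e>123</e><f>234</f><d><e>456</e><f>789</f>' > sample.xml

# parse_records is an illustrative name, not something from the thread.
parse_records() {
    awk 'BEGIN { FS = "[<>]"; OFS = "," }
    {
        # Splitting on < and > leaves tag names and values as separate fields.
        for (i = 1; i < NF; i++) {
            if      ($i == "a") a = $(i + 1)
            else if ($i == "b") b = $(i + 1)
            else if ($i == "c" && $(i + 1) != "") c = $(i + 1)  # skip the stray <c>
            else if ($i == "e") e = $(i + 1)
            else if ($i == "f") {
                f = $(i + 1)
                print a, b, c, e, f   # emit one row per <f>, not one per record
            }
        }
    }' "$@"
}

parse_records sample.xml
```

On the sample this should print the two wanted rows. It compares field text against tag names, so a value that happens to equal a tag name would confuse it.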
clx
Lots of threads are available on this topic.
Please use the search.
Trick is in using the right tool for the right job.
There are modules already available on CPAN for parsing and generating XML. Try them instead!
Unfortunately, if you look closely at the string, it is not valid XML as it is not well-formed. No XML parser is going to handle this string.
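To make the point concrete, here is a rough sketch that counts opening and closing tags per name. It is only a tally, not a real well-formedness check (it ignores nesting order and attributes, and only reports names that were opened at least once). The function name tag_balance is made up for illustration.

```shell
# tag_balance is an illustrative name; it reads files or standard input.
tag_balance() {
    awk '{
        line = $0
        # Walk through every <name> or </name> token on the line.
        while (match(line, /<\/?[A-Za-z]+>/)) {
            tag = substr(line, RSTART + 1, RLENGTH - 2)
            if (substr(tag, 1, 1) == "/") closed[substr(tag, 2)]++
            else                          opened[tag]++
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END {
        # Report only the names whose counts differ.
        for (t in opened)
            if (opened[t] != closed[t])
                printf "%s: %d opened, %d closed\n", t, opened[t], closed[t]
    }' "$@" | sort
}

printf '%s\n' '<a>111</a><b>222</b><c>333<c><d><e>123</e><f>234</f><d><e>456</e><f>789</f>' | tag_balance
```

For the posted string this reports c and d, each opened twice and never closed.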
---------- Post updated at 10:21 AM ---------- Previous update was at 09:06 AM ----------
One way of doing it would be to use a mix of sed and awk to parse and process the line:
sed 's/<d>/|/g' file | sed 's/<\/.>/ /g' | sed 's/<.>//g' | sed 's/ \(.\)/,\1/g' | \
sed 's/,|/|/g; s/ $//' | awk -F'|' '{ printf "%s,%s\n", $1, $2; printf "%s,%s\n", $1, $3 }'
This outputs
111,222,333,123,234
111,222,333,456,789
Not elegant but it works!
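# Sample record from the question (not well-formed XML).
$_='<a>111</a><b>222</b><c>333<c><d><e>123</e><f>234</f><d><e>456</e><f>789</f>';
# Collect every run of digits, in document order.
my @tmp=$_=~/[0-9]+/g;
# First row: the shared a,b,c values plus the first e,f pair.
my @a1=@tmp[0..4];
# Second row: the shared a,b,c values plus the second e,f pair.
my @a2=@tmp[0..2,5,6];
print join ",", @a1;
print "\n";
print join ",",@a2;
print "\n";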
Hi anchal_khare, matrixmadhan, fpmurphy, summer_cherry,
Thank you very much for your help!!
---------- Post updated at 05:01 AM ---------- Previous update was at 05:00 AM ----------
Hi summer_cherry,
Is this a Perl script?
Thanks.
Another way with awk...
awk -F"<d>" '{print $1","$2"\n"$1","$3}' f1 | tr -d '<>a-z' | tr '/' ',' | sed 's/,$//'
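The same split-on-<d> idea can be stretched to any number of <d> groups rather than exactly two. This is a sketch only, assuming single-letter lowercase tag names as in the sample. The function name split_on_d is made up for illustration.

```shell
# split_on_d is an illustrative name; it reads files or standard input.
split_on_d() {
    awk -F'<d>' '{
        # $1 holds the shared a,b,c part; each later field is one <d> group.
        for (i = 2; i <= NF; i++) {
            row = $1 "," $i
            gsub(/<\/[a-z]>/, ",", row)   # closing tag becomes a comma
            gsub(/<[a-z]>/, "", row)      # opening tag is dropped
            gsub(/,+/, ",", row)          # collapse doubled commas
            sub(/,$/, "", row)            # drop the trailing comma
            print row
        }
    }' "$@"
}

printf '%s\n' '<a>111</a><b>222</b><c>333<c><d><e>123</e><f>234</f><d><e>456</e><f>789</f>' | split_on_d
```

Each <d> group gets its own output row, prefixed with the shared values from before the first <d>.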
ripat
To the OP: are you sure about the XML data? If you look carefully, some closing tags are missing.
<a>111</a><b>222</b><c>333<c><d><e>123</e><f>234</f><d><e>456</e><f>789</f>
<a>111</a><b>222</b><c>333</c><d><e>123</e><f>234</f></d><e>456</e><f>789</f>
If I recall correctly, some clever awk fans have developed XML parser modules. I will have a look and post the link if I find it again.
---------- Post updated at 06:00 PM ---------- Previous update was at 05:53 PM ----------
Here you go:
awk.info