viki
April 12, 2007, 10:12am
1
Hi,
I need to parse the following XML data, enclosed in the <a> </a> tags, using a shell script.
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>
<XX>
...
</XX>
I also need to display the data in the following format:
b
data1
data2
-----
d
data3
Could anybody suggest a way to extract the data inside the <a> </a> XML tags?
TIA,
Viki
Viki,
See if this would solve your problem:
sed -e 's!</.>!!' -e 's!<.>!!' xml_file
anbu23
April 13, 2007, 12:46am
3
sed -n "/<a>/,/<\/a>/{/<\/*a>/d;s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/;s/^<\/.*$/--------------/;s/<\(.*\)>/\1/;p;}" file
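The one-liner above may be easier to follow with one sed command per line. This is only a sketch of the same commands, assuming a sed that accepts comments inside a script and input where every tag starts in column one. The file name sample.xml and the function name extract_a are made up for illustration.

```shell
#!/bin/sh
# Sample input copied from the first post; the file name is hypothetical.
cat > sample.xml <<'EOF'
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>
<XX>
...
</XX>
EOF

# extract_a FILE: same commands as the one-liner, one per line.
extract_a() {
    sed -n '
    /<a>/,/<\/a>/{
        # drop the <a> and </a> lines themselves
        /<\/*a>/d
        # <c>data1</c> becomes data1 (same tag name on both sides)
        s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/
        # a remaining closing tag such as </b> becomes the separator
        s/^<\/.*$/--------------/
        # a remaining opening tag such as <b> becomes b
        s/<\(.*\)>/\1/
        p
    }' "$1"
}

extract_a sample.xml
```

Single quotes are used around the script here so the shell leaves the backslashes and the $ alone; the original one-liner uses double quotes.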
viki
April 13, 2007, 2:18am
4
Hi anbu23,
Thanks for the quick reply. I am getting the following output with the suggested sed command:
> sed -n "/<a>/,/<\/a>/{/<\/*a>/d;s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/;s/^<\/.*$/--------------/;s/<\(.*\)>/\1/;p;}" c.xml
b
c>data1</c
c>data2</c
/b
d
c>data3</c
/d
where c.xml contains the following data:
> cat c.xml
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>
<XX>
...
</XX>
What I actually need is to extract the enclosing tag names, i.e. "b" and "d", and then read the <c> elements inside each of them.
The data should then be stored in a text file in the following format:
b:data1 data2
d:data3
Could you please help me out?
TIA,
Viki
anbu23
April 13, 2007, 4:16am
5
$ cat file
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>
<XX>
...
</XX>
$ sed -n "/<a>/,/<\/a>/{/<\/*a>/d;s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/;s/^<\/.*$/--------------/;s/<\(.*\)>/\1/;p;}" file
b
data1
data2
--------------
d
data3
--------------
I am getting the output you asked for.
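The output in the previous post (c>data1</c, /b and so on) is what this sed command would print if the tags did not start in column one, for example if the real c.xml is indented and the indentation was lost when it was pasted here. That is only a guess about the file, not something the thread confirms. Under that assumption, a variant that strips leading whitespace first might look like the following; indented.xml and extract_a_indented are hypothetical names.

```shell
#!/bin/sh
# Hypothetical indented version of the sample data.
cat > indented.xml <<'EOF'
<X>
.....
</X>
<a>
  <b>
    <c>data1</c>
    <c>data2</c>
  </b>
  <d>
    <c>data3</c>
  </d>
</a>
<XX>
...
</XX>
EOF

# extract_a_indented FILE: like the sed command above, but removes
# leading blanks before the substitutions that are anchored with ^.
extract_a_indented() {
    sed -n '
    /<a>/,/<\/a>/{
        /<\/*a>/d
        s/^[[:space:]]*//
        s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/
        s/^<\/.*$/--------------/
        s/<\(.*\)>/\1/
        p
    }' "$1"
}

extract_a_indented indented.xml
```

If the file really has no indentation, this behaves the same as the original command, so it should be safe to try either way.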
$ awk -F"[<>]" ' /<a>/,/<\/a>/ {
> if ( $0 !~ /<\/*a>/ ) {
> if ( $0 == "</" tag ">" ) { print str }
> else if ( NF == 3 ) { str = $2 ":" ; tag=$2 }
> else { str = str " " $3 }
> }
> } ' file
b: data1 data2
d: data3
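The awk output above has a space after the colon, while the requested format was b:data1 data2 stored in a text file. A possible adjustment, as a sketch only: keep the tag and the values separate and add the space only between values. The names format_a, sample.xml and result.txt are made up, and the leading-whitespace handling is an extra assumption in case the real file is indented.

```shell
#!/bin/sh
# Sample input copied from the thread; the file name is hypothetical.
cat > sample.xml <<'EOF'
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>
<XX>
...
</XX>
EOF

# format_a FILE: print one "tag:value value ..." line per child of <a>.
format_a() {
    awk -F'[<>]' '
    /<a>/,/<\/a>/ {
        if ($0 ~ /<\/*a>/) next          # skip the <a> and </a> lines
        sub(/^[ \t]+/, "")               # tolerate indentation
        if ($0 == "</" tag ">") {        # closing tag: emit the line
            print tag ":" str
        } else if (NF == 3) {            # opening tag such as <b>
            tag = $2; str = ""; sep = ""
        } else {                         # <c>value</c>: collect value
            str = str sep $3; sep = " "
        }
    }' "$1"
}

# Store the result in a text file, as asked in the earlier post.
format_a sample.xml > result.txt
cat result.txt
```

Like the original awk command, this relies on each element sitting on its own line; it is not a general XML parser.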