Parse XML file

Hi,

I need to parse the XML data enclosed in the <a> </a> tags below using a shell script.

<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>

<XX>
...
</XX>

I also need to display the data in the following format:

b
data1
data2
-----
d
data3

Could anybody suggest a way to extract the data inside the <a> </a> XML tags?

TIA,
Viki

Viki,
See if this would solve your problem:
sed -e 's!</.>!!' -e 's!<.>!!' xml_file

sed -n "/<a>/,/<\/a>/{/<\/*a>/d;s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/;s/^<\/.*$/--------------/;s/<\(.*\)>/\1/;p;}" file
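
In case the one-liner is hard to read, here is the same sed script spread over several lines with comments. This is only a sketch: the temporary file `sample` and the wrapper name `extract_a` are made up for illustration, and the input is the sample from the question.

```shell
#!/bin/sh
# Hypothetical scratch copy of the sample input from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>

<XX>
...
</XX>
EOF

# extract_a is an illustrative wrapper name, not part of the original command.
extract_a() {
    sed -n '
    # Work only on the lines from <a> through </a>.
    /<a>/,/<\/a>/{
        # Drop the <a> and </a> lines themselves.
        /<\/*a>/d
        # <c>data1</c>  ->  data1
        s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/
        # </b>  ->  separator line
        s/^<\/.*$/--------------/
        # <b>  ->  b
        s/<\(.*\)>/\1/
        p
    }' "$1"
}

extract_a "$sample"
```

Note that every pattern except the last one is anchored with `^`, so the script assumes each tag starts in the first column of its line.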

Hi anbu23,

Thanks for the quick reply :) . I am getting the following output with the suggested sed command:

> sed -n "/<a>/,/<\/a>/{/<\/*a>/d;s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/;s/^<\/.*$/--------------/;s/<\(.*\)>/\1/;p;}" c.xml
b
c>data1</c
c>data2</c
/b
d
c>data3</c
/d

where c.xml contains the following data:

> cat c.xml
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>

<XX>
...
</XX>

What I need is to extract the XML tag names, i.e. "b" and "d", and then read the <c> elements inside each of them.
The data should then be stored in a text file in the following format:

b:data1 data2
d:data3

Could you please help me out?

TIA,
Viki

$ cat file
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>

<XX>
...
</XX>
$ sed -n "/<a>/,/<\/a>/{/<\/*a>/d;s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/;s/^<\/.*$/--------------/;s/<\(.*\)>/\1/;p;}" file
b
data1
data2
--------------
d
data3
--------------

With this file I get the output you asked for.
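
Why your run looks different is not something I can tell from the post. One guess, and it is only a guess, is that the lines in your c.xml are indented or end in a carriage return: the patterns anchored with `^<` would then stop matching and only the last substitution would fire, which gives output like `c>data1</c`. Under that assumption, a variant that trims each line first might look like this (the file `sample` and the name `extract_a_tolerant` are made up for the example):

```shell
#!/bin/sh
# Hypothetical indented copy of the sample input (the indentation is an assumption).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<X>
.....
</X>
  <a>
    <b>
      <c>data1</c>
      <c>data2</c>
    </b>
    <d>
      <c>data3</c>
    </d>
  </a>

<XX>
...
</XX>
EOF

# extract_a_tolerant is an illustrative name, not part of the original command.
extract_a_tolerant() {
    sed -n '
    /<a>/,/<\/a>/{
        # Trim leading and trailing whitespace so the ^ anchors can match.
        s/^[[:space:]]*//
        s/[[:space:]]*$//
        # From here on the commands are the same as in the one-liner.
        /<\/*a>/d
        s/^<\([^>]*\)>\([^<]*\)<\/\1>/\2/
        s/^<\/.*$/--------------/
        s/<\(.*\)>/\1/
        p
    }' "$1"
}

extract_a_tolerant "$sample"
```

If the output is still wrong after trimming, the cause is something else, and looking at the file with `cat -A c.xml` (GNU cat) or `od -c c.xml` should show any hidden characters.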

$ awk -F"[<>]" ' /<a>/,/<\/a>/ {
> if ( $0 !~ /<\/*a>/ ) {
>       if ( $0 == "</" tag ">" ) { print str }
>       else if ( NF == 3 ) { str = $2 ":" ; tag=$2 }
>       else { str = str " " $3 }
> }
> } ' file
b: data1 data2
d: data3
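
The awk output above has a space after the colon, while the requested format is `b:data1 data2`, and the request was to store the result in a text file. A possible variation that handles both is sketched below. The names `tags_to_lines`, `sample` and `result` are invented for the example, and it still assumes one tag per line with no indentation.

```shell
#!/bin/sh
# Hypothetical scratch copy of the sample input from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>

<XX>
...
</XX>
EOF

# tags_to_lines is an illustrative name; it prints one "tag:values" line per group.
tags_to_lines() {
    awk -F'[<>]' '
    /<a>/,/<\/a>/ {
        if ($0 ~ /<\/*a>/) next        # skip the <a> and </a> lines themselves
        if ($0 == "</" tag ">") {      # closing tag of the current group
            print tag ":" str
        } else if (NF == 3) {          # a lone opening tag such as <b>
            tag = $2; str = ""
        } else {                       # a data line such as <c>data1</c>
            str = (str == "" ? $3 : str " " $3)
        }
    }' "$1"
}

# Store the result in a text file (hypothetical path) and show it.
result=$(mktemp)
tags_to_lines "$sample" > "$result"
cat "$result"
```

Splitting on `[<>]` turns `<b>` into three fields and `<c>data1</c>` into five, which is how the script tells a group tag from a data line.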

Hi anbu23,

It works. :)

Thanks,
Viki