Print XML data without the tags

Hi All,

I'm trying to extract data from an XML file without the tags. I've got it working, but I was wondering if there's a better way to do this.
Sample data:


$ cat xmlfile
<code>
<to>tove</to>
<from>jani</from>
<heading>reminder</heading>
<body>dont forget me</body>
</code>

$ awk -F'>' '{print $2}' xmlfile | cut -d'<' -f1

tove
jani
reminder
dont forget me

If you are using GNU awk, you can do it all in a single awk pass:

awk -F'[<>]' '{print $3}' xmlfile

Note that this assumes there is at most one value per line.

awk -F'[<>]' '{print $3,$7,$11}' xmlfile

works for up to 3 values per line.
To avoid printing empty lines for lines that hold just one tag, such as:

<code>

you could test whether $3 is empty:

awk -F'[<>]' '$3{print $3}' xmlfile

This will also skip lines whose value is empty:

<tag></tag>
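
The two ideas above (several values per line, skipping empty ones) can be combined in a loop. This is only a rough sketch under the same assumptions as before: flat tags, no attributes, no nesting on a line. The sample file and its contents are made up for illustration. It compares against the empty string instead of testing $3 directly, because a bare $3 test is also false for a value of 0.

```shell
# Hypothetical sample file; path and contents are invented for this example.
xmlfile=$(mktemp)
cat > "$xmlfile" <<'EOF'
<code>
<to>tove</to><from>jani</from>
<count>0</count>
<tag></tag>
</code>
EOF

# With FS='[<>]', the values of <a>x</a><b>y</b> land in fields 3, 7, 11, ...
# Walk those fields and print each one that is not the empty string.
values=$(awk -F'[<>]' '{ for (i = 3; i <= NF; i += 4) if ($i != "") print $i }' "$xmlfile")
printf '%s\n' "$values"

rm -f "$xmlfile"
```

Lines holding only an opening or closing tag, and tags with an empty value, produce no output, while the 0 inside the count tag is kept.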

For any more sophisticated XML parsing, you'll probably want to use Perl or some other tool that has XML modules.


Thanks, mirni, for the reply. Just one question: could you explain the delimiter used here?

 [<>]

Does that represent an XML tag by default?

awk doesn't know anything about XML.
The [<>] is a character class (a bracket expression); it splits on either < or >. [><] would do just the same.
With GNU awk you can use a regular expression as the delimiter.
If it were [0-9], it would split on any digit.
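
To see how the split numbers the fields, it can help to print every field with its index. A small sketch; the variable names and the a1b2c test string are made up for illustration:

```shell
# Show each field awk produces when it splits on either < or >.
line='<to>tove</to>'
fields=$(printf '%s\n' "$line" |
  awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"

# The same mechanism with [0-9]: every digit acts as a separator.
digits=$(printf 'a1b2c\n' | awk -F'[0-9]' '{ print $1, $2, $3 }')
printf '%s\n' "$digits"
```

The first command shows that field 1 is the empty text before the first <, field 2 is the tag name, and field 3 is the value, which is why $3 is the one to print.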

As far as I know, every major AWK implementation treats FS as a regular expression when it consists of more than one character (it's required by POSIX).

Regards,
Alister
