How can we extract specific elements from XML?

renukeswar · January 4, 2017, 6:17am

Hi,

I need to extract the value of a specific element dynamically from an XML message.

Here is the sample message:

<File> 
<List> 
<main> 
<dir>doc/store834/archive</dir> 
<count>5</count> 
</main> 
<main> 
<dir>doc/store834/extract</dir> 
<count>6</count> 
</main> 
<main> 
<dir>doc/store834/normal</dir> 
<count>7</count> 
</main> 
</List> 
</File>

My code is:

sed -n '/^<dir>/p' filename.xml | head -1

The above code works only for the first element. If I try to fetch the second element using "head -2", it returns the first two elements in a single call instead of just the second one. Can you please help me with a solution?

Thanks in advance.
Regards
Renukeswar

RudiC · January 4, 2017, 6:39am

For exactly your request and input data, try

grep "^<dir" file | tail -n+2 | head -1
<dir>doc/store834/extract</dir>

This works because each desired value sits on a single line of its own, so it is not really an "XML problem". For requests dealing with, e.g., nested structures that extend across more than one line, there are better-suited tools for handling XML.
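
To pick an arbitrary Nth match rather than hard-coding the offsets, the same line-based idea can be wrapped in a small function. This is only a sketch: the name nth_dir is made up for illustration, and it assumes, like the pipeline above, that every <dir> element starts at the beginning of its own line.

```shell
#!/bin/sh
# nth_dir N FILE: print the Nth line of FILE that starts with "<dir".
# Hypothetical helper; assumes one <dir> element per line, at column 1.
nth_dir() {
    n=$1
    file=$2
    # grep keeps only the <dir> lines; sed then prints line N of that stream.
    grep '^<dir' "$file" | sed -n "${n}p"
}
```

Calling nth_dir 2 file on the sample should print the second <dir> line, and an N larger than the number of matches prints nothing.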

Scrutinizer · January 4, 2017, 7:09am

Try:

awk -v cnt=2 -v elmt=dir '$1==elmt && ++c==cnt{print $2}' RS=\< FS=\> file
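
How this works: with RS set to < every tag starts a new record, and with FS set to > each record splits into the tag name ($1) and the text that follows it ($2). A commented sketch of the same idea as a shell function is below; the name nth_element is made up for illustration, the added exit simply stops reading after the match, and it assumes tags without attributes and element text without nested tags.

```shell
#!/bin/sh
# nth_element NAME N FILE: print the text of the Nth <NAME> element in FILE.
# Hypothetical wrapper; assumes no attributes on the tag and no nested tags
# inside the element text.
nth_element() {
    awk -v elmt="$1" -v cnt="$2" '
        # With RS="<" and FS=">", the record for <dir>foo is "dir>foo":
        # $1 holds the tag name, $2 holds the text after the tag.
        $1 == elmt && ++c == cnt { print $2; exit }
    ' RS='<' FS='>' "$3"
}
```

For the sample message, nth_element dir 2 file should print doc/store834/extract, and the same function works for other elements, e.g. nth_element count 3 file.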

Corona688 · January 4, 2017, 10:35am

For more difficult XML, where tags aren't necessarily one clean set per line, you can try my generic XML script, which with GNU awk you would use like this:

$ awk -f yanx.awk -e 'TAG == "DIR" { print $2 }' ORS="\n" input.xml

doc/store834/archive
doc/store834/extract
doc/store834/normal

On mawk, you would put TAG == "DIR" { print $2 } in a file (tag.awk here) and run:

awk -f yanx.awk -f tag.awk ORS="\n" input.xml