Hello Scripting Gurus,
I need help extracting data from an XML file using a shell script.
The data is in a large XML file and I need to extract the id values of all completedworkflows. Here is a sample of it. Input and output data are also in the attached text files.
<wfregistry>
<completedworkflows>
<id v="3381"/>
<id v="3399"/>
<id v="3415"/>
<id v="3431"/>
<id v="3447"/>
<id v="3463"/>
<id v="3479"/>
<id v="3495"/>
<id v="3511"/>
<id v="3527"/>
<id v="3543"/>
<id v="3559"/>
<id v="3575"/>
<id v="3591"/>
<id v="3607"/>
</completedworkflows>
<completedtasks>
<id v="3383"/>
<id v="3389"/>
<id v="3390"/>
<id v="3401"/>
<id v="3407"/>
<id v="3408"/>
<id v="3417"/>
<id v="3423"/>
<id v="3424"/>
<id v="3433"/>
<id v="3439"/>
<id v="3440"/>
<id v="3449"/>
<id v="3455"/>
<id v="3456"/>
<id v="3465"/>
<id v="3471"/>
</completedtasks>
</wfregistry>
The output has to be a list of the ids of all completed workflows, i.e.:
3381
3399
3415
3431
3447
3463
3479
3495
3511
3527
3543
3559
3575
3591
3607
Your help is highly appreciated.
Thank you,
Ajay.
yahyaaa
September 3, 2008, 6:11pm
2
You have to print the second field, using the double quote ( " ) as the field separator.
Try this:
nawk 'BEGIN { FS = "\"" } /<id v=/ { print $2 }' input_file
I'm not sure about the { FS = ... } part, so make sure the field separator is defined correctly. Note that this prints every id in the file, not only those under completedworkflows.
Regards
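To see why the second field is the one wanted, here is a small sketch that splits a single sample line on the double-quote character. It uses plain awk with the -F option rather than nawk with a BEGIN block, which is only an assumption about what is installed on your system.

```shell
# Split one sample line on the double-quote character.
# With the field separator set to ", the line <id v="3381"/> has three fields:
#   $1 = <id v=     $2 = 3381     $3 = />
line='<id v="3381"/>'
fields=$(printf '%s\n' "$line" | awk -F'"' '{ print NF; print $2 }')
printf '%s\n' "$fields"
```

The first line printed is the number of fields and the second is the value between the quotes.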
Rather than using grep, sed or awk to transform the XML into the required output, it is better to use an XSLT processor.
If you have (GNOME) libxslt installed, it comes with xsltproc, a command-line interface to an XSLT 1.0 compliant processor.
Here is a stylesheet which performs the required transformation using xsltproc:
$ cat file.xsl
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" />
<xsl:template match="id">
<xsl:value-of select="@v"/><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="/wfregistry/completedworkflows/id" />
</xsl:template>
</xsl:stylesheet>
$ xsltproc file.xsl file.xml
3381
3399
3415
3431
3447
3463
3479
3495
3511
3527
3543
3559
3575
3591
3607
$
Use this command, it works quickly:
awk -F "\"" '/id/ {print $2}' test
Another one:
awk '/id v=/ { print }' filename | sed 's!<id v=\"\(.*\)\"/>!\1!'
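Both commands above match every line that contains an id element, so on the posted sample they would also print the completedtasks ids. One possible variation, sketched here with sed and assuming each tag sits on its own line as in the sample, limits the extraction to the completedworkflows block. The file name sample.xml and the shortened data are made up for the demonstration.

```shell
# Build a small sample file (hypothetical name) that mirrors the posted structure.
tmp=$(mktemp -d)
cat > "$tmp/sample.xml" <<'EOF'
<wfregistry>
<completedworkflows>
<id v="3381"/>
<id v="3399"/>
</completedworkflows>
<completedtasks>
<id v="3383"/>
<id v="3389"/>
</completedtasks>
</wfregistry>
EOF

# Only lines between the opening and closing completedworkflows tags are
# considered; within that range, the value of v is captured and printed.
ids=$(sed -n '/<completedworkflows>/,/<\/completedworkflows>/s/.*<id v="\([^"]*\)".*/\1/p' "$tmp/sample.xml")
printf '%s\n' "$ids"
rm -rf "$tmp"
```

On this shortened sample only 3381 and 3399 are printed; the two completedtasks ids are skipped.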
danmero
September 4, 2008, 12:28am
6
Your request is not very clear.
Based on your required output, this should work for you.
awk -F'"' '/complet/ &&! f{f=1;next}/complet/ && f{exit} f{print $2}' file
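For anyone puzzling over the one-liner, here is the same flag logic spelled out as a sketch with comments. The variable name inside and the file name sample.xml are my own choices for illustration. Like the original, it relies on the first line containing "complet" being the opening completedworkflows tag, as in the posted sample.

```shell
# Build a small sample file (hypothetical name) that mirrors the posted structure.
tmp=$(mktemp -d)
cat > "$tmp/sample.xml" <<'EOF'
<wfregistry>
<completedworkflows>
<id v="3381"/>
<id v="3399"/>
</completedworkflows>
<completedtasks>
<id v="3383"/>
</completedtasks>
</wfregistry>
EOF

out=$(awk -F'"' '
    /complet/ && !inside { inside = 1; next }  # first matching tag: start collecting
    /complet/ &&  inside { exit }              # next matching tag: stop reading the file
    inside               { print $2 }          # lines in between: print the quoted value
' "$tmp/sample.xml")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Because awk exits at the closing tag, the rest of a large file is never read.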