Help with shell script to extract data from XML file

Hello Scripting Gurus,

I need help extracting data from an XML file using a shell script.
The data is in a large XML file, and I need to extract the id values of all completedworkflows. Here is a sample. The input and output data are also in the attached text files.

<wfregistry>
<completedworkflows>
<id v="3381"/>
<id v="3399"/>
<id v="3415"/>
<id v="3431"/>
<id v="3447"/>
<id v="3463"/>
<id v="3479"/>
<id v="3495"/>
<id v="3511"/>
<id v="3527"/>
<id v="3543"/>
<id v="3559"/>
<id v="3575"/>
<id v="3591"/>
<id v="3607"/>
</completedworkflows>
<completedtasks>
<id v="3383"/>
<id v="3389"/>
<id v="3390"/>
<id v="3401"/>
<id v="3407"/>
<id v="3408"/>
<id v="3417"/>
<id v="3423"/>
<id v="3424"/>
<id v="3433"/>
<id v="3439"/>
<id v="3440"/>
<id v="3449"/>
<id v="3455"/>
<id v="3456"/>
<id v="3465"/>
<id v="3471"/>
</completedtasks>
</wfregistry>

The output has to be a list of all completed workflow ids, i.e.:

      3381  
      3399  
      3415  
      3431  
      3447  
      3463  
      3479  
      3495  
      3511  
      3527  
      3543  
      3559  
      3575  
      3591  
      3607

Your help is highly appreciated.

Thank you,
Ajay.

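You have to print the second field, using the double quote ( " ) as the field separator.

Try this:
nawk 'BEGIN { FS="\"" } /<id v=/ { print $2 }' input_file

The double quote inside the FS string has to be escaped as shown, so make sure the field separator is defined correctly. Note that this prints the value from every <id v=.../> line, including the ones under completedtasks.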
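
A quote-separated awk one-liner that matches every id line also prints the completedtasks ids, so a range pattern can limit it to the completedworkflows block. This is only a minimal sketch, assuming each tag sits on its own line as in the sample; the function name and the trimmed sample file are made up for illustration:

```shell
# Hypothetical sample, trimmed from the XML in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<wfregistry>
<completedworkflows>
<id v="3381"/>
<id v="3399"/>
</completedworkflows>
<completedtasks>
<id v="3383"/>
<id v="3389"/>
</completedtasks>
</wfregistry>
EOF

# Print the v="..." value of id lines between the opening and
# closing completedworkflows tags only.
workflow_ids() {
  awk -F'"' '
    /<completedworkflows>/,/<\/completedworkflows>/ {
      if ($0 ~ /<id v=/) print $2   # field 2 is the text between the quotes
    }
  ' "$1"
}

workflow_ids "$sample"
```

On systems where only nawk is available, swapping awk for nawk should behave the same, since nothing here goes beyond standard awk.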

Regards

Rather than using grep, sed or awk to transform the XML into the required output, it is better to use an XSLT processor.

If you have (GNOME) libxslt installed, it comes with xsltproc, a command-line interface to an XSLT 1.0 compliant processor.

Here is a stylesheet which performs the required transformation using xsltproc:

$ cat file.xsl
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" omit-xml-declaration="yes" />

<xsl:template match="id">
    <xsl:value-of select="@v"/><xsl:text>
</xsl:text>
</xsl:template>

<xsl:template match="/">
  <xsl:apply-templates select="/wfregistry/completedworkflows/id" />
</xsl:template>

</xsl:stylesheet>

$ xsltproc file.xsl file.xml
3381
3399
3415
3431
3447
3463
3479
3495
3511
3527
3543
3559
3575
3591
3607
$

Use this command, it works quickly:
awk -F "\"" '/id/ {print $2}' test

Another one :

awk '/id v=/ { print }' filename  | sed 's!<id v=\"\(.*\)\"/>!\1!'
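
The awk and sed pipeline above can probably be folded into a single sed call that also stays inside the completedworkflows block. A rough sketch, assuming one tag per line as in the sample; the function name and the trimmed sample file are invented for the example:

```shell
# Hypothetical sample, trimmed from the XML in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<wfregistry>
<completedworkflows>
<id v="3381"/>
<id v="3399"/>
</completedworkflows>
<completedtasks>
<id v="3383"/>
<id v="3389"/>
</completedtasks>
</wfregistry>
EOF

# -n suppresses default output; within the address range, the
# substitution keeps only the quoted value and p prints lines
# where the substitution succeeded.
workflow_ids_sed() {
  sed -n '/<completedworkflows>/,/<\/completedworkflows>/ s/.*<id v="\([^"]*\)".*/\1/p' "$1"
}

workflow_ids_sed "$sample"
```

Using [^"]* instead of .* keeps the match from running past the closing quote if a line ever carried more than one attribute.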

Your request is not very clear.

Based on your required output, this should work for you.

awk -F'"' '/complet/ && !f {f=1; next} /complet/ && f {exit} f {print $2}' file
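
For anyone reading the one-liner above, here is the same flag logic spread over several lines with comments. It is only a sketch: it assumes the completedworkflows block is the first element whose tag contains "complet", as in the sample, and the function name, the variable name seen, and the trimmed sample file are made up for illustration:

```shell
# Hypothetical sample, trimmed from the XML in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<wfregistry>
<completedworkflows>
<id v="3381"/>
<id v="3399"/>
</completedworkflows>
<completedtasks>
<id v="3383"/>
<id v="3389"/>
</completedtasks>
</wfregistry>
EOF

first_block_ids() {
  awk -F'"' '
    /complet/ && !seen { seen = 1; next }  # first "complet" line: opening tag, start collecting
    /complet/ && seen  { exit }            # next "complet" line: closing tag, stop reading
    seen               { print $2 }        # lines in between: print the quoted value
  ' "$1"
}

first_block_ids "$sample"
```

Because it exits at the closing tag, awk never reads the rest of a large file, but the result depends on the order of the blocks: if completedtasks came first, its ids would be printed instead.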