Need help creating a Unix script to parse an XML file

Hi All,

My requirement is to create a Unix script that parses an XML file and displays the values between the tags on the console. For example, I would like to fetch the value of errorCode from the XML below, which is 'U007', and display it. Can we use the sed command for this? I have tried the following command, but it is not working:

sed -n -e "s/<errorCode>\([a-z]*[0-9]*\)<\/errorCode>/\1/p" /x01/hub/data/incoming/Txn200802251031080012-093624998419.xml

<gDSNError>
  <errorCode>U007</errorCode>
  <errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
  <errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>

Can anyone please help me create a Unix script to parse the file and display the values between the tags? This is an urgent requirement, and I would be very thankful for any help.
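One likely reason the sed command above prints nothing is that [a-z]* only matches lower-case letters, so it cannot consume the 'U' in 'U007'. A minimal sketch of a more tolerant pattern follows; the file name sample.xml is hypothetical and simply recreates the XML from the post.

```shell
# Recreate the sample input (sample.xml is a hypothetical file name).
cat > sample.xml <<'EOF'
<gDSNError>
  <errorCode>U007</errorCode>
  <errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
  <errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>
EOF

# [^<]* matches any run of characters up to the next "<", so upper-case
# letters are accepted. The .* on both sides discards the indentation
# and anything after the closing tag.
error_code=$(sed -n 's/.*<errorCode>\([^<]*\)<\/errorCode>.*/\1/p' sample.xml)
echo "$error_code"
```

This only works while the opening and closing tags sit on the same line, which is the case in the sample.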

Please provide sample input and the expected output.

awk '/<errorCode>/ {
 gsub(/<errorCode>|<\/errorCode>/,"")
 print $0
}' file

Use a dedicated XML parser for more complex operations.

The sample XML input file is as below:

<gDSNError>
<errorCode>U007</errorCode>
<errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
<errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>

I tried getting the values between the tags using the following code, and it works:

# Get the error description (hub_in_dir holds the path of the XML file)
Error_Desc=$(grep "<errorDescription>.*</errorDescription>" "${hub_in_dir}" | sed -e "s/^.*<errorDescription/<errorDescription/" | cut -f2 -d">" | cut -f1 -d"<")
# Get the error code
Error_Code=$(grep "<errorCode>.*</errorCode>" "${hub_in_dir}" | sed -e "s/^.*<errorCode/<errorCode/" | cut -f2 -d">" | cut -f1 -d"<")

In addition, my requirement is to write the values to a file, comma-separated. The output file should look like the following, all on one line:

U007, 00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend, 2008-02-26T09:04:11.728-00:00

Of course you can use a complicated awk script to do this (and handle the cases where the data arrives in a different order from the one you gave above), but I would suggest importing the XML file into a database or spreadsheet (for example, MS Access or MS Excel) and then running an SQL query to extract the data the way you want. I think Access can also create normalised tables for you.

In our case the XML files are very large and vary every time, so importing the data into a database may become a bit complex. As I only need 2 or 3 tag values from the whole XML, I don't see the need to import everything into a database; I can fetch those values using sed or awk scripts.

Please suggest a way to write the fetched data to a file on one line, comma-separated, using a Unix script (ksh or sh).

Input sample xml file:

<gDSNError>
<errorCode>U007</errorCode>
<errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
<errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>

The output should look like this:

U007, 00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend, 2008-02-26T09:04:11.728-00:00
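Building on the per-tag extraction shown earlier in the thread, here is a rough sketch of one way to join the values with ", " and write them to a file. The helper name get_tag and the file names sample.xml and output.txt are hypothetical, and the sketch assumes each element sits on a single line.

```shell
# Recreate the sample input (hypothetical file name).
cat > sample.xml <<'EOF'
<gDSNError>
<errorCode>U007</errorCode>
<errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
<errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>
EOF

# get_tag FILE TAG: print the text between <TAG> and </TAG>.
# Hypothetical helper; works only when both tags are on one line.
get_tag() {
    sed -n "s/.*<$2>\(.*\)<\/$2>.*/\1/p" "$1"
}

# Join the three values with ", " and write them as a single line.
printf '%s, %s, %s\n' \
    "$(get_tag sample.xml errorCode)" \
    "$(get_tag sample.xml errorDescription)" \
    "$(get_tag sample.xml errorDateTime)" > output.txt

cat output.txt
```

Note that the description itself ends with a full stop, so it appears in the output before the comma. Use >> instead of > to append one line per processed file.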

Try this:

awk 'BEGIN{FS="<|>"}
NF==5&&!f{printf("%s",$3);f=1;next}
NF==5&&f{printf(",%s",$3)}
END{print ""}
' file

Regards

Thanks a lot....

I will try the awk code given, but as a newbie I am not sure what the code does. A brief explanation would help me a lot.
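For what it's worth, here is the same awk script again with comments added, as one reading of how it works. The file names sample.xml and output.csv are hypothetical, and the redirection to a file is an addition to the original command.

```shell
# Recreate the sample input (hypothetical file name).
cat > sample.xml <<'EOF'
<gDSNError>
<errorCode>U007</errorCode>
<errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
<errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>
EOF

awk 'BEGIN { FS = "<|>" }    # split every line on "<" or ">"
# A line like <tag>value</tag> then has 5 fields: "", tag, value, /tag, "".
# Lines holding a single tag, such as <gDSNError>, have 3 fields and are skipped.
NF == 5 && !f { printf("%s", $3); f = 1; next }   # first value: no comma before it
NF == 5 && f  { printf(",%s", $3) }               # later values: comma, then value
END { print "" }             # terminate the output line with a newline
' sample.xml > output.csv

cat output.csv
```

The flag f only records whether a value has already been printed, so that the comma goes between values and not at the start of the line.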

A small change in the requirement: the input and output are given below. I would be grateful if anyone could spare some time to help me with this. I need a Unix script (.ksh or .sh) that parses the input XML file and generates an output file in the format given below.

Input File:

<Transaction 1>
<first>a</first>
<second>b</second>
</Transaction1>
<Transaction 2>
<first>c</first>
<second>d</second>
</Transaction2>

The output should be written to a new file, on 2 separate lines.

Output:
a,b
c,d
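A possible sketch for this variant, reusing the field-splitting idea from the earlier awk answer: collect values until a closing Transaction tag is seen, then print the line. The file names transactions.xml and transactions.csv are hypothetical, and the sketch assumes one element per line, as in the sample.

```shell
# Recreate the sample input from the post (hypothetical file names).
cat > transactions.xml <<'EOF'
<Transaction 1>
<first>a</first>
<second>b</second>
</Transaction1>
<Transaction 2>
<first>c</first>
<second>d</second>
</Transaction2>
EOF

awk 'BEGIN { FS = "<|>" }
# <tag>value</tag> splits into 5 fields; the value is field 3.
NF == 5 { line = line sep $3; sep = ","; next }
# A closing Transaction tag ends the record: print it and reset.
$2 ~ /^\/Transaction/ { print line; line = ""; sep = "" }
' transactions.xml > transactions.csv

cat transactions.csv
```

Because it looks only at the closing tag, the mismatch between <Transaction 1> and </Transaction1> in the sample does not matter here, although a real XML parser would reject that input.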

The sample XML presented is not valid XML. XML cannot include a free-standing "&"; it has to be written as "&amp;".

With this correction in place the following XSL stylesheet will produce the desired output.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="gDSNError">
<xsl:value-of select="errorCode"/>, <xsl:value-of select="errorDescription"/>, <xsl:value-of select="errorDateTime"/>
</xsl:template>
</xsl:stylesheet>
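As a rough sketch of how this could be run from a shell script: escape the bare ampersand with sed first, then apply the stylesheet with xsltproc. The file names error.xml, error_fixed.xml and extract.xsl are hypothetical (extract.xsl being the stylesheet above saved to disk), and the sed expression assumes that an unescaped "&" is always followed by a space.

```shell
# Recreate the sample input (hypothetical file names throughout).
cat > error.xml <<'EOF'
<gDSNError>
<errorCode>U007</errorCode>
<errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
<errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>
EOF

# Escape bare ampersands. Assumption: an unescaped "&" is always followed
# by a space, so existing entities such as &lt; are left untouched.
sed 's/& /\&amp; /g' error.xml > error_fixed.xml

# Apply the stylesheet, if xsltproc is installed and extract.xsl exists.
if command -v xsltproc >/dev/null 2>&1 && [ -f extract.xsl ]; then
    xsltproc extract.xsl error_fixed.xml
fi
```

Any other XSLT 1.0 processor should do as well; xsltproc is just a common command-line choice.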

Hi, I need some help. I have an XML file and I am trying to parse it to extract attributes and their values using a shell script:

<?xml version="1.0" encoding="UTF-8" ?>
<raml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="raml21.xsd">
  <cmData type="actual" scope="all" name="plan_file">
    <header>
      <log dateTime="2008-07-28T11:21:00" action="created" />
    </header>
  </cmData>
</raml>

The output should be:
log datetime 2008-07-28T11:21:00
action created

Please help as soon as possible.

$ sed -n '/<log/p' file | sed -e 's/[<\/>]//g' -e 's/ \(action\)/\
> \1/'
log dateTime="2008-07-28T11:21:00"
action="created"
$
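The sed transcript above still leaves the = signs and quotes in place. If the plain "name value" layout from the question is wanted, something along these lines might do; it is only a sketch, the file name log.xml is hypothetical, and it assumes the second attribute is literally called action, as in the sample.

```shell
# Recreate a cut-down version of the sample (hypothetical file name).
cat > log.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<raml>
  <cmData type="actual" scope="all" name="plan_file">
    <header>
      <log dateTime="2008-07-28T11:21:00" action="created" />
    </header>
  </cmData>
</raml>
EOF

result=$(awk '/<log / {
    gsub(/^[ \t]*<|[ \t]*\/>.*$/, "")   # drop the leading "<" and trailing "/>"
    gsub(/="/, " ")                     # name="value" becomes name value"
    gsub(/"/, "")                       # remove the remaining quotes
    sub(/ action /, "\naction ")        # put the action attribute on its own line
    print
}' log.xml)

echo "$result"
```

For more than a couple of attributes, or for attributes spread over several lines, a real XML tool is likely to be less fragile than this kind of text matching.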