Need help in creating a Unix Script to parse xml file

Anil.Wmg · April 10, 2008, 5:57am

Hi All,

My requirement is to create a Unix script that parses an XML file and displays the values between the tags on the console. For example, I would like to fetch the value of errorCode from the XML below, which is 'U007', and display it. Can we use sed for this? I tried the following command, but it does not work:

sed -n -e "s/<errorCode>\([a-z]*[0-9]*\)<\/errorCode>/\1/p" /x01/hub/data/incoming/Txn200802251031080012-093624998419.xml

<gDSNError>
  <errorCode>U007</errorCode> 
  <errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription> 
  <errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime> 
  </gDSNError>

Can anyone please help me create a Unix script to parse and display the values between the tags? This is an urgent requirement, and I would be very thankful for any help.

aajan · April 10, 2008, 8:34am

Please provide us with the sample input and output

ghostdog74 · April 10, 2008, 8:42am

awk '/<errorCode>/ {
 gsub(/<errorCode>|<\/errorCode>/,"")
 print $0
}' file

Use a dedicated XML parser for more complex operations.
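
For reference, the sed command in the first post prints nothing because `[a-z]*` only covers lowercase letters, while the code starts with an uppercase `U`. It would also leave the leading indentation in place. A minimal sed sketch that avoids both problems, assuming the opening and closing tags sit on the same line (the file name `sample.xml` is made up for the example):

```shell
# Create a small sample file (hypothetical name) mirroring the posted XML.
cat > sample.xml <<'EOF'
<gDSNError>
  <errorCode>U007</errorCode>
  <errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>
EOF

# The .* before and after the tags swallows indentation and trailing blanks;
# [^<]* captures everything up to the closing tag, whatever its case.
error_code=$(sed -n 's/.*<errorCode>\([^<]*\)<\/errorCode>.*/\1/p' sample.xml)
echo "$error_code"
```

Like the awk version, this is line-based and will miss an element whose value spans several lines.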

Anil.Wmg · April 10, 2008, 9:43am

the sample xml input file is as below:

<gDSNError>
<errorCode>U007</errorCode>
<errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
<errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>

I tried getting the values between the tags using the following code and I am able to get it:

# To get the error description and the error code
Error_Desc=$(grep "<errorDescription>.*<.errorDescription>" "$hub_in_dir" | sed -e "s/^.*<errorDescription/<errorDescription/" | cut -f2 -d">" | cut -f1 -d"<")
Error_Code=$(grep "<errorCode>.*<.errorCode>" "$hub_in_dir" | sed -e "s/^.*<errorCode/<errorCode/" | cut -f2 -d">" | cut -f1 -d"<")

In addition, my requirement is to write the values to a file, comma-separated. The output file should look like the example below, all on one line:

U007, 00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend, 2008-02-26T09:04:11.728-00:00

ag79 · April 10, 2008, 10:18am

Of course you can use a complicated awk script to do this (and handle the cases where the elements arrive in a different order from the one you gave above), but I would suggest importing the XML file into a database (for example, MS Access or MS Excel) and then running an SQL query to extract the data the way you want. I think Access can also create normalised tables for you.

Anil.Wmg · April 10, 2008, 12:45pm

In our case the XML files are very large and their structure varies every time, so importing the data into a database may become complex. As I only need two or three tag values from the whole file, I don't see the need to import everything. I can fetch those with sed or awk scripts.

Please suggest how I can write the fetched values to a file on one line, comma-separated, using a Unix script (ksh or sh).

Input sample xml file:

<gDSNError>
<errorCode>U007</errorCode>
<errorDescription>00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend.</errorDescription>
<errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
</gDSNError>

The output should be something like as below:

U007, 00093624998419|BEST_BUY_LONG_DESCRIPTION|PDQ Import - The value coded against this attribute exceeds the maximum field length. Please amend & resend, 2008-02-26T09:04:11.728-00:00

Franklin52 · April 10, 2008, 1:24pm

Try this:

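awk 'BEGIN{FS="<|>"}                 # split each line on "<" or ">"
# A line like <tag>value</tag> splits into 5 fields; the value is field 3.
NF==5&&!f{printf("%s",$3);f=1;next}  # first value: print it without a comma
NF==5&&f{printf(",%s",$3)}           # later values: print a leading comma
END{print ""}                        # finish the output line with a newline
' file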

Regards
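
If the elements can arrive in a different order, or the file holds other one-line elements as well, counting fields is not enough. Below is a sketch that picks the three tags by name and always prints them in the same order. It assumes each element still sits on a single line; `sample.xml` and `out.csv` are made-up names, and the description text is shortened for the example.

```shell
# Sample input with the elements deliberately out of order.
cat > sample.xml <<'EOF'
<gDSNError>
<errorDateTime>2008-02-26T09:04:11.728-00:00</errorDateTime>
<errorCode>U007</errorCode>
<errorDescription>PDQ Import - value too long</errorDescription>
</gDSNError>
EOF

awk -F'[<>]' '
NF==5 { val[$2] = $3 }   # remember the text of every <tag>text</tag> line by tag name
END   { printf "%s, %s, %s\n", val["errorCode"], val["errorDescription"], val["errorDateTime"] }
' sample.xml > out.csv

cat out.csv
```

A tag that is missing from the input simply yields an empty field in the output line.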

Anil.Wmg · April 11, 2008, 5:46am

Thanks a lot....

I will try the awk code you gave, but as a newbie I am not sure what it does. A brief explanation would help me a lot.

There is a small change in the requirement. The input and output are given below. It would be great if anyone could spare some time to help me with this: a Unix script (.ksh or .sh) that parses the input XML file and generates an output file in the format given below.

Input File:

The output should be written to a new file on 2 separate lines.

Output:
a,b
c,d

fpmurphy · April 11, 2008, 6:11am

The sample XML presented is not valid XML. XML cannot include a free-standing "&"; it has to be written as "&amp;".

With this correction in place the following XSL stylesheet will produce the desired output.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="gDSNError">
<xsl:value-of select="errorCode"/><xsl:text>, </xsl:text>
<xsl:value-of select="errorDescription"/><xsl:text>, </xsl:text>
<xsl:value-of select="errorDateTime"/>
</xsl:template>
</xsl:stylesheet>
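
The ampersand correction itself can be scripted before the file reaches an XSLT processor. This is only a sketch: it assumes the input contains no entities yet, and `sample.xml` and `fixed.xml` are made-up file names.

```shell
# One-line sample containing the offending bare ampersand.
cat > sample.xml <<'EOF'
<errorDescription>Please amend & resend.</errorDescription>
EOF

# Replace every "&" with "&amp;". Assumption: the input holds no entities
# yet; an existing "&lt;" would otherwise be turned into "&amp;lt;".
sed 's/&/\&amp;/g' sample.xml > fixed.xml
cat fixed.xml
```

The corrected file could then be fed to an XSLT processor together with the stylesheet, for example `xsltproc style.xsl fixed.xml` if xsltproc is installed (`style.xsl` being whatever name the stylesheet was saved under).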

divyashree · July 29, 2008, 1:41am

Hi, I need some help. I have an XML file and I am trying to parse it with a shell script to extract the attributes and their values:
<?xml version="1.0" encoding="UTF-8" ?>

<raml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="raml21.xsd">
<cmData type="actual" scope="all" name="plan_file">
<header>
<log dateTime="2008-07-28T11:21:00" action="created" />
</header>
</cmdata>
</raml>

the output should be
log datetime 2008-07-28T11:21:00
action created

please help me at the earliest

fpmurphy · July 29, 2008, 6:58am

$ sed -n '/<log/p' file | sed -e 's/[<\/>]//g' -e 's/ \(action\)/\
> \1/'
log dateTime="2008-07-28T11:21:00"
action="created"
$
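
The output above still carries the `=` signs and quotes. To get closer to the requested "name value" layout, one possible awk sketch follows. It assumes the attribute values contain no spaces, quotes, or `=` characters, and `sample.xml` is a made-up file name.

```shell
# Cut-down sample containing only the relevant element.
cat > sample.xml <<'EOF'
<header>
<log dateTime="2008-07-28T11:21:00" action="created" />
</header>
EOF

out=$(awk '/<log/ {
  gsub(/[<>\/"]/, "")          # strip the markup characters and the quotes
  gsub(/=/, " ")               # name=value becomes name value
  sub(/ action/, "\naction")   # start a new line before the action attribute
  sub(/ *$/, "")               # drop trailing blanks
  print
}' sample.xml)
echo "$out"
```

Attribute names keep the case they have in the file, so the first line reads `log dateTime ...` rather than `log datetime ...`.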