Getting VALUE from Big XML File -- That's All

We got data that was supposed to be CSV, but was sent in a huge XML file.

I've downloaded xmlstarlet, but I'm darned if I can get its "sel" feature to look down a path and return any sort of value. I see pieces of what should be paths, but they seem to have extraneous characters, and I don't know how to use the various <...> fields to make a decent query. For example,
I want to get: <es:mixedModeRadio>false</es:mixedModeRadio> from the small piece of the XML file below. How?

xmlstarlet sel "/<configData dnPrefix="Undefined">/<xn:SubNetwork id="ONRM_ROOT_MO_R">/<xn:SubNetwork     id="MyTown">/<xn:MeContext id="LL12345">/<xn:VsDataContainer id="LL12345">"

Is there an easier way? Is there some intermediate step I'm missing?

Here's a very tiny part of a very large file:

<?xml version="1.0" encoding="UTF-8"?>
<bulkCmConfigDataFile xmlns:un="utranNrm.xsd"
    xmlns:es="Edward.15.25.xsd"
    xmlns:xn="genericNrm.xsd" xmlns:gn="geranNrm.xsd" xmlns="configData.xsd">
    <fileHeader fileFormatVersion="32.615 V4.5" vendorName="Edward"/>
    <configData dnPrefix="Undefined">
        <xn:SubNetwork id="ONRM_ROOT_MO_R">
            <xn:SubNetwork id="MyTown">
                <xn:attributes>
                    <xn:userDefinedNetworkType>MY_SERVERS</xn:userDefinedNetworkType>
                    <xn:userLabel>MyTown</xn:userLabel>
                </xn:attributes>
                <xn:MeContext id="LL12345">
                    <xn:VsDataContainer id="LL12345">
                        <xn:attributes>
                            <xn:vsDataType>vsDataMeContext</xn:vsDataType>
                            <xn:vsDataFormatVersion>EdwardSpecificAttributes.15.25</xn:vsDataFormatVersion>
                            <es:vsDataMeContext>
                                <es:userLabel>LL12345</es:userLabel>
                                <es:ipAddress>11.164.0.116</es:ipAddress>
                                <es:neMIMversion>vF.1.107</es:neMIMversion>
                                <es:lostSynchronisation>SYNCHRONISED</es:lostSynchronisation>
                                <es:bcrLastChange>1452424403156</es:bcrLastChange>
                                <es:bctLastChange>1452160614628</es:bctLastChange>
                                <es:multiStandardRbs6k>true</es:multiStandardRbs6k>
                                <es:mixedModeRadio>false</es:mixedModeRadio>
                                <es:mirrorMIBversion>F.1.100.S.1.6</es:mirrorMIBversion>
                                <es:stnNodes></es:stnNodes>
                            </es:vsDataMeContext>
                        </xn:attributes>
                    </xn:VsDataContainer>
                    <xn:ManagedElement id="1">
                        <xn:attributes>
                            <xn:locationName></xn:locationName>
                            <xn:userDefinedState></xn:userDefinedState>
                            <xn:vendorName>Edward</xn:vendorName>
                            <xn:userLabel>LL12345</xn:userLabel>
                            <xn:managedElementType>ERBS</xn:managedElementType>
                            <xn:swVersion>108991/23_R0DX</xn:swVersion>
                            <xn:managedBy>SubNetwork=ONRM_ROOT_MO_R,ManagementNode=ONRM</xn:managedBy>

Not sure I understand your question. Do you need help with xmlstarlet or just need to extract that line?

As a complete newbie to XML, I need help with xmlstarlet. That particular line is just an example of one of the values I need to extract. I don't understand the syntax of how to use xmlstarlet to do that sort of thing. An example would help.

The sel (select) command works with XPath expressions.

What is XPath? You can pick up the basics from this tutorial:
XPath Tutorial
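
To make the XPath idea concrete, here is a minimal sketch in Python, using the standard-library xml.etree.ElementTree instead of xmlstarlet so it runs anywhere. It assumes a cut-down copy of the sample from the first post. The prefixes in NS (cd, xn, es) and the function name are my own choices; only the namespace URIs have to match the xmlns declarations in the file. The path uses element names only, with attributes written as [@id='...'] tests, not the raw <...> text.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the sample from the first post (assumption: the real
# file has the same nesting, just far more of it).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<bulkCmConfigDataFile xmlns:es="Edward.15.25.xsd"
    xmlns:xn="genericNrm.xsd" xmlns="configData.xsd">
  <configData dnPrefix="Undefined">
    <xn:SubNetwork id="ONRM_ROOT_MO_R">
      <xn:SubNetwork id="MyTown">
        <xn:MeContext id="LL12345">
          <xn:VsDataContainer id="LL12345">
            <xn:attributes>
              <es:vsDataMeContext>
                <es:mixedModeRadio>false</es:mixedModeRadio>
              </es:vsDataMeContext>
            </xn:attributes>
          </xn:VsDataContainer>
        </xn:MeContext>
      </xn:SubNetwork>
    </xn:SubNetwork>
  </configData>
</bulkCmConfigDataFile>
"""

# Prefix -> namespace URI. The prefixes are arbitrary labels for the query;
# "cd" stands in for the file's default (unprefixed) namespace.
NS = {"cd": "configData.xsd", "xn": "genericNrm.xsd", "es": "Edward.15.25.xsd"}

# Path relative to the root element <bulkCmConfigDataFile>.
PATH = (
    "cd:configData"
    "/xn:SubNetwork[@id='ONRM_ROOT_MO_R']"
    "/xn:SubNetwork[@id='MyTown']"
    "/xn:MeContext[@id='LL12345']"
    "/xn:VsDataContainer[@id='LL12345']"
    "/xn:attributes/es:vsDataMeContext/es:mixedModeRadio"
)


def get_mixed_mode_radio(xml_bytes):
    """Return the text of es:mixedModeRadio, or None if the path matches nothing."""
    root = ET.fromstring(xml_bytes)
    node = root.find(PATH, NS)
    return None if node is None else node.text


print(get_mixed_mode_radio(SAMPLE))  # prints: false
```

This loads the whole document into memory, so it is only meant to show how the path is written, not how to handle the full-size file.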

bash-2.03$ xml
XMLStarlet Toolkit: Command line utilities for XML
Usage: xml [<options>] <command> [<cmd-options>]
where <command> is one of:
   ed    (or edit)      - Edit/Update XML document(s)
   sel   (or select)    - Select data or query XML document(s) (XPATH, etc)
   tr    (or transform) - Transform XML document(s) using XSLT
   val   (or validate)  - Validate XML document(s) (well-formed/DTD/XSD/RelaxNG)
   fo    (or format)    - Format XML document(s)
   el    (or elements)  - Display element structure of XML document
   c14n  (or canonic)   - XML canonicalization
   ls    (or list)      - List directory as XML
   esc   (or escape)    - Escape special XML characters
   unesc (or unescape)  - Unescape special XML characters
   pyx   (or xmln)      - Convert XML into PYX format (based on ESIS - ISO 8879)
   p2x   (or depyx)     - Convert PYX into XML
<options> are:
   --version            - show version
   --help               - show help
Wherever file name mentioned in command help it is assumed
that URL can be used instead as well.

Type: xml <command> --help <ENTER> for command help

Most distros ship xmllint. That is my favorite.

One of my biggest concerns is the size of the file we're looking at -- 3.5 GB. I've been told that libxml2 has problems with files that are around a few hundred lines. Is xmlstarlet fairly stable with large files?

I have used xmllint with huge files:

xmllint --format hugefile.xml > hugefile_formatted.xml

to get formatted output, or use the --shell option to get your values using XPath.
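
If memory turns out to be the problem with a file that size, one option is to stream it rather than load it. Below is a rough sketch using Python's standard-library ElementTree.iterparse, which is a different technique from the xmllint commands above. The function name iter_values and the tiny DEMO document are made up for illustration; the tag is written in ElementTree's {namespace-URI}name form, using the es namespace URI from the sample in the first post.

```python
import io
import xml.etree.ElementTree as ET


def iter_values(source, wanted_tag):
    """Yield the text of every element whose tag equals wanted_tag.

    source may be a file name or a binary file object. Elements are
    cleared as soon as their end tag has been seen, so their text and
    children can be freed; the emptied shells stay attached to their
    parents until those parents close.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == wanted_tag:
            yield elem.text
        elem.clear()


# Hypothetical miniature document, only to exercise the function.
DEMO = b"""<root xmlns:es="Edward.15.25.xsd">
  <es:vsDataMeContext><es:mixedModeRadio>false</es:mixedModeRadio></es:vsDataMeContext>
  <es:vsDataMeContext><es:mixedModeRadio>true</es:mixedModeRadio></es:vsDataMeContext>
</root>"""

WANTED = "{Edward.15.25.xsd}mixedModeRadio"

print(list(iter_values(io.BytesIO(DEMO), WANTED)))  # prints: ['false', 'true']
```

On the real file you would pass the file name instead of the BytesIO object, for example iter_values("hugefile.xml", WANTED), and loop over the results rather than building a list.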

Wonderful! Now let me ask one more question: the XPath examples I find don't quite correspond to what I see in my file. They show a path like "/first/second/third" for a body of XML that looks like:

<first>
    <second>
        <third>

... in my case is more like:

<?xml version="1.0" encoding="UTF-8"?>
<first xmlns:un="Floo.xsd"
    xmlns:es="MoreFloo.xsd"
    xmlns:xn="MoreEvenFloo.xsd" xmlns:gn="geranNrm.xsd" xmlns="configData.xsd">
 <second fileFormatVersion="32.615 V4.5" vendorName="Edward"/>
    <third dnPrefix="Undefined">
        <xn:SubNetwork id="ONRM_ROOT_MO_R">

So my question is: how many of these additional strings are used in the path?

/<first xmlns:un="Floo.xsd"
    xmlns:es="MoreFloo.xsd"
    xmlns:xn="MoreEvenFloo.xsd" xmlns:gn="geranNrm.xsd" xmlns="configData.xsd">/ <second fileFormatVersion="32.615 V4.5" vendorName="Edward"/>/<third dnPrefix="Undefined">/

or... the same minus the "<" and ">" ???

In the meantime, I'll keep looking over the documentation and hope to encounter this.

Thanks!!

In XPath you can use // to match an element at any depth, without spelling out the whole path ...
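
As a rough illustration of the // idea and of the question about the extra strings, here is a small Python sketch (standard-library ElementTree; its ".//" plays the role of "//"). The document is a made-up reduction of the first/second/third example above, with es:mixedModeRadio placed directly under SubNetwork just for the demo. None of the xmlns="..." declarations or the "<" and ">" characters appear in the path. Attributes only show up when you want to filter on them, and the prefix used in the query does not even have to match the one in the file, as long as it maps to the same namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical reduction of the example in the question.
DOC = b"""<first xmlns:es="MoreFloo.xsd" xmlns:xn="MoreEvenFloo.xsd"
    xmlns="configData.xsd">
  <second fileFormatVersion="32.615 V4.5" vendorName="Edward"/>
  <third dnPrefix="Undefined">
    <xn:SubNetwork id="ONRM_ROOT_MO_R">
      <es:mixedModeRadio>false</es:mixedModeRadio>
    </xn:SubNetwork>
  </third>
</first>"""

root = ET.fromstring(DOC)

# Same prefixes as the file ...
NS_SAME = {"es": "MoreFloo.xsd", "xn": "MoreEvenFloo.xsd", "c": "configData.xsd"}
# ... or different ones: only the URIs matter.
NS_OTHER = {"e": "MoreFloo.xsd", "x": "MoreEvenFloo.xsd", "c": "configData.xsd"}


def texts(path, ns):
    """Return the text of every element matched by path, searched from the root."""
    return [node.text for node in root.findall(path, ns)]


# ".//" finds the element at any depth, no matter what sits above it.
print(texts(".//es:mixedModeRadio", NS_SAME))   # prints: ['false']
print(texts(".//e:mixedModeRadio", NS_OTHER))   # prints: ['false']

# Attributes appear only as optional filters in square brackets.
print(texts(".//x:SubNetwork[@id='ONRM_ROOT_MO_R']/e:mixedModeRadio", NS_OTHER))  # prints: ['false']

# Unprefixed elements such as <third> live in the default namespace,
# so the query still needs a prefix for them ("c" here).
print(root.find("c:third", NS_OTHER).get("dnPrefix"))  # prints: Undefined
```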
