SED extract XML value

I have the following string:

<min-pool-size>2</min-pool-size>

When I pipe the string into the following code, I am expecting it to return just the value "2", but it's just returning the whole string. Why?

sed -n '/<min-pool-size>/,/<\/min-pool-size>/p'

Outputting: <min-pool-size>2</min-pool-size>

echo "<min-pool-size>2</min-pool-size>" | sed 's/\(.*\)\([0-9]\)\(.*\)/\2/'

Panyam, that works but I plan to replace "min-pool-size" with a variable and I don't think your method would work for that.

So do you want to:
(a) extract the number 2 (data between tags) ? or
(b) replace "min-pool-size" with a variable ?

The poster's solution was correct for the problem you posed. It's a tad difficult for us to peer into your mind especially if your requirements change.

tyler_durden

I want to extract the data between the tags, but the tags will be input as a variable.

The reason is simple: you should use some substitution, which "substitutes" the tag and the surrounding brackets to nothing, leaving only the text between the tags - "2".

What you have written is a so-called "range" command: you told sed to do something - the command "p" in your case - to a range of lines, starting with the line containing "<min-pool-size>" and ending with the line containing "</min-pool-size>". The patterns only select lines; they do not cut anything out of them. So everything in the one line constituting the range is being printed.
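
To see the difference on the original input, here is a small sketch (the variable names "line", "ranged" and "value" are only for illustration and are not part of the original post):

```shell
line='<min-pool-size>2</min-pool-size>'

# Range address: both patterns only select lines, so the whole line is printed.
ranged=$(printf '%s\n' "$line" | sed -n '/<min-pool-size>/,/<\/min-pool-size>/p')

# Substitution: replace the tags and their content with just the captured content.
value=$(printf '%s\n' "$line" | sed 's/<min-pool-size>\(.*\)<\/min-pool-size>/\1/')

printf 'range: %s\nsubstitution: %s\n' "$ranged" "$value"
```

The first result is the unchanged input line, the second is only the text that was between the tags.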

Regarding your problem: let's reformulate it in a more general way. You have text in the form

<tag>some text</tag>

where "tag" is some text supplied from outside. You want to filter out everything save for "some text". First, ask yourself if the text you want to preserve could span more than one line, because extra effort will be necessary if this could be the case. Is the following possible?

<tag>some
more text</tag>

And if it is do you want to preserve line breaks as in the original or do you want some one-line stream to be the result?

Let us start with the simplest: only one-liners. The problem here, like in all the other cases, is to put the inherited variable content into the sed-regexp. Save the following to a file called "test1.sh", give it execute-rights and call it like "test1.sh /path/to/inputfile":

typeset fIn="$1"
typeset tag="min-pool-size"
sed 's/<'"${tag}"'>\(.*\)<\/'"$tag"'>/\1/' "$fIn"

This will work in your example, but it will fail in my second variant. Notice that I carefully quoted all the variables so that, after the shell is done expanding them, sed receives one continuous string as its script.
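
If the tag is to come from outside rather than being hard-coded, the same idea could be wrapped in a function. This is only a sketch: the name "extract_tag" is made up here, and it assumes the tag contains no slashes or regexp special characters, since it is pasted into the sed script unescaped.

```shell
# extract_tag TAG [FILE...]: print the text between <TAG> and </TAG>
# for every line that contains both tags on the same line.
extract_tag() {
    tag=$1
    shift
    # -n plus the p flag prints only lines where the substitution succeeded.
    sed -n 's/.*<'"$tag"'>\(.*\)<\/'"$tag"'>.*/\1/p' "$@"
}

result=$(printf '%s\n' \
    '<max-pool-size>9</max-pool-size>' \
    '<min-pool-size>2</min-pool-size>' | extract_tag min-pool-size)
echo "$result"
```

Unlike the script above, this variant suppresses lines that do not contain the tag at all, which is usually what one wants when the input holds more than the one element.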

In the next version we will take care of multiline contents but will let them remain multiline. We have three sorts of lines to deal with:

1) "some content<tag>content-to-preserve"
2) "lines between start- and end-tag"
3) "content-to-preserve</tag>some more content"

We will use the range facility, like you did, but in a somewhat more complicated fashion:

typeset fIn="$1"
typeset tag="min-pool-size"
sed -n '/<'"${tag}"'>/,/<\/'"$tag"'>/ {
               s/^.*<'"$tag"'>//
               s/<\/'"$tag"'>.*$//
               p
          }' "$fIn"

With "-n", sed's automatic output is suppressed, so only what we print explicitly appears. This way we filter out all irrelevant text before and after our tags. The first line of the script declares the range: the first line where the start-tag appears, the last line where the end-tag appears and all lines in between. To these lines we apply the commands in the curly braces one by one. The first substitution cuts out everything up to and including the start-tag (taking care of the type-1 lines above), the second cuts out the end-tag and everything after it (taking care of the type-3 lines above), and the type-2 lines are left untouched. The last command, "p", prints whatever is left of each line in the range - voila!
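
As a quick check, the same script can be run on a made-up three-line sample (the sample text and the variable names "sample" and "extracted" are assumptions for illustration only):

```shell
tag="min-pool-size"
sample='<pool><min-pool-size>some
more text</min-pool-size></pool>
<other>ignored</other>'

# Same range-plus-substitutions script as above, fed from a variable.
extracted=$(printf '%s\n' "$sample" | sed -n '/<'"$tag"'>/,/<\/'"$tag"'>/ {
    s/^.*<'"$tag"'>//
    s/<\/'"$tag"'>.*$//
    p
}')
printf '%s\n' "$extracted"
```

One caveat: sed starts looking for the end of a range on the line after the one that opened it, so if both tags sit on the same line the range runs on until the next line containing the end-tag (or the end of the file).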

How to transform this output to a one-line stream is left as an exercise to the interested reader who by now should be eager to try his newly found insight in the workings of sed on a problem of his own. ;-))
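
For readers who would rather not do this in sed, one possible shortcut (a sketch using a different tool, not the sed solution hinted at above) is to join the extracted lines afterwards with "paste". The variable name "oneline" is made up for the example.

```shell
# Join all input lines into one, separated by single spaces.
oneline=$(printf '%s\n' 'some' 'more text' | paste -s -d ' ' -)
printf '%s\n' "$oneline"
```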

I hope this helps.

bakunin

Bakunin, thank you so much for the explanation you gave. Fortunately in this instance everything is on one line, so your first sed function will work. Your second sed function will definitely come in handy and could be the basis for a very versatile XML parsing function. Thank you very much for your contributions.

OK, new problem. I've got the following text file:

    <JDBCDataSource JNDIName="jdbc/rDB"
        Name="" PoolName="Othername"   Targets="myserver"/>
    <JDBCConnectionPool CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name="Othername"
        PasswordEncrypted="" Properties="user="
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>
    <JDBCDataSource JNDIName="jdbc/FirstDataBankDB" Name="fdbDataSource"
        PoolName="Othernamel" Targets="myserver"/>

<JDBCConnectionPool CapacityIncrement="5"
        DriverName="oracle.jdbc.driver.OracleDriver" InitialCapacity="5"
        MaxCapacity="35" Name="Othername"
        PasswordEncrypted=""
        PreparedStatementCacheSize="25"
        Properties=;dll=ocijdbc8;protocol=thin"
        RefreshMinutes="10" Targets="myserver"
        TestConnectionsOnRelease="false" TestConnectionsOnReserve="true"
        TestTableName="dual" URL=""/>

    <JDBCDataSource JNDIName="" 
        Name="ThisName"
        PoolName="l" RowPrefetchEnabled="true"
        RowPrefetchSize="100" Targets="myserver"
        CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name=""
        PasswordEncrypted="" Properties="user"
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>

    <JDBCDataSource JNDIName="jdbc/rDB"
        Name="" PoolName="Othername"   Targets="myserver"/>
    <JDBCConnectionPool CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name="Othername"
        PasswordEncrypted="" Properties="user="
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>
    <JDBCDataSource JNDIName="jdbc/FirstDataBankDB" Name="fdbDataSource"
        PoolName="Othernamel" Targets="myserver"/>
</Domain>

I need a SED function that isolates everything from "<JDBCDataSource" to "/>" where the Name parameter is Name="ThisName". I've tried modifying the range function given above, but I don't know how to make SED ignore the newline characters within a regular expression. The order of the entries is not guaranteed to be as listed and there can be many <JDBCDataSource entries.

I just thought of something. If I can get all of the entries on their own single line I could just grep for the "ThisName" parameter. Can anyone come up with a SED function that places everything in between "<" and "/>" on its own single line?

Or extract everything between the first occurrence of "<JDBCDataSource" before "Name=ThisName" and the first occurrence of "/>" after "Name=ThisName".

Try this:

tr '\n' '_' < file | tr '>' '\n'|sed '/Name="ThisName"/d'| tr '\n' '>'| tr '_' '\n'

But more simply with awk:

awk '/ThisName/{next}1' ORS="\n\n" RS=  FS="\n" file

Regards

Franklin52,

Neither solution seems to work; the AWK solution actually returns nothing.

The solutions work fine for me. This is what I get:

$ cat file
    <JDBCDataSource JNDIName="jdbc/rDB"
        Name="" PoolName="Othername"   Targets="myserver"/>
    <JDBCConnectionPool CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name="Othername"
        PasswordEncrypted="" Properties="user="
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>
    <JDBCDataSource JNDIName="jdbc/FirstDataBankDB" Name="fdbDataSource"
        PoolName="Othernamel" Targets="myserver"/>

<JDBCConnectionPool CapacityIncrement="5"
        DriverName="oracle.jdbc.driver.OracleDriver" InitialCapacity="5"
        MaxCapacity="35" Name="Othername"
        PasswordEncrypted=""
        PreparedStatementCacheSize="25"
        Properties=;dll=ocijdbc8;protocol=thin"
        RefreshMinutes="10" Targets="myserver"
        TestConnectionsOnRelease="false" TestConnectionsOnReserve="true"
        TestTableName="dual" URL=""/>

    <JDBCDataSource JNDIName="" 
        Name="ThisName"
        PoolName="l" RowPrefetchEnabled="true"
        RowPrefetchSize="100" Targets="myserver"
        CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name=""
        PasswordEncrypted="" Properties="user"
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>

    <JDBCDataSource JNDIName="jdbc/rDB"
        Name="" PoolName="Othername"   Targets="myserver"/>
    <JDBCConnectionPool CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name="Othername"
        PasswordEncrypted="" Properties="user="
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>
    <JDBCDataSource JNDIName="jdbc/FirstDataBankDB" Name="fdbDataSource"
        PoolName="Othernamel" Targets="myserver"/>
</Domain>
$
$ tr '\n' '_' < file | tr '>' '\n'|sed '/Name="ThisName"/d'| tr '\n' '>'| tr '_' '\n'
    <JDBCDataSource JNDIName="jdbc/rDB"
        Name="" PoolName="Othername"   Targets="myserver"/>
    <JDBCConnectionPool CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name="Othername"
        PasswordEncrypted="" Properties="user="
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>
    <JDBCDataSource JNDIName="jdbc/FirstDataBankDB" Name="fdbDataSource"
        PoolName="Othernamel" Targets="myserver"/>

<JDBCConnectionPool CapacityIncrement="5"
        DriverName="oracle.jdbc.driver.OracleDriver" InitialCapacity="5"
        MaxCapacity="35" Name="Othername"
        PasswordEncrypted=""
        PreparedStatementCacheSize="25"
        Properties=;dll=ocijdbc8;protocol=thin"
        RefreshMinutes="10" Targets="myserver"
        TestConnectionsOnRelease="false" TestConnectionsOnReserve="true"
        TestTableName="dual" URL=""/>

    <JDBCDataSource JNDIName="jdbc/rDB"
        Name="" PoolName="Othername"   Targets="myserver"/>
    <JDBCConnectionPool CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name="Othername"
        PasswordEncrypted="" Properties="user="
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>
    <JDBCDataSource JNDIName="jdbc/FirstDataBankDB" Name="fdbDataSource"
        PoolName="Othernamel" Targets="myserver"/>
$
$
$ awk '/ThisName/{next}1' ORS="\n\n" RS=  FS="\n" file
    <JDBCDataSource JNDIName="jdbc/rDB"
        Name="" PoolName="Othername"   Targets="myserver"/>
    <JDBCConnectionPool CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name="Othername"
        PasswordEncrypted="" Properties="user="
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>
    <JDBCDataSource JNDIName="jdbc/FirstDataBankDB" Name="fdbDataSource"
        PoolName="Othernamel" Targets="myserver"/>

<JDBCConnectionPool CapacityIncrement="5"
        DriverName="oracle.jdbc.driver.OracleDriver" InitialCapacity="5"
        MaxCapacity="35" Name="Othername"
        PasswordEncrypted=""
        PreparedStatementCacheSize="25"
        Properties=;dll=ocijdbc8;protocol=thin"
        RefreshMinutes="10" Targets="myserver"
        TestConnectionsOnRelease="false" TestConnectionsOnReserve="true"
        TestTableName="dual" URL=""/>

    <JDBCDataSource JNDIName="jdbc/rDB"
        Name="" PoolName="Othername"   Targets="myserver"/>
    <JDBCConnectionPool CapacityIncrement="2"
        DriverName="net.sourceforge.jtds.jdbc.Driver"
        InitialCapacity="4" MaxCapacity="20" Name="Othername"
        PasswordEncrypted="" Properties="user="
        Targets="myserver" TestConnectionsOnRelease="false" URL=""/>
    <JDBCDataSource JNDIName="jdbc/FirstDataBankDB" Name="fdbDataSource"
        PoolName="Othernamel" Targets="myserver"/>
</Domain>
$
$

Am I missing something?

I think you're actually getting the opposite of what I am trying to get. I came up with this solution that seems to work:

cat file | awk '{printf("%s", $0); if ( $0 ~ /.*\/>/ ) {printf("\n");}}' | grep ThisName | tr -d '\n' 
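
The idea behind this pipeline (glue lines together, start a new line only after a "/>", then grep) can be sketched on a small made-up input. The function name "join_elements" and the sample lines are assumptions for illustration, and the sketch assumes every element ends with "/>" at the end of a line.

```shell
# join_elements: append input lines together and emit a newline only after a
# line that ends an element with "/>", so each element ends up on one line.
join_elements() {
    awk '{ printf "%s", $0 } /\/>[ \t]*$/ { print "" }' "$@"
}

found=$(printf '%s\n' \
    '<JDBCDataSource JNDIName="jdbc/rDB"' \
    '    Name="" PoolName="Othername"/>' \
    '<JDBCDataSource JNDIName=""' \
    '    Name="ThisName"' \
    '    PoolName="l"/>' | join_elements | grep 'Name="ThisName"')
printf '%s\n' "$found"
```

Only the second element survives the grep, now as a single line with its original indentation preserved between the attributes.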

Sorry, I misread the question but this should get the part you have colored:

awk '/ThisName/' ORS="\n\n" RS=  FS="\n" file
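
A small sketch of why this works, assuming the entries are separated by blank lines as the "ThisName" block is in the sample file: with an empty RS, awk reads in paragraph mode, so each blank-line-separated block is one record and /ThisName/ prints the whole block. The miniature input below is made up for illustration.

```shell
# Hypothetical miniature of the config file: two blocks separated by a blank line.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<JDBCDataSource JNDIName="jdbc/rDB"
    Name="" PoolName="Othername"/>

<JDBCDataSource JNDIName=""
    Name="ThisName"
    PoolName="l"/>
EOF

# RS= (empty) switches awk to paragraph mode: one record per blank-line-separated block.
match=$(awk '/ThisName/' RS= "$tmp")
printf '%s\n' "$match"
rm -f "$tmp"
```

If two entries are not separated by a blank line they end up in the same record, so this approach depends on the layout of the file.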

Regards

ArterialTool, you really should consider using an XSL aware tool such as xsltproc for this type of work.

For example here is how you can use xsltproc to solve your original question

$ var=`xsltproc -param which "'min-pool-size'" file.xsl file.xml`
$ echo $var
2
$

where file.xml is your input file and file.xsl is

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- pass in as -param which "'value'"  -->
   <xsl:param name="which"/>

    <xsl:output method="text"/>

   <xsl:template match="/">
      <xsl:apply-templates/>
   </xsl:template>

   <!-- output nodes that match -->
   <xsl:template match="*" priority="1">
      <xsl:if test="name(.)=$which">
        <xsl:value-of select="."/>
      </xsl:if>
   </xsl:template>

   <!-- eat all other output -->
   <xsl:template match="*|text()" priority="0"/>

</xsl:stylesheet>