Extract the specific tags in a XML file

Hello Shell Gurus,

I have a requirement to get the specific tags from a XML file.
Here is my code snippet

<jdbc-system-resource>
    <name>SDPData Source</name>
    <target>AdminServer,osb_server1,soa_server1</target>
    <descriptor-file-name>jdbc/SDPData_Source-1102-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>
  <jdbc-system-resource>
    <name>94machineDB</name>
    <target>soa_server1</target>
    <descriptor-file-name>jdbc/94machineDB-5930-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>

Here i have to avoid where <name> contains SDP and rest i have to write into new file.
Eg:-
IN this case, i have to get a new file like below.


  <jdbc-system-resource>
    <name>94machineDB</name>
    <target>soa_server1</target>
    <descriptor-file-name>jdbc/94machineDB-5930-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>

Any help will be extreemly appreciated.

Hello Siv51427882,

Could you please try following and let me know if this helps you.

awk '/<jdbc-system-resource>/{non_flag=val="";flag=1} /<\/jdbc-system-resource>/ && !non_flag{print val RS $0;val=flag="";next} non_flag{next} flag && /SDP/{non_flag=1;val="";next} flag && NF{val=val?val ORS $0:$0}'  Input_file
 

Output will be as follows.

  <jdbc-system-resource>
    <name>94machineDB</name>
    <target>soa_server1</target>
    <descriptor-file-name>jdbc/94machineDB-5930-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>

EDIT: Adding a non-one liner form of solution too now.

awk '
/<jdbc-system-resource>/{
  non_flag=val="";
  flag=1
}
/<\/jdbc-system-resource>/ && !non_flag{
  print val RS $0;
  val=flag="";
  next
}
non_flag{
  next
}
flag && /SDP/{
  non_flag=1;
  val="";
  next
}
flag && NF{
  val=val?val ORS $0:$0
}
'   Input_file
 

Thanks,
R. Singh

1 Like

Using python:-

import xml.etree.ElementTree as ET

tree = ET.parse('input.xml')

root = tree.getroot()

for jdbc in root.findall('jdbc-system-resource'):
    name = jdbc.find('name').text
    if ( "SDP" in name ):
        root.remove(jdbc)

tree.write('output.xml')
1 Like

Hello R.Singh&Yoda,

I have another problem here.
Kindly consider the below code,

 <jdbc-system-resource>
    <name>EDNLocalTxDataSource</name>
    <target>soa_server1</target>
    <descriptor-file-name>jdbc/EDNLocalTxDataSource-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>
  <jdbc-system-resource>
    <name>SOALocalTxDataSource</name>
    <target>soa_server1</target>
    <descriptor-file-name>jdbc/SOALocalTxDataSource-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>
  <jdbc-system-resource>
    <name>mds-soa</name>
    <target>AdminServer,soa_server1</target>
    <descriptor-file-name>jdbc/mds-soa-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>
  <jdbc-system-resource>
    <name>aqjmsuserDataSource</name>
    <target>soa_server1</target>
    <descriptor-file-name>jdbc/aqjmsuserDataSource-1696-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>

IN the Above code, i have to avoid the <jdbc-system-resources> entries for which <name> contains "SOA" or "soa" or "EDN" or "mds" or "SDP" or "sdp"

for the above code, i would like to get an output like below.

  <jdbc-system-resource>
    <name>aqjmsuserDataSource</name>
    <target>soa_server1</target>
    <descriptor-file-name>jdbc/aqjmsuserDataSource-1696-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>
  <jdbc-system-resource>

BTW, Thanks for you kind help. your suggestions are lot for me.

Thanks,
Siva

Hello Siv51427882,

Could you please try following and let me know if this helps you.

awk '
/<jdbc-system-resource>/{
  non_flag=val="";
  flag=1
}
/<\/jdbc-system-resource>/ && !non_flag{
  print val RS $0;
  val=flag="";
  next
}
non_flag{
  next
}
flag && /<name>/ && (/SDP/ || /sdp/ || /SOA/ || /soa/ || /EDN/ || /mds/){
  non_flag=1;
  val="";
  next
}
flag && NF{
  val=val?val ORS $0:$0
}
'  Input_file

Output will be as follows.

  <jdbc-system-resource>
    <name>aqjmsuserDataSource</name>
    <target>soa_server1</target>
    <descriptor-file-name>jdbc/aqjmsuserDataSource-1696-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>
 

Thanks,
R. Singh

1 Like

Hello R.Singh,

Thanks for your kind help on this.
now, i have another xml file(DataSourceTest-8855-jdbc.xml) in different location.
i would like to access this file and write the contents into another file.
For Example
Here is the file i am taking about.

<?xml version='1.0' encoding='UTF-8'?>
<jdbc-data-source>
  <name>DataSourceTest</name>
  <jdbc-driver-params>
    <url>jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb</url>
    <driver-name>oracle.jdbc.xa.client.OracleXADataSource</driver-name>
    <properties>
      <property>
        <name>user</name>
        <value>DEV_SOAINFRA</value>
      </property>
    </properties>
    <password-encrypted>{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=</password-encrypted>
  </jdbc-driver-params>
  <jdbc-connection-pool-params>
    <initial-capacity>0</initial-capacity>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
  </jdbc-connection-pool-params>
  <jdbc-data-source-params>
    <jndi-name>jdbc/SOADBDataSource</jndi-name>
  </jdbc-data-source-params>
</jdbc-data-source>

we would like to get the output file as below format

Parameter1=DataSourceTest
Parameter2=jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb
like parameter3,etc

And sorry for keep bothering you as i am new to shell and trying to explore it.

Thanks
Siva

Hello Siv51427882,

Your request is NOT totally clear, could you please try following and let me know if this helps.

awk -F"[><]" '/<\/jdbc-data-source>/{print "Parameter1="para1 RS "Parameter2="para2;para1=para2=""} /<jdbc-data-source>/{getline;para1=$3;next} /<driver-name>/{para2=$3;next}'  Input_file

Output will be as follows.

Parameter1=DataSourceTest
Parameter2=oracle.jdbc.xa.client.OracleXADataSource

EDIT: Adding a non-one liner form of solution too now.

awk -F"[><]" '
/<\/jdbc-data-source>/{
  print "Parameter1="para1 RS "Parameter2="para2;
  para1=para2=""
}
/<jdbc-data-source>/{
  getline;
  para1=$3;
  next
}
/<driver-name>/{
  para2=$3;
  next
}
'   Input_file
 

Thanks,
R. Singh

Hello R.Singh,

Consider the below code

?xml version='1.0' encoding='UTF-8'?>
<jdbc-data-source >
  <name>DataSourceTest</name>
  <jdbc-driver-params>
    <url>jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb</url>
    <driver-name>oracle.jdbc.xa.client.OracleXADataSource</driver-name>
    <properties>
      <property>
        <name>user</name>
        <value>DEV_SOAINFRA</value>
      </property>
    </properties>
    <password-encrypted>{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=</password-encrypted>
  </jdbc-driver-params>
  <jdbc-connection-pool-params>
    <initial-capacity>0</initial-capacity>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
  </jdbc-connection-pool-params>
  <jdbc-data-source-params>
    <jndi-name>jdbc/SOADBDataSource</jndi-name>
  </jdbc-data-source-params>
</jdbc-data-source>

we need the values of the above code into a text file in below format
For Ex:-

Param1:DataSourceTest
Param2:jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb
Param3:oracle.jdbc.xa.client.OracleXADataSource
Param4:DEV_SOAINFRA

Like wise, we need to map all of the values of XML file into a text file.

Thanks,
Siva

Hello Siv51427882,

Could you please try following and let me know if this helps you.

awk -F"[><]" '/<\/jdbc-data-source>/{print "Param1:"param1 RS "Param2:"param2 RS "Param3:"param3 RS "Param4:"param4;param1=param2=param3=param4="";next}  /<jdbc-data-source >/{getline;param1=$3;next} /<url>/{param2=$3;next} /<driver-name>/{param3=$3;next} /<value>/{param4=$3;next}'  Input_file

EDIT: Adding a non-one liner form of solution too now.

awk -F"[><]" '
/<\/jdbc-data-source>/{
  print "Param1:"param1 RS "Param2:"param2 RS "Param3:"param3 RS "Param4:"param4;
  param1=param2=param3=param4="";
  next
}
/<jdbc-data-source >/{
  getline;
  param1=$3;
  next
}
/<url>/{
  param2=$3;
  next
}
/<driver-name>/{
  param3=$3;
  next
}
/<value>/{
  param4=$3;
  next
}
'   Input_file

Thanks,
R. Singh

2 Likes

Hello R.Singh,

I have executed your code below.

awk -F"[><]" '
/<\/jdbc-data-source>/{
  print "Param1:"param1 RS "Param2:"param2 RS "Param3:"param3 RS "Param4:"param4;
  param1=param2=param3=param4="";
  next
}
/<jdbc-data-source >/{
  getline;
  param1=$3;
  next
}
/<url>/{
  param2=$3;
  next
}
/<driver-name>/{
  param3=$3;
  next
}
/<value>/{
  param4=$3;
  next
}
'   Input_file

And i got below output.

Param1:
Param2:jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb
Param3:oracle.jdbc.xa.client.OracleXADataSource
Param4:DEV_SOAINFRA

Param1:- Coming as null.

Here is the input file.

<?xml version='1.0' encoding='UTF-8'?>
<jdbc-data-source xmlns="/weblogic/jdbc-data-source" xmlns:sec="/weblogic/security" 
xmlns:wls="/weblogic/security/wls" xmlns:xsi="/XMLSchema-instance" 
xsi:schemaLocation="/weblogic/jdbc-data-source /weblogic/jdbc-data-source/1.2/jdbc-data-source.xsd">

  <name>DataSourceTest</name>
  <jdbc-driver-params>
    <url>jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb</url>
    <driver-name>oracle.jdbc.xa.client.OracleXADataSource</driver-name>
    <properties>
      <property>
        <name>user</name>
        <value>DEV_SOAINFRA</value>
      </property>
    </properties>
    <password-encrypted>{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=</password-encrypted>
  </jdbc-driver-params>
  <jdbc-connection-pool-params>
    <initial-capacity>0</initial-capacity>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
  </jdbc-connection-pool-params>
  <jdbc-data-source-params>
    <jndi-name>jdbc/SOADBDataSource</jndi-name>
  </jdbc-data-source-params>
</jdbc-data-source>

Kindly suggest on this.

Thanks,
Siva

Yes, and for a reason: it's because the recent input has a blank line whereas the sample given in post#8 has not.

Please be aware that your specification is far from descriptive or clear (as has been pointed out before). Why don't you tell people that once the <jdbc-data-source tag has been encountered, the <name> tag should go to a line or variable called parameter 1, <url> to parameter 2, etc. ?
And,

doesn't help either. Specify WHAT is taken from where and printed to where.

Hello Rudi C,

I have a file like below

<?xml version='1.0' encoding='UTF-8'?>
<jdbc-data-source>
  <name>DataSourceTest</name>
  <jdbc-driver-params>
    <url>jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb</url>
    <driver-name>oracle.jdbc.xa.client.OracleXADataSource</driver-name>
    <properties>
      <property>
        <name>user</name>
        <value>DEV_SOAINFRA</value>
      </property>
    </properties>
    <password-encrypted>{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=</password-encrypted>
  </jdbc-driver-params>
  <jdbc-connection-pool-params>
    <initial-capacity>0</initial-capacity>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
  </jdbc-connection-pool-params>
  <jdbc-data-source-params>
    <jndi-name>jdbc/SOADBDataSource</jndi-name>
  </jdbc-data-source-params>
</jdbc-data-source>

I have executed the below script, which is provided by R.Singh

awk -F"[><]" '
/<\/jdbc-data-source>/{
  print "Param1:"param1 RS "Param2:"param2 RS "Param3:"param3 RS "Param4:"param4;
  param1=param2=param3=param4="";
  next
}
/<jdbc-data-source >/{
  getline;
  param1=$3;
  next
}
/<url>/{
  param2=$3;
  next
}
/<driver-name>/{
  param3=$3;
  next
}
/<value>/{
  param4=$3;
  next
}
'   Input_file

And the below is the output which i get

Param1:
Param2:jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb
Param3:oracle.jdbc.xa.client.OracleXADataSource
Param4:DEV_SOAINFRA

At first place, This thread is not allowing me post any weblinks such as xmlns.com,etc
And hence, i have removed the contents from <jdbc-data-source> tag.
Coming to my requirement

Param1: DataSourceTest
Param2:jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb
Param3:oracle.jdbc.xa.client.OracleXADataSource
Param4:DEV_SOAINFRA
Param5:{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
Param6:jdbc/SOADBDataSource

Hope it makes clear now.

Thanks
Siva

Sorry, no, it doesn't. Did you read and understand my entire post?

  • Which of the samples is the valid one now - the one with the blank line in post#10, or the other ones in #8 and #12? RavinderSingh13's proposal depends on this. And, I'm pretty sure you don't get the result in #12 with the sample input in #12!
  • Will "Param1" ALWAYS be "DataSourceTest", "Param2" ALWAYS "jdbc:Oracle:thin:@10.137.181.188:1521/soa12cdb" etc.? If not (which I guess is the case), specify WHAT goes WHERE.

DON'T leave people guessing what your final requirements are!

3 Likes

Hello RudiC,

For Some reasons this thread is NOT allowing me to paste the complete code, since the code contains some web links.

Lemme try from HTML Tag

<?xml version='1.0' encoding='UTF-8'?>
<jdbc-data-source xmlns="http://xmlns.oracle.com/weblogic/jdbc-data-source" xmlns:sec="http://xmlns.oracle.com/weblogic/security" xmlns:wls="http://xmlns.oracle.com/weblogic/security/wls" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/weblogic/jdbc-data-source http://xmlns.oracle.com/weblogic/jdbc-data-source/1.2/jdbc-data-source.xsd">
  <name>DataSourceTest</name>
  <jdbc-driver-params>
    <url>jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb</url>
    <driver-name>oracle.jdbc.xa.client.OracleXADataSource</driver-name>
    <properties>
      <property>
        <name>user</name>
        <value>DEV_SOAINFRA</value>
      </property>
    </properties>
    <password-encrypted>{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=</password-encrypted>
  </jdbc-driver-params>
  <jdbc-connection-pool-params>
    <initial-capacity>0</initial-capacity>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
  </jdbc-connection-pool-params>
  <jdbc-data-source-params>
    <jndi-name>jdbc/SOADBDataSource</jndi-name>
  </jdbc-data-source-params>
</jdbc-data-source>

we are expecting the following output.
1\) Param1 Should be mapped to the value of &lt;name&gt;&lt;/name&gt;, which is under &lt;jdbc-data-source&gt; tag.

In the above case Param1 should be "DataSourceTest"

2\) Param2 should be mapped to  the value of &lt;url&gt;&lt;/url&gt;, which is under&lt;jdbc-data-source-params&gt; tag.

In the above case Param2 should be "jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb"

3\) Param3 should be mapped to  the value of &lt;driver-name&gt;&lt;/driver-name&gt;, which is under&lt;jdbc-data-source-params&gt; tag.

In the above case Param3 should be "oracle.jdbc.xa.client.OracleXADataSource"

4\) Param4 should be mapped to  the value of &lt;value&gt;&lt;/value&gt;, which is under&lt;jdbc-data-source-params&gt; tag.

In the above case Param4 should be "DEV_SOAINFRA"

5\) Param5 should be mapped to  the value of &lt;password-encrypted&gt;&lt;/password-encrypted&gt;, which is under&lt;jdbc-data-source-params&gt; tag.

In the above case Param5 should be "\{AES\}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx="

6\)Param6 should be mapped to  the value of &lt;jndi-name&gt;&lt;/jndi-name&gt;, which is under&lt;jdbc-data-source-params&gt; tag.

In the above case Param6 should be "jdbc/SOADBDataSource"

Hope it makes clear this time.

Thanks,
Siva

Hi, try:

awk '
  $1~/jdbc-data-source / {
    split($1,F,/"/)
    name=F[2]
  } 
  {
    P[$1]=$2
  }
  END {
    print name, P["url"], P["driver-name"], P["value"], P["password-encrypted"], P["jndi-name"]
  } 
' RS=\< FS=\> OFS='\n' file

Output:

http://xmlns.oracle.com/weblogic/jdbc-data-source
jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb
oracle.jdbc.xa.client.OracleXADataSource
DEV_SOAINFRA
{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
jdbc/SOADBDataSource

--
On Solaris use /usr/xpg4/bin/awk

1 Like

PLease consider the input file as below

<?xml version='1.0' encoding='UTF-8'?>
<jdbc-data-source xmlns="http://xmlns.oracle.com/weblogic/jdbc-data-source" xmlns:sec="http://xmlns.oracle.com/weblogic/security" xmlns:wls="http://xmlns.oracle.com/weblogic/security/wls" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/weblogic/jdbc-data-source http://xmlns.oracle.com/weblogic/jdbc-data-source/1.2/jdbc-data-source.xsd">
  <name>DataSourceTest</name>
  <jdbc-driver-params>
    <url>jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb</url>
    <driver-name>oracle.jdbc.xa.client.OracleXADataSource</driver-name>
    <properties>
      <property>
        <name>user</name>
        <value>DEV_SOAINFRA</value>
      </property>
    </properties>
    <password-encrypted>{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=</password-encrypted>
  </jdbc-driver-params>
  <jdbc-connection-pool-params>
    <initial-capacity>0</initial-capacity>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
  </jdbc-connection-pool-params>
  <jdbc-data-source-params>
    <jndi-name>jdbc/SOADBDataSource</jndi-name>
  </jdbc-data-source-params>
</jdbc-data-source>

And the expected output for the above input file should be

Param1:-DataSourceTest <which is nothing but the value of <name>>
Param2:-jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb<which is nothing but the value of <url>>
Param3:-oracle.jdbc.xa.client.OracleXADataSource<which is nothing but the value of <driver-name>>
Param4:-DEV_SOAINFRA<which is nothing but the value of <value>>
Param5:-{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=<which is nothing but the value of <password-encrypted>>
Param6:-jdbc/SOADBDataSource<which is nothing but the value of <jndi-name>>

Thanks,
Siva

Hi,

Can you not use Perl for this? It has modules specifically written for this purpose.

Hello Siv51427882,

Could you please try following and let me know if this helps you.

awk -F"[><]" '/jdbc-data-source/{a=1} a && /<name>.*<\/name>/{print "Param1:-"$3;a="";next} /<url>.*<\/url?/{print "Param2:-"$3;next} /<driver-name>.*<\/driver-name>/{print "Param3:-"$3;next} /<value>.*<\/value>/{print "Prarm4:-"$3;next} /<password-encrypted>.*<\/password-encrypted>/{print "Param5:-"$3;next} /<jndi-name>.*<\/jndi-name>/{print "Param5:-"$3}'  Input_file

Output will be as follows.

Param1:-DataSourceTest
Param2:-jdbc:oracle:thin:@10.137.181.188:1521/soa12cdb
Param3:-oracle.jdbc.xa.client.OracleXADataSource
Prarm4:-DEV_SOAINFRA
Param5:-{AES}xxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
Param5:-jdbc/SOADBDataSource

EDIT: Adding a non-one liner form of solution too now.

awk -F"[><]" '
/jdbc-data-source/{
  a=1
}
a && /<name>.*<\/name>/{
  print "Param1:-"$3;
  a=""
  next
}
/<url>.*<\/url?/{
  print "Param2:-"$3;
  next
}
/<driver-name>.*<\/driver-name>/{
  print "Param3:-"$3
  next
}
/<value>.*<\/value>/{
  print "Prarm4:-"$3
  next
}
/<password-encrypted>.*<\/password-encrypted>/{
  print "Param5:-"$3
  next
}
/<jndi-name>.*<\/jndi-name>/{
  print "Param5:-"$3
  next
}
'   Input_file
 

Thanks,
R. Singh

1 Like

With the example of post#15, do you think you can manage to modify it so that it prints param<n>: before each value?

Hello R.Singh,

That worked..!!!
Thanks a lot.
And sorry for confusion around.

Thanks
Siva