UNIX Shell script to work with .xml file

Hi Team,

Could you please help me with the query below?

I want to retrieve XML elements from a .xml file. The file also contains commented-out tags, so I am planning to write a Unix command/script that will:
1. check for this .xml file;
2. ignore the commented XML lines, i.e. XML tags between <!-- and -->;
3. find the uncommented tags and retrieve their values as shown below. I need to check the uncommented tags individually, and there are duplicate tags with different values.

(My XML content is provided below.)
Please advise.

Required output:
----------------

port="8080" 
protocol="HTTP/1.1"
connectionTimeout="300000"
redirectPort="8443" 
maxThreads="200" 
minSpareThreads="4"        
maxSpareThreads="50" 
maxKeepAliveRequests="1" 

port="8009" 
protocol="AJP/1.3" 
redirectPort="8443"

name="jdbc/CLH" 
auth="Container" 
type="javax.sql.DataSource"
driverClassName="net.sourceforge.jtds.jdbc.Driver"
initialSize="10" 
maxActive="100" 
maxIdle="20" 
maxWait="10000"
validationQuery="select 1"
testOnBorrow="true"
factory="org.apache.commons.dbcp.BasicDataSourceFactory"
url="jdbc:jtds:sqlserver://xx.xxx.xx.xx:xxxx/Chase"
username="xxxxx" 
password="xxxxxx"

---------------
The XML file has the following content:

<!-- SingleSignOn valve, share authentication between web applications
            Documentation at: /docs/config/valve.html -->
     <!--
        <Valve className="org.apache.catalina.authenticator.SingleSignOn" />
        -->
     <!-- Access log processes all example.
             Documentation at: /docs/config/valve.html
             Note: The pattern used is equivalent to using pattern="common" -->
    
     <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="300000"
               redirectPort="8443" maxThreads="200" minSpareThreads="4"        maxSpareThreads="50" maxKeepAliveRequests="1" />

    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

    <Resource name="jdbc/CLH" auth="Container" type="javax.sql.DataSource"
              driverClassName="net.sourceforge.jtds.jdbc.Driver"
              initialSize="10" maxActive="100" maxIdle="20" maxWait="10000"
              validationQuery="select 1"
              testOnBorrow="true"
              factory="org.apache.commons.dbcp.BasicDataSourceFactory"
              url="jdbc:jtds:sqlserver://xx.xxx.xx.xx:xxxx/Chase"
              username="xxxxx" password="xxxxxx"  />

    <Resource name="UserDatabase" auth="Container"
              type="org.apache.catalina.UserDatabase"
              description="User database that can be updated and saved"
              factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
              pathname="conf/tomcat-users.xml" />

<Resource name="jdbc/MINT" auth="Container" type="javax.sql.DataSource"
               maxActive="3" maxIdle="1" maxWait="10000"
               username="XXX" password="XXX" driverClassName="oracle.jdbc.OracleDriver"
               url="jdbc:oracle:thin:@XXXX:XXX:XXX"
               factory="org.apache.commons.dbcp.BasicDataSourceFactory"/>

-----------------------------
Please could you help to advise.

Thanks..

Welcome to the forum.
You can try something like this:

awk '{for (i=1;i<=NF;i++) {if ($i ~ /=/) {print $i}}}' file

There are many ways to extract fields from an XML file, but for a long-term solution I would prefer to use a good XML parser tool.
Thanks.

EDIT: As pilnet101 pointed out, this won't work for attribute values that contain spaces. In my hurry I didn't notice that some of them do.
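
To see the problem in isolation, here is a small sketch that runs the same awk loop on a made-up one-line input (the function name show_split and the sample line are only for illustration, they are not from the original file). Because awk splits fields on whitespace, a quoted value containing a space is cut at the first space, and the remaining words are dropped since they contain no "=":

```shell
# Hypothetical demo: feed one sample line to the field-splitting loop.
show_split() {
    printf '%s\n' 'description="User database" auth="Container"' |
    awk '{for (i=1;i<=NF;i++) {if ($i ~ /=/) {print $i}}}'
}

show_split
# prints:
# description="User
# auth="Container"
```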

The above post has a few bugs (e.g. the description attribute will not be shown correctly).

This is messy and can probably be condensed a lot, but it was fun to create :D. Hope it helps:

awk 'BEGIN{ORS=" "}1' xml|sed -r 's/<!--[^-]+-->//g'|awk '{gsub(/\/>/,"\n",$0)}1'|awk '{$1=""}1'|sed -r 's/\w+="[^"]+"/&uniQue/g'|awk 'BEGIN{RS="uniQue"}1'|sed 's/^ //g'
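
For comparison, the same idea can be sketched in a single awk program. This is only a rough alternative, not a replacement for a real XML parser: it assumes attribute values are double-quoted and contain no ">" character, and the function name extract_attrs is made up for this example. It reads the whole file, cuts out everything between <!-- and -->, then prints each name="value" pair of every remaining tag on its own line, with a blank line after each tag:

```shell
# Hypothetical helper: print the attributes of all uncommented tags in file $1.
extract_attrs() {
    awk '
    { buf = buf $0 "\n" }                      # collect the whole file
    END {
        # Remove every <!-- ... --> comment, including multi-line ones.
        while ((s = index(buf, "<!--")) > 0) {
            rest = substr(buf, s + 4)
            e = index(rest, "-->")
            if (e == 0) { buf = substr(buf, 1, s - 1); break }   # unterminated comment
            buf = substr(buf, 1, s - 1) substr(rest, e + 3)
        }
        # Walk the remaining tags one by one.
        while ((s = index(buf, "<")) > 0) {
            buf = substr(buf, s + 1)
            e = index(buf, ">")
            if (e == 0) break
            tag = substr(buf, 1, e - 1)
            buf = substr(buf, e + 1)
            n = 0
            # Print each name="value" pair; values may contain spaces.
            while (match(tag, /[A-Za-z_:][A-Za-z0-9_:.-]*="[^"]*"/)) {
                print substr(tag, RSTART, RLENGTH)
                tag = substr(tag, RSTART + RLENGTH)
                n++
            }
            if (n) print ""                    # blank line between tags
        }
    }' "$1"
}
```

Usage would be something like: extract_attrs server.xml (the file name is an assumption). It prints the attributes of every uncommented tag, so the UserDatabase and jdbc/MINT resources appear too; those blocks would still need to be filtered out to match the required output exactly.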

Hi clx and pilnet101,

Thank you for your quick response. These commands are helpful.
I will try to modify them to get the output I need.

Thanks..