Splitting the XML file into three different files

Hello Shell Gurus,

I have a requirement to split a source XML file into three different text files,
and I need your valuable suggestions to finish this.

Here is my source XML snippet. I am showing only one <jms-system-resource> entry here; there may be multiple entries in the source file.

  <jms-system-resource>
    <name>UMSJMSSystemResource</name>
    <target>soa_server1,bam_server1</target>
    <sub-deployment>
      <name>UMSJMSServer522129776</name>
      <target>UMSJMSServer_auto_1</target>
    </sub-deployment>
    <sub-deployment>
      <name>UMSJMSServer1709690790</name>
      <target>UMSJMSServer_auto_2</target>
    </sub-deployment>
    <descriptor-file-name>jms/UMSJMSSystemResource-jms.xml</descriptor-file-name>
  </jms-system-resource>

We would like to have three separate files, as listed below, for the above file:
1) DSFilenames.txt
2) JMSModule.txt
3) Subdeployment.txt

1) DSFilenames.txt should contain only the value of <descriptor-file-name>.
In the above example, the expected output should look like this:

  dsfilename=jms/UMSJMSSystemResource-jms.xml

2) JMSModule.txt should contain the values of <name> and <target> directly under <jms-system-resource>.
In the above example, the expected output should look like this:

  name=UMSJMSSystemResource
  target=soa_server1,bam_server1

3) Subdeployment.txt should contain the values of <name> and <target> under each <sub-deployment> tag.
In the above example, the expected output should look like this:

   name=UMSJMSServer522129776
   target=UMSJMSServer_auto_1
   name=UMSJMSServer1709690790
   target=UMSJMSServer_auto_2

Any help on this will be deeply appreciated.

You may use simple shell techniques (reading the file line by line, grep, sed/cut, output redirection) to achieve this, or go for slightly more advanced techniques such as those provided by Python's XML modules.
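
As an illustration of the simple route, here is a minimal sed sketch for the first file only. It assumes every element sits on its own line, exactly as in the posted snippet, and the input name config.xml is a placeholder of mine, not something from the original post. It is not a real XML parser and will break on reformatted input.

```shell
# Hypothetical sample input, copied from the snippet in the first post.
cat > config.xml <<'EOF'
  <jms-system-resource>
    <name>UMSJMSSystemResource</name>
    <target>soa_server1,bam_server1</target>
    <sub-deployment>
      <name>UMSJMSServer522129776</name>
      <target>UMSJMSServer_auto_1</target>
    </sub-deployment>
    <sub-deployment>
      <name>UMSJMSServer1709690790</name>
      <target>UMSJMSServer_auto_2</target>
    </sub-deployment>
    <descriptor-file-name>jms/UMSJMSSystemResource-jms.xml</descriptor-file-name>
  </jms-system-resource>
EOF

# Look only inside <jms-system-resource> blocks; on lines that carry a
# <descriptor-file-name> element, strip the tags and prefix the value.
sed -n '/<jms-system-resource>/,/<\/jms-system-resource>/ s|.*<descriptor-file-name>\(.*\)</descriptor-file-name>.*|dsfilename=\1|p' \
    config.xml > DSFilenames.txt

cat DSFilenames.txt
```

The range address keeps descriptor file names from other kinds of resources out of the result, should the real file contain any.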

By the way, what have you tried?

And, in addition to what balajesuri has already said, please also tell us what operating system and shell you're using.

Hello Shell Gurus,

I am able to get the first two files using awk,
but I need some help with extracting the third file.
Consider the following input:

  <jms-system-resource>
    <name>UMSJMSSystemResource</name>
    <target>soa_server1,bam_server1</target>
    <sub-deployment>
      <name>UMSJMSServer522129776</name>
      <target>UMSJMSServer_auto_1</target>
    </sub-deployment>
    <sub-deployment>
      <name>UMSJMSServer1709690790</name>
      <target>UMSJMSServer_auto_2</target>
    </sub-deployment>
    <descriptor-file-name>jms/UMSJMSSystemResource-jms.xml</descriptor-file-name>
  </jms-system-resource>

From the above input, I need to extract the following parameters into a separate file.

For the above input, I would like to have the following output:

  JMSModuleName=UMSJMSSystemResource
  sub-deploymentname=UMSJMSServer522129776
  targetserver=UMSJMSServer_auto_1
  sub-deploymentname=UMSJMSServer1709690790
  targetserver=UMSJMSServer_auto_2

I have tried the following script to get it:

awk -F"[><]" '
/<jms-system-resource>/{
  a=1
}
a && /<name>/{
  print "JMSModuleName ="$3
  next
}
a && /<target>/{
    next
}
a && /<sub-deployment>/{
  b=1
  }
 b && /<name>/ {
  print "SubdeploymentName ="$3
  next
}
b && /<target>/{
 print "TargetServersName ="$3
 a=""
 b=""
 next
 }
' InputFile

But I got the following output:

JMSModuleName=UMSJMSSystemResource
JMSModuleName=UMSJMSServer522129776
JMSModuleName=UMSJMSServer1709690790

Any help on this would be greatly appreciated.

Now, this is different from what you requested in post #1, and it is not too clear to me what is actually requested. Given that the .xml file has EXACTLY the structure shown, how far would this get you?

awk '
/<jms-system-resource>/,\
/<\/jms-system-resource>/       {if (/<.?jms-system-resource>/)         {FN = "JMSModule.txt"
                                                                         next
                                                                        }
                                 if (/<.?sub-deployment>/)              {FN = "Subdeployment.txt"
                                                                         next
                                                                        }
                                 if (/<.?descriptor-file-name>/)         FN = "DSFilenames.txt"

                                 sub (/^ *</, _)
                                 sub (/>/, "=")
                                 sub (/<.*$/, _)
                                 if (FN) print  >  FN
                                }
' file
cf *.txt

---------- DSFilenames.txt: ----------

descriptor-file-name=jms/UMSJMSSystemResource-jms.xml

---------- JMSModule.txt: ----------

name=UMSJMSSystemResource
target=soa_server1,bam_server1

---------- Subdeployment.txt: ----------

name=UMSJMSServer522129776
target=UMSJMSServer_auto_1
name=UMSJMSServer1709690790
target=UMSJMSServer_auto_2

cf (cat files) shows the resulting files...
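
For the output format asked for in the follow-up post, a sketch along these lines might do. In the attempt posted there, the `a && /<name>/` rule matches every <name> line first and its `next` skips the later `b` rules, which is why only JMSModuleName lines come out. The version below tests the sub-deployment case before the module case. It assumes one element per line as in the sample; the file names jms.xml and ModuleSubdeployments.txt and the flag names in_module and in_sub are made up for illustration.

```shell
# Hypothetical sample input, copied from the posted snippet.
cat > jms.xml <<'EOF'
  <jms-system-resource>
    <name>UMSJMSSystemResource</name>
    <target>soa_server1,bam_server1</target>
    <sub-deployment>
      <name>UMSJMSServer522129776</name>
      <target>UMSJMSServer_auto_1</target>
    </sub-deployment>
    <sub-deployment>
      <name>UMSJMSServer1709690790</name>
      <target>UMSJMSServer_auto_2</target>
    </sub-deployment>
    <descriptor-file-name>jms/UMSJMSSystemResource-jms.xml</descriptor-file-name>
  </jms-system-resource>
EOF

# Split on < and > so that $3 holds the text between the tags.
awk -F'[<>]' '
# Track whether we are inside a module and inside a sub-deployment.
/<jms-system-resource>/   { in_module = 1; next }
/<\/jms-system-resource>/ { in_module = 0; next }
/<sub-deployment>/        { in_sub = 1; next }
/<\/sub-deployment>/      { in_sub = 0; next }

# Sub-deployment rules come first so they win over the module rule.
in_module && in_sub && /<name>/   { print "sub-deploymentname=" $3; next }
in_module && in_sub && /<target>/ { print "targetserver=" $3; next }

# A <name> outside any sub-deployment is the module name;
# the module-level <target> is deliberately not printed.
in_module && /<name>/             { print "JMSModuleName=" $3 }
' jms.xml > ModuleSubdeployments.txt

cat ModuleSubdeployments.txt
```

Because the flags are reset on the closing tags, this should also cope with several <jms-system-resource> entries in one file, as long as the one-element-per-line assumption holds.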

Thanks, RudiC, for your valuable input.
Let me try your suggestion first.