Reading XML data in a FLAT FILE

I have a requirement to read an XML file and split it into two different files using a Unix shell script. Could anyone please help me with this?

Sample file
---------------

0,<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<s>
<Name>aaa</Name>
<age>12</age>
</s>
</Information>,<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>aaa</Name><age>12</age></s></Information>
1,<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<s>
<Name>bbb</Name>
<age>12</age>
</s>
</Information>,<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>bbb</Name><age>12</age></s></Information>

---------------

Expected output:
output1.xml
---------------

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<s>
<Name>aaa</Name>
<age>12</age>
</s>
</Information>

---------------

output2.xml
---------------

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>aaa</Name><age>12</age></s></Information>

---------------

After output1.xml and output2.xml have been processed, the files need to be purged.

The second record of the sample file then has to be read and written to output1.xml and output2.xml, which are created again during this process.

Expected output:
output1.xml
---------------

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<s>
<Name>bbb</Name>
<age>12</age>
</s>
</Information>

---------------

output2.xml
---------------

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>bbb</Name><age>12</age></s></Information>

---------------

The whole sample file has to be processed this way, one record at a time, in a loop.

Could anyone help me to resolve this?

Thanks in Advance

Krishnakanth Manivannan

How about this? It assumes each XML document begins with "<?xml".

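awk '
# Record start such as "0,<?xml ...": drop the leading number, start a new file
/^[0-9][0-9]*,<\?xml/ {
    FNUM++
    gsub(/^[0-9]*,/, "")
    print $0 > ("output" FNUM ".xml")
    next
}
# Boundary line: text before ",<?xml" ends the current file, the rest starts the next
/.*,<\?xml/ {
    FIRST=$0
    gsub(/,<\?xml.*/, "", FIRST)
    print FIRST >> ("output" FNUM ".xml")
    FNUM++
    gsub(/.*,<\?xml/, "<?xml")
    print $0 > ("output" FNUM ".xml")
    next
}
# Any other line belongs to the file currently being written
FNUM { print $0 >> ("output" FNUM ".xml") }' sample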
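To see what the two substitutions on the boundary line do, here is a small illustration. The name split_boundary is a hypothetical helper used only for this demo, and the input line is a shortened version of the one in the sample file.

```shell
# Hypothetical demo helper: split one boundary line into the text that
# closes the first document and the text that opens the second one.
split_boundary() {
    printf '%s\n' "$1" | awk '{
        first = $0
        sub(/,<\?xml.*/, "", first)      # keep what comes before ",<?xml"
        rest = $0
        sub(/.*,<\?xml/, "<?xml", rest)  # keep "<?xml" and what follows it
        print first
        print rest
    }'
}

split_boundary '</Information>,<?xml version="1.0"?><Information>'
```

This prints </Information> on the first line and the <?xml ... text on the second, which is how the script decides where one output file ends and the next begins.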

Thanks for your reply. It works fine.

One more thing: I need to perform this in a loop.

sample.txt
----
0,<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<s>
<Name>aaa</Name>
<age>12</age>
</s>
</Information>,<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>aaa</Name><age>12</age></s></Information>
1,<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<s>
<Name>bbb</Name>
<age>12</age>
</s>
</Information>,<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>bbb</Name><age>12</age></s></Information>
----

For example, in this case output1.xml will contain
----
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<s>
<Name>aaa</Name>
<age>12</age>
</s>
</Information>
----

and output2.xml will contain
----
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>aaa</Name><age>12</age></s></Information>
----

Once output1.xml and output2.xml are generated, the script will invoke the DataStage job for further processing.

Once the DataStage job completes, control has to come back to the Unix script, which has to read the next set and generate/overwrite the same output1.xml and output2.xml with the following content.

next set
----
1,<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<s>
<Name>bbb</Name>
<age>12</age>
</s>
</Information>,<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>bbb</Name><age>12</age></s></Information>
----

output1.xml
----
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<s>
<Name>bbb</Name>
<age>12</age>
</s>
</Information>
----

output2.xml should have
----
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Information xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><s>
<Name>bbb</Name><age>12</age></s></Information>
----

The following code snippet that you gave works fine.
----
awk '/^[0-9][0-9]*\,<\?xml/ {
    FNUM++
    gsub(/^[0-9]*\,/, "")
    print $0 > "output" FNUM ".xml"
    next
}
/.*,<\?xml/ {
   FIRST=$0
   gsub(/,<\?xml.*/, "",FIRST)
   print FIRST >> "output" FNUM ".xml"
   FNUM++
   gsub(/.*,<\?xml/, "<?xml")
   print $0 > "output" FNUM ".xml"
   next
}
FNUM { print $0 >> "output" FNUM ".xml" }' sample
----

But I need to perform this in a loop until the end of the file sample.txt.

Could you please help me out?

Thanks
Krishnakanth Manivannan

Thank you, Chubler_XL. The logic you gave works.

Thanks for your help!!

Krishnakanth Manivannan