I need the above XML file broken down into multiple data files. For example, the Portfolio tag elements should go into Portfolio.txt. The text file should look like this:
P1|Access|Active
Similarly, the Family tag elements should go into Family.txt, and the same goes for SubFamily and Products.
Can you suggest a way to achieve this? I don't have much experience with shell scripting, so excuse me if my question is naive.
Shell scripting and XML processing are not a happy combination. For one thing, XML is not line-based; quite the opposite. It's not really 'raw text' either: it has a great many encoding and syntax rules that must all be honored to guarantee proper behavior. There are huge C libraries just for handling XML properly. It would be possible to hack together a shell script that handles just this one arrangement of it, but if they suddenly change where the line breaks are, it will stop working...
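To illustrate the line-break problem, here is a minimal sketch. The two files are made up for the demonstration (the real XML is not reproduced here) and hold the same element with different line breaks; the grep pattern is just one example of a line-based match, not anything taken from the thread.

```shell
# Two hypothetical files with the same XML content but different line breaks.
printf '%s\n' '<Portfolio productCode="P1"><Name value="Access"/></Portfolio>' > one_line.xml
printf '%s\n' '<Portfolio productCode="P1">' '  <Name value="Access"/>' '</Portfolio>' > multi_line.xml

# A line-based pattern that expects both attributes on the same line.
pattern='productCode="[^"]*".*value="[^"]*"'

# grep -c prints the number of matching lines; "|| true" keeps the script
# going when grep exits non-zero because nothing matched.
grep -c "$pattern" one_line.xml   > one_line.count   || true
grep -c "$pattern" multi_line.xml > multi_line.count || true

echo "one_line.xml:   $(cat one_line.count) matching line(s)"
echo "multi_line.xml: $(cat multi_line.count) matching line(s)"
```

Both files describe the same data to an XML parser, yet the line-based match finds it in only one of them.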
I totally understand that. The arrangement is that they will not change the XML for now. I just need a way to convert the XML. Please let me know if that is possible.
---------- Post updated at 11:56 AM ---------- Previous update was at 11:54 AM ----------
I typed the following command
type xml2-config
I got a reply as below
xml2-config is /usr/bin/xml2-config
Does that mean I have XSLT installed and can use it for the transformation?
Hi Tyler, thanks a lot for your help. It's working fine and creating the text files.
I have a question, though: I am trying to create ProductRef.txt and Products.txt just like you did for the other three, but it's not working as expected. Is that because there are some extra tags within those tags?
How do I create files for them as well?
Thanks
Raghav
---------- Post updated at 02:41 PM ---------- Previous update was at 02:37 PM ----------
Tyler,
This is how I modified your code
perl -lne 'BEGIN {undef $/}
while (/<(Portfolio|Family|SubFamily) productCode="(.*?)".*?value="(.*?)".*?
value="(.*?)".*?<\/(Portfolio|Family|SubFamily)>/sg) {
if ($1 eq "Portfolio") {push @p, "$2|$3|$4"}
elsif ($1 eq "Family") {push @f, "$2|$3|$4"}
elsif ($1 eq "SubFamily") {push @sf, "$2|$3|$4"}
elsif ($1 eq "ProductRefs") {push @pr, "$2|$3|$4|$4|$5|$6|$7|$8|$9|$10"}
}
END {if (@p) {open(F, ">portfolio.txt"); foreach(@p) {print F $_} close(F)}
if (@f) {open(F, ">family.txt"); foreach(@f) {print F $_} close(F)}
if (@sf) {open(F, ">subfamily.txt"); foreach(@sf) {print F $_} close(F)}
if (@pr) {open(F, ">ProductRefs.txt"); foreach(@pr) {print F $_} close(F)}
}
' CPC.xml
---------- Post updated at 03:57 PM ---------- Previous update was at 02:41 PM ----------
Hi... I have XSLT installed on my Unix box. Can you suggest how to create an XSL file for the XML I posted? I think the XSL approach will make it much easier to convert the XML to a data file.
It's not working because the regular expression is incorrect for "ProductRefs". It's correct for "Portfolio", "Family" and "SubFamily" though.
perl -lne 'BEGIN {undef $/}
while (/<(Portfolio|Family|SubFamily) productCode="(.*?)".*?value="(.*?)".*?
value="(.*?)".*?<\/(Portfolio|Family|SubFamily)>/sg) {
if ($1 eq "Portfolio") {push @p, "$2|$3|$4"}
elsif ($1 eq "Family") {push @f, "$2|$3|$4"}
elsif ($1 eq "SubFamily") {push @sf, "$2|$3|$4"}
elsif ($1 eq "ProductRefs") {push @pr, "$2|$3|$4|$4|$5|$6|$7|$8|$9|$10"}
}
END {if (@p) {open(F, ">portfolio.txt"); foreach(@p) {print F $_} close(F)}
if (@f) {open(F, ">family.txt"); foreach(@f) {print F $_} close(F)}
if (@sf) {open(F, ">subfamily.txt"); foreach(@sf) {print F $_} close(F)}
if (@pr) {open(F, ">ProductRefs.txt"); foreach(@pr) {print F $_} close(F)}
}
' CPC.xml
What are the values of $1, $5, $6, ..., $10 in the "ProductRefs" branch (shown in red font in the original post)?
If you are unable to answer this question, I'd assume that you are not familiar with regular expressions; in that case, I'd recommend studying and practising them until the concepts are clear.
Alternatively, you may want to check out Perl modules related to XML on CPAN.
Here is a partial XSLT 1.x stylesheet that outputs the Portfolio and Family attribute values into separate text files. You can easily extend it to handle the remaining attribute values that you want to extract.
Note: remove the spaces in "& # 10 ;" in your stylesheet. I had to add the spaces here because the forum code tags eat certain XSLT constructs.