I have a very large XML feed (2.7 MB) that crashes the server when it is parsed. To reduce the load on the server I now have a cron job running every 5 minutes; it fetches the file from the feed host and stores it on the local machine.
This does not solve the problem, as the full file still gets loaded on the server. The file looks something like this:
Regarding the end-of-line problem: what format is the file currently in, i.e. does it have LF, CR/LF, or CR as its end-of-line marker?
The format determines which tool to use.
To go from DOS to Unix, use dos2unix, or open the file in vim and run :set fileformat=unix
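If you are not sure which line endings the file has, `file` will tell you, and `tr` is a portable fallback when dos2unix is not installed. A small sketch (the sample file here is just a stand-in for your feed):

```shell
# Create a stand-in CR/LF file so the commands below have something to chew on.
printf 'line1\r\nline2\r\n' > sample.txt

# 'file' reports the terminator style, e.g. "ASCII text, with CRLF line terminators".
file sample.txt

# Portable conversion: strip the carriage returns.
tr -d '\r' < sample.txt > sample.unix.txt
```

If dos2unix is available, `dos2unix sample.txt` does the same conversion in place.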
There seems to be some problem with the command. It appears to execute, but the output file is a complete copy of the XML feed.
I don't think the file format is the issue, because I do not see ^M characters in the file.
I suspect the problem is the multiple occurrences of "NewsRelease" in the file.
Also, my requirement is that I need to copy the first 5 occurrences of <NewsRelease> ... </NewsRelease> from the XML feed to another file, because I need to transform only the first 5 news releases to HTML using XSL.
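One way to pull out just the first 5 blocks at the UNIX level is awk, which can stop reading as soon as the fifth </NewsRelease> is seen, so the large feed is never fully scanned. This is only a sketch: the filenames and the <NewsReleases> wrapper element are assumptions (I don't know your feed's real root element), and it assumes each open/close tag sits on its own line:

```shell
# Stand-in feed for illustration; replace with your downloaded 2.7 MB file.
printf '<?xml version="1.0"?>\n<NewsReleases>\n' > feed.xml
for i in 1 2 3 4 5 6 7; do
  printf '<NewsRelease>\n  <Title>Release %s</Title>\n</NewsRelease>\n' "$i"
done >> feed.xml
printf '</NewsReleases>\n' >> feed.xml

# Copy only the first 5 <NewsRelease> blocks, re-wrapped so the result is
# well-formed XML, then exit without reading the rest of the feed.
{
  printf '<?xml version="1.0"?>\n<NewsReleases>\n'
  awk '/<NewsRelease>/    { inside = 1 }
       inside             { print }
       /<\/NewsRelease>/  { inside = 0; if (++count == 5) exit }' feed.xml
  printf '</NewsReleases>\n'
} > top5.xml
```

The resulting top5.xml is small enough to feed to your XSL stylesheet without loading the whole feed on every hit.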
I am already using an XSL stylesheet to convert the XML to HTML in Sun Portal Server (using the XML Provider). But I am facing a problem: whenever someone hits the server, it loads the complete XML (this file is around 2.5 MB), which overloads the server. There are 4 servers, and they go down one by one because of the load.
I thought that trimming the file to a smaller one at the UNIX level might solve the problem (I have set up a crontab job that fetches the file from the XML host server and writes it to the file system; then I am trying to trim the file in UNIX, after which I will parse the output XML using an XSL stylesheet).
Is there a UNIX-level processor to convert XML to HTML using XSL?
If you want, I can give you the XSL stylesheet that I am using for the conversion.
There are a number of free XSLT processors available for UNIX platforms. The most common is probably the one that ships with libxslt, i.e. xsltproc.
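For reference, a minimal xsltproc invocation looks like the last line below. The stylesheet and input here are tiny made-up placeholders just to show the command shape; your real stylesheet and feed will of course differ:

```shell
# Placeholder stylesheet: renders each NewsRelease title as a list item.
cat > news.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <ul><xsl:for-each select="//NewsRelease">
      <li><xsl:value-of select="Title"/></li>
    </xsl:for-each></ul>
  </xsl:template>
</xsl:stylesheet>
EOF

# Placeholder input: a trimmed feed with two releases.
cat > top5.xml <<'EOF'
<?xml version="1.0"?>
<NewsReleases>
  <NewsRelease><Title>First story</Title></NewsRelease>
  <NewsRelease><Title>Second story</Title></NewsRelease>
</NewsReleases>
EOF

# The actual transformation step: stylesheet first, then the XML input.
xsltproc news.xsl top5.xml > news.html
```

`xsltproc -o news.html news.xsl top5.xml` is equivalent if you prefer an explicit output flag.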
BTW, if your input document is that large and causing the problems you describe, I suggest you use a SAX or StAX processor instead of a DOM-based one. If you have access to IEEE Computer Society proceedings, there was an article in the September 2008 issue of Computer by Lam, Ding, and Liu on XML document parsing performance characteristics which gives more information and benchmarks.
Thanks Murphy, xsltproc resolved the issue. There were two problems I faced.
One was that there was an XML declaration at the beginning of the HTML output, and the second was that the html and body tags were missing.
For the XML declaration I used:
sed '1d' input_with_xml_tag.html > output_without_xml_tag.html
and I was not much bothered about the missing html and body tags, as the Portal takes care of those.
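To reproduce the declaration-stripping step (filenames here are placeholders): sed '1d' deletes the first line of the file, which is where the <?xml ...?> declaration lands in the transformed output.

```shell
# Stand-in for the xsltproc output that starts with an XML declaration.
printf '<?xml version="1.0"?>\n<table><tr><td>news</td></tr></table>\n' > input_with_xml_tag.html

# Drop line 1 (the declaration), keep everything else.
sed '1d' input_with_xml_tag.html > output_without_xml_tag.html
```

A cleaner alternative is to suppress the declaration in the stylesheet itself with `<xsl:output method="html"/>` (or `omit-xml-declaration="yes"`), so no post-processing is needed.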