How to improve the performance of parsers in Perl?

Hi,

I have around one lakh records, and I have used XML for storing the data.

I have used these two Perl modules:

use XML::DOM; 
use XML::LibXML;

The data looks like this, and most of it is textual entries:

<eid>19000</eid>
<einfo>This is the ..........</einfo>
......
.................
.............................
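
For reference, the chunk is built roughly along these lines with XML::LibXML's DOM API (a simplified sketch: the @records list is a placeholder for the real one-lakh data set, and the <entries>/<entry> wrapper names are assumptions; only <eid> and <einfo> come from the sample above):

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Placeholder records; the real data set has around one lakh entries.
my @records = (
    { eid => 19000, einfo => 'This is the first entry'  },
    { eid => 19001, einfo => 'Another textual entry'    },
);

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('entries');   # wrapper name is an assumption
$doc->setDocumentElement($root);

# Each record becomes an <entry> holding the <eid> and <einfo>
# fields shown in the sample above.
for my $rec (@records) {
    my $entry = $doc->createElement('entry');
    $entry->appendTextChild('eid',   $rec->{eid});
    $entry->appendTextChild('einfo', $rec->{einfo});
    $root->appendChild($entry);
}

print $doc->toString(1);    # 1 = indented output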

But I am facing performance issues while creating the XML chunks.

For example, 9000 entries take around 2 minutes.

Is there any option available to reduce the time taken and improve the performance of these modules and XML generation?

Is there any caching mechanism available for these modules?

How can I improve the performance, reduce the time taken, and get results more quickly?

Any suggestions?

Regards
Archana

FYI: A lakh or lac (English pronunciation: /læk/ lak or /lɑːk/ lahk) is a unit in the Indian numbering system equal to one hundred thousand (100,000).

Unless you need the complete document DOM in memory, a SAX (Simple API for XML) parser will nearly always be faster. Whereas the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially. Even better would be StAX (Streaming API for XML), which is a newer API for pull-parsing of XML.

Perl has several SAX modules, but I do not see any StAX modules.
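
As a sketch of what SAX looks like in Perl, using XML::SAX::ParserFactory and XML::SAX::Base (the element names follow the sample above; the input file name is hypothetical):

#!/usr/bin/perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

# A minimal SAX handler that collects the text of every <eid> element.
# Nothing else is kept in memory, which is why SAX scales to large files.
package EidHandler;
use base qw(XML::SAX::Base);

sub start_element {
    my ($self, $el) = @_;
    $self->{in_eid} = 1 if $el->{Name} eq 'eid';
}

sub characters {
    my ($self, $data) = @_;
    $self->{text} .= $data->{Data} if $self->{in_eid};
}

sub end_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'eid') {
        push @{ $self->{eids} }, $self->{text};
        $self->{in_eid} = 0;
        $self->{text}   = '';
    }
}

package main;

my $handler = EidHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_uri('entries.xml');    # hypothetical input file
print scalar @{ $handler->{eids} }, " entries parsed\n";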

Hi,

I tried looking, but our constraint is that we need to use DOM parsers only.

So is there a caching mechanism available for these parsers?

Is there any method to speed up the DOM parsers in Perl?

Any help would be much appreciated.

Regards
Vanitha

There is really no way to speed up a DOM parser except with a faster machine and more memory. As your file gets bigger and bigger, the DOM representation of your document will require more and more memory. That is why they came up with SAX and StAX.
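
That said, if your input can be split into self-contained record chunks, one hedge within a DOM-only constraint is to parse each chunk as its own small document, so no single DOM ever holds all one lakh records at once. A sketch with XML::LibXML (the chunking itself is an assumption about your data, not a feature of the parser):

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical pre-split chunks; in practice these might come from
# splitting the input file on record boundaries.
my @chunks = (
    '<entry><eid>19000</eid><einfo>first entry</einfo></entry>',
    '<entry><eid>19001</eid><einfo>second entry</einfo></entry>',
);

my $parser = XML::LibXML->new;
for my $xml (@chunks) {
    my $doc   = $parser->load_xml(string => $xml);
    my ($eid) = $doc->findnodes('/entry/eid');
    print $eid->textContent, "\n";
}   # each small DOM goes out of scope here, keeping memory use flat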