I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this :
foreach my $t (@text)
{
    open TEXT, $t or die "Cannot open $t for reading: $!\n";
    while (my $line = <TEXT>) {
        ....My code....
    }
    close(TEXT);
}
foreach my $x (@xml)
{
    open XML, $x or die "Cannot open $x for reading: $!\n";
    while (my $line = <XML>) {
        ....My code....
    }
    close(XML);
}
When I run it directly, like this, it gives me an "Out of memory" error:
perl runXML.pl
Can anyone suggest how I can run this using "qsub" or something similar? I have these files in a directory structure like this:
You are probably exceeding the virtual-memory limit, most likely because you are keeping an array (or hash) that grows without bound as you read the files.
If the files are really big (say, > 2 GB), you could ask the sysadmin to add more swap space, but showing more of your code would help more than adding swap: the memory is almost certainly being hogged somewhere inside "....My code....".
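The usual culprit is pushing every line (or every parsed record) onto a structure that is never cleared between files. A minimal sketch of the bounded-memory pattern, using a hypothetical line-counting task and a hypothetical file name, since the real "....My code...." isn't shown:

```perl
use strict;
use warnings;

# Sketch: process each file line by line, keeping only per-file state.
# Memory stays bounded by one file's summary, not by the total input,
# because nothing accumulates across iterations of the while loop.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path for reading: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;        # per-line work goes here; do NOT push every
    }                    # line onto a long-lived array
    close $fh;
    return $count;
}

# Demo on a small temporary file (hypothetical name).
my $tmp = "demo_input.txt";
open my $out, '>', $tmp or die "Cannot write $tmp: $!";
print $out "line $_\n" for 1 .. 3;
close $out;

print count_lines($tmp), "\n";    # prints 3
unlink $tmp;
```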
Depending on the task, you could always break the script into 3 scripts:
Script 1 (control script):
foreach my $t (@text)
{
    system("perl text_script.pl $t");
}
foreach my $x (@xml)
{
    system("perl xml_script.pl $x");
}
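The control script can also check each child's exit status so one bad file doesn't silently break a whole run. A sketch, assuming the per-file scripts described below exist; the list form of system() bypasses the shell, so filenames with spaces or metacharacters pass through safely:

```perl
use strict;
use warnings;

# Sketch: run one child perl process per file, dying loudly on failure.
sub run_per_file {
    my ($script, @files) = @_;
    for my $file (@files) {
        # $^X is the path of the currently running perl interpreter.
        my $status = system($^X, $script, $file);
        die "$script failed on $file (status $status)" if $status != 0;
    }
}

# Hypothetical usage, matching the layout in this answer:
# run_per_file('text_script.pl', @text);
# run_per_file('xml_script.pl',  @xml);
```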
Script 2 (text file processing script):
my $t = $ARGV[0];
open TEXT, $t or die "Cannot open $t for reading: $!\n";
while (my $line = <TEXT>) {
    ....My code....
}
close(TEXT);
Script 3 (XML file processing script):
my $x = $ARGV[0];
open XML, $x or die "Cannot open $x for reading: $!\n";
while (my $line = <XML>) {
    ....My code....
}
close(XML);
This *may* be easier to debug and maintain as well.
Of course this approach won't work if you're trying to collect everything from all the files before doing any data processing...
...but surely you're not trying to do that ...?
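As for the "qsub" part of the question: once the work is split into per-file scripts, each file can become its own batch job. A dry-run sketch that only builds the submission command as a string, assuming a PBS/SGE-style qsub; the resource flags and job-name rules are site-specific, so check your scheduler's documentation and pipe the command to the shell only on a real cluster:

```perl
use strict;
use warnings;

# Sketch: build one qsub submission command per input file.
# Printing it is a dry run; on a real cluster you would execute it.
sub make_submission {
    my ($script, $file) = @_;
    # -N names the job; memory/walltime flags (e.g. -l mem=2gb) are
    # site-specific assumptions and must be adapted.
    return qq{echo "perl $script $file" | qsub -N job_$file};
}

print make_submission('xml_script.pl', 'a.xml'), "\n";
# prints: echo "perl xml_script.pl a.xml" | qsub -N job_a.xml
```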