I have written a Perl script that loads the entire data file into an array, checks the values of specific columns in each line, and writes the line to a good file if it is of interest or to a bad file otherwise.
The problem is that when the data file is huge, storing it in an array uses a lot of memory. So I think I need to read the data file line by line and check the column values as I go.
use strict;
use warnings;
# $file, $goodfile and $badfile are assumed to be set earlier in the script
open(my $in, '<', $file) or die "could not open file $file: $!";
my (@header, @footer, @goodlines, @badlines);
my @whole = <$in>;    # reads the entire file into memory
close $in;
foreach my $line (@whole) {
    my @fields = split /\|/, $line;
    # field 57 missing, empty or whitespace-only (the original compared it against " " twice)
    if (!defined $fields[57] || $fields[57] =~ /^\s*$/) {
        push @badlines, $line;
    }
    # fields 32, 33, 34, 38 and 62 are all "N.A." or " "
    elsif (5 == grep { defined($_) && ($_ eq "N.A." || $_ eq " ") }
                     @fields[32, 33, 34, 38, 62]) {
        push @badlines, $line;
    }
    else {
        push @goodlines, $line;
    }
}
open(my $fh, '>', $goodfile) or die "could not open file $goodfile: $!";
print $fh @header, @goodlines, @footer;
close $fh;
open(my $fh1, '>', $badfile) or die "could not open file $badfile: $!";
print $fh1 @badlines;
close $fh1;
printf(" The New Feed file is located at --------------> '%s'\n" , $goodfile);
printf(" The Ignored records are located --------------> '%s'\n\n" , $badfile);
Instead of storing the entire data file in an array (in memory), could someone please advise how I can read the data file line by line so that it doesn't use as much memory?
I really appreciate your thoughts and time. Thanks a lot for looking into this.
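A minimal sketch of the line-by-line approach: instead of slurping the file with <FILE> into @whole, a while (my $line = <$in>) loop reads one line at a time and writes it straight to the good or bad file, so memory use no longer grows with the file size. The helper names is_bad_line and split_feed are hypothetical; the field indexes and the "N.A." / " " placeholder values come from the script above, and treating a missing, empty or whitespace-only field 57 as blank is an assumption. This sketch does not handle the header or footer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if a pipe-delimited line belongs in the bad file.
sub is_bad_line {
    my ($line) = @_;
    (my $record = $line) =~ s/\r?\n\z//;     # drop the line ending before splitting
    my @fields = split /\|/, $record, -1;    # -1 keeps trailing empty fields
    my $f57 = $fields[57];
    # Assumption: "blank" means missing, empty or whitespace-only.
    return 1 if !defined $f57 || $f57 =~ /^\s*$/;
    # Bad if fields 32, 33, 34, 38 and 62 are all "N.A." or a single space.
    my $missing = grep { defined($_) && ($_ eq 'N.A.' || $_ eq ' ') }
                  @fields[32, 33, 34, 38, 62];
    return $missing == 5;
}

# Hypothetical helper: reads $in one line at a time and writes each line to
# $good or $bad, so only the current line is held in memory.
sub split_feed {
    my ($in, $good, $bad) = @_;
    while (my $line = <$in>) {
        if (is_bad_line($line)) {
            print {$bad} $line;
        }
        else {
            print {$good} $line;
        }
    }
    return;
}
```

With real files, open the input with '<' and the good and bad files with '>' (checking each open with "or die ..."), then call split_feed($in, $good_fh, $bad_fh).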
The data file contains header information (for example, the names of the columns), and the footer contains the number of data records and a file timestamp.
I have run your script, and it avoids the "out of memory" issue. However, when extracting the header information, the good file doesn't include the "START-OF-FILE" and "START-OF-DATA" lines.
In the footer, the file timestamp can stay as it is, but since the number of records has changed in the good file, I need to count the records (excluding the header information) and replace the original count with that number, i.e.:
Footer information:
DATARECORDS=3530288 --> need to count the number of records in the good file and put it here
TIMEFINISHED=Mon Jan 9 19:24:03 EST 2012
END-OF-FILE
When I was using arrays, I used the following logic for this:
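my $footer_len   = 4;    # END-OF-DATA, DATARECORDS=..., TIMEFINISHED=..., END-OF-FILE
my $datarec_line = 1;    # index of the DATARECORDS= line within @footer
# move everything up to and including the START-OF-DATA line into @header
my $line;
do {
    $line = shift @whole;
    push @header, $line;
} while $line !~ /^START-OF-DATA/;
# assumed step (not shown above): move the last $footer_len lines into @footer
@footer = splice(@whole, -$footer_len);
# ... the good/bad loop then runs over the remaining lines in @whole ...
my $n = @goodlines;                       # number of good lines
$n -= grep { /^# PRODUCT/ } @goodlines;   # minus "# PRODUCT" lines, which are not data records
$footer[$datarec_line] =~ s/\d+/$n/;      # write the new count into DATARECORDS=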
Could you please advise similar logic for including the header and footer information in the good file?
Thanks a lot for your reply and for all your help.
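One way to keep the header and footer while still streaming is to read the file in three phases from the same filehandle. This is a sketch, not the script from the earlier reply: filter_feed is a hypothetical name, and the good/bad decision is passed in as a code reference (for example, a sub implementing the column checks from the original script). It assumes the header ends with a line starting START-OF-DATA, the data ends with a line starting END-OF-DATA, and lines starting "# PRODUCT" are always kept but never counted, mirroring the grep in the array version. Because the footer comes after the data, the record count is already known when the DATARECORDS line is reached, so nothing has to be held in memory or rewritten afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: copies the header and footer to $good, routes each data
# line to $good or $bad via the $is_bad code reference, and rewrites the
# DATARECORDS= footer line with the number of good records. Returns that count.
sub filter_feed {
    my ($in, $good, $bad, $is_bad) = @_;

    # Phase 1: header, up to and including the START-OF-DATA line.
    while (my $line = <$in>) {
        print {$good} $line;
        last if $line =~ /^START-OF-DATA/;
    }

    # Phase 2: data records, one line in memory at a time.
    my $count = 0;
    while (my $line = <$in>) {
        if ($line =~ /^END-OF-DATA/) {
            print {$good} $line;    # END-OF-DATA is the first footer line
            last;
        }
        if ($line =~ /^# PRODUCT/) {
            print {$good} $line;    # assumed: always kept, never counted
        }
        elsif ($is_bad->($line)) {
            print {$bad} $line;
        }
        else {
            print {$good} $line;
            $count++;
        }
    }

    # Phase 3: rest of the footer; only the DATARECORDS value changes.
    while (my $line = <$in>) {
        $line =~ s/^(DATARECORDS=)\d+/$1$count/;
        print {$good} $line;
    }
    return $count;
}
```

With real files, open the input for reading and the good and bad files for writing (each checked with "or die ..."), then call filter_feed($in, $good_fh, $bad_fh, $is_bad).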
I need to count the number of lines in the good file, excluding the header and footer, and then replace the existing count with that number.
Example:
Number of records in the good file, excluding header and footer: 1418125
Before:
END-OF-DATA
DATARECORDS=3530288
TIMEFINISHED=Mon Jan 9 19:24:03 EST 2012
END-OF-FILE
After:
END-OF-DATA
DATARECORDS=1418125
TIMEFINISHED=Mon Jan 9 19:24:03 EST 2012
END-OF-FILE
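The substitution step can be checked against this example. In this minimal sketch (update_datarecords is a hypothetical helper), the pattern is anchored on DATARECORDS= so the correct line changes even if the footer is not exactly four lines long, whereas s/\d+/$n/ applied to $footer[1] relies on the line's position.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces the number on a DATARECORDS= line with $count;
# any other line (END-OF-DATA, TIMEFINISHED=..., END-OF-FILE) is returned unchanged.
sub update_datarecords {
    my ($line, $count) = @_;
    $line =~ s/^(DATARECORDS=)\d+/$1$count/;
    return $line;
}

my @footer = (
    "END-OF-DATA\n",
    "DATARECORDS=3530288\n",
    "TIMEFINISHED=Mon Jan 9 19:24:03 EST 2012\n",
    "END-OF-FILE\n",
);
print update_datarecords($_, 1418125) for @footer;
```

Running it prints the "After" footer shown above; the timestamp line passes through untouched.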
In the future, you may want to consider the Tie::File module, which gives access to the lines of a disk file through a Perl array, for cases where the file is too large to read into memory.
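For example, a sketch that uses Tie::File to correct the DATARECORDS line of an already written good file in place. The helper name fix_datarecords_in_place is hypothetical, and the count is assumed to be known already. Tie::File fetches records on demand instead of loading the whole file, and with its default autochomp setting the array elements have no trailing newline.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: rewrites the DATARECORDS= line of $path in place.
sub fix_datarecords_in_place {
    my ($path, $count) = @_;
    tie my @lines, 'Tie::File', $path or die "could not tie $path: $!";
    # The footer is at the end, so search backwards from the last line.
    # A C-style loop avoids building a list of every index for a huge file.
    for (my $i = $#lines; $i >= 0; $i--) {
        last if $lines[$i] =~ s/^(DATARECORDS=)\d+/$1$count/;
    }
    untie @lines;    # flushes any pending changes to disk
    return;
}
```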