Perl report problem...

ganapati · July 2, 2008, 11:18am

Hi All,

As my old friends in this group know, I know shell scripting but am very new to Perl scripting, so I'm struggling with what should be a simple task. It has to be done in Perl. I'd be grateful for any help with the requirement below...

I have around 40 files with the layout below:
file1.csv
C01;35047
C02;0
C03;0
C04;0
C05;1294
C06;0
C07;0
C08;0

file2.csv
C01;197874
C02;20
C03;9
C04;0
C05;0
C06;0
C07;406
C08;0

and so on up to 40 files.

I have to read all 40 files and, for the non-zero values in the second column, write the count and total values into a summary file, 'summary_report.csv'.

Example Final Report should be as below:

C01;35047
C02;20001
C03;3002
C04;32224
C05;1294
C06;23
C07;8474
C08;737

So far I've built the code below:
my $report_dir = "$prereq_dir/REPORTS";

opendir DIR, $report_dir or die "Cannot opendir $report_dir $!";
while ( $filename = readdir(DIR) )
{
open(PAGE, $filename) || die "I can't open $filename";

I am struggling to extract the second field and to count and total the values.
Any light on my dark way would be much appreciated...

radoulov · July 2, 2008, 11:35am

I'm trying to decode the output: where is the count?
Or do you just need to sum the values of the second field?
What should the output be for the sample input files above?

Trying to guess:

perl -F';' -lane'
  $_{$F[0]} += $F[1];
  END {
    $, = ";";
    for (sort keys %_) {
      print $_, $_{$_}
    }
  }' "$prereq_dir"/REPORTS/*.csv

Or:

perl -F';' -lane'
  $_{$F[0]} += $F[1];
  END {
    print map {"$_;$_{$_}\n"} sort keys %_
  }' "$prereq_dir"/REPORTS/*.csv

If you want to exclude 0 values:

perl -F';' -lane'
  $_{$F[0]} += $F[1];
  END {
    $, = ";";
    for (sort keys %_) {
      print $_, $_{$_} if $_{$_}
    }
  }' "$prereq_dir"/REPORTS/*.csv

ganapati · July 2, 2008, 12:05pm

Sorry for the misunderstanding...
Please consider the below 3 sample files:

file1.csv
C01;10
C02;0
C03;0
C04;0
C05;20
C06;0
C07;0
C08;0

file2.csv
C01;50
C02;20
C03;90
C04;0
C05;0
C06;0
C07;40
C08;0

file3.csv
C01;0
C02;0
C03;0
C04;10
C05;30
C06;80
C07;40
C08;60

Then the resultant output file should be as follows:

output.csv
C01;60
C02;20
C03;90
C04;10
C05;50
C06;80
C07;80
C08;60

One more thing: I am writing a Perl script, not a shell script, so I cannot use the line "perl -F';' -lane'" (if I am not wrong).
Please guide me...

radoulov · July 2, 2008, 12:11pm

So:

zsh-4.3.4% head *csv
==> file1.csv <==
C01;10
C02;0
C03;0
C04;0
C05;20
C06;0
C07;0
C08;0

==> file2.csv <==
C01;50
C02;20
C03;90
C04;0
C05;0
C06;0
C07;40
C08;0

==> file3.csv <==
C01;0
C02;0
C03;0
C04;10
C05;30
C06;80
C07;40
C08;60

zsh-4.3.4% perl -F';' -lane'
  $_{$F[0]} += $F[1];
  END {
    print map {"$_;$_{$_}\n"} sort keys %_
}' *csv
C01;60
C02;20
C03;90
C04;10
C05;50
C06;80
C07;80
C08;60

?

The command above is a perl script, not a shell script ...

Perhaps you prefer something like this:

zsh-4.3.4% cat perl_script
#!/usr/bin/perl -lanF;

$_{$F[0]} += $F[1];

END {
      print map {"$_;$_{$_}\n"} sort keys %_;
}
zsh-4.3.4% ./perl_script *csv
C01;60
C02;20
C03;90
C04;10
C05;50
C06;80
C07;80
C08;60

ganapati · July 2, 2008, 12:21pm

Thanks for your answer, radoulov.
You have always helped me. Sorry for asking more questions...

In my Perl script I'm using "#!/usr/bin/perl" as the first line. Do I have to put your switches (perl -F';' -lane) on that first line?

As you know, I'm not an advanced Perl programmer. Could you please explain your code, i.e. what ' -lane' means, where the output file name goes, etc.?

Sorry, even though I'm taking up your valuable time, it would be very helpful for me.

Thanks again / Mysore Ganapati.

ghostdog74 · July 2, 2008, 2:13pm

another way

while (<>) {
    ($f1,$f2) = split(/[;]/, $_, -1);
    $a{$f1} += $f2;
}
foreach $i (keys %a) {
    print $i,":",$a{$i},"\n";
}

output:

# ./test.pl file1 file2 file3
C04:10
C03:90
C05:50
C02:20
C07:80
C06:80
C01:60
C08:60
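
The lines above come out in arbitrary order because Perl does not keep hash keys sorted, and they use ':' where the input uses ';'. A minimal sketch of the same loop with sorted keys and a ';' separator follows. The sub name sum_fields and the in-memory sample lines are made up for illustration, so that the snippet runs without any files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sum_fields is a hypothetical helper name, not part of the thread's code.
# It takes lines of the form "KEY;NUMBER" and returns "KEY;TOTAL" strings
# sorted by key.
sub sum_fields {
    my @lines = @_;
    my %total;
    for my $line (@lines) {
        chomp $line;
        my ($key, $value) = split /;/, $line;
        next unless defined $value;          # skip blank or malformed lines
        $total{$key} += $value;
    }
    return map { "$_;$total{$_}" } sort keys %total;
}

# Sample rows taken from file1.csv and file2.csv above (C01 and C02 only).
my @sample = ("C01;10\n", "C02;0\n", "C01;50\n", "C02;20\n");
print "$_\n" for sum_fields(@sample);
```

To read real files instead, the lines could come from while (<>) as in the code above.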

ganapati · July 2, 2008, 2:35pm

Thanks a lot ghostdog74,

But the problem is that I'm not passing the file names on the command line. I'm assigning them to variables inside the Perl script. The same goes for the output file.

So, could you please suggest how to handle this...

radoulov · July 2, 2008, 4:47pm

No, you don't have to change the first line. If you need to add this logic to existing code
and your program invokes the perl interpreter without any switches,
leave the first line as it is and use something like this:

#!/usr/bin/perl

# you should always use these in a script
use warnings;
use strict;

# your existing code here

{
    my $path = defined $ENV{'prereq_dir'}?$ENV{'prereq_dir'}.'/REPORTS/file':'./file';
    my ($ext, $outfile, %data, @flds) = ('.csv', 'output.csv');
    while (<$path*$ext>) {
        open IN, $_ or warn "Error opening $_: $!\n" and next;
        while (<IN>) {
             @flds[1,2] = split ';';
             $data{$flds[1]} += $flds[2];
        }
        close IN;

    }

    open OUT, '>', $outfile or die "Error creating output file: $!\n";
    print OUT map {"$_;$data{$_}\n"} sort keys %data;

}

# other code if any

I'm not an advanced Perl programmer either, and I'm sure there is a better way to write the above.

For the meaning of the switches (-l, -a, -n, -e, -F), see perldoc perlrun.
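
As a rough guide to the switches (perlrun has the full story): -n wraps the code in a while (<>) loop over the input lines, -l strips the newline from each line and makes print add one, -a splits each line into the array @F, -F';' sets the split pattern to ';', and -e supplies the code on the command line. Below is a hand-written sketch of roughly what perl -F';' -lane '...' does. The sub name sum_files and the hash name %sum are chosen here for readability (the one-liner uses %_), and the code perl actually generates differs in detail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sum_files is a hypothetical name. It does by hand what the switches do.
sub sum_files {
    my @files = @_;
    my %sum;
    return \%sum unless @files;       # unlike -n, do not fall back to STDIN

    local @ARGV = @files;             # <> reads the files listed in @ARGV
    while (my $line = <>) {           # -n: implicit loop over every input line
        chomp $line;                  # -l: strip the trailing newline
        my @F = split /;/, $line;     # -a and -F';': split the line into @F
        next unless @F >= 2;          # extra guard, not part of the switches
        $sum{ $F[0] } += $F[1];       # the code given after -e
    }
    return \%sum;
}

# What the END block does: print "key;total" lines sorted by key.
my $sum = sum_files(@ARGV);
print "$_;$sum->{$_}\n" for sort keys %$sum;
```

Because <> takes its file names from @ARGV, the list can just as well be built inside the script (for example from a glob) and passed to the sub instead of coming from the command line.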

ganapati · July 3, 2008, 3:04am

Radoulov,

You are shining with your prompt answers and suggestions.
Thanks again for your valuable time and help.

Cheers~;)~ / Mysore Ganapati...

ganapati · July 3, 2008, 6:19am

Hi Radoulov,

I've restructured my code with your solution as below:

#!/usr/bin/perl

# you should always use these in a script
use warnings;
use strict;

# your existing code here

my $report_dir = "/export/home/PREREQUISITS/20080430/REPORTS";

opendir DIR, $report_dir or die "Cannot opendir $report_dir $!";
while ( my $filename = readdir(DIR) ) 
{
open( PAGE, $filename ) or die "I can't open $filename";

    my $path = defined $report_dir?$report_dir.'/REPORTS/file':'./file';
    my ($ext, $outfile, %data, @flds) = ('.csv', 'output.csv');
    while (<$path*$ext>) 
    {
        open IN, $_ or warn "Error opening $_: $!\n" and next;
        while (<IN>) 
        {
             @flds[1,2] = split ';';
             $data{$flds[1]} += $flds[2];
        }
        close IN;

    }

    open OUT, '>', $outfile;
    print OUT map {"$_;$data{$_}\n"} sort keys %data;

}

But this throws a warning at compile time that I am not able to resolve. The code should be free of errors and warnings... Could you please have a look at this?

$ perl -c ganesh.pl
Name "main::PAGE" used only once: possible typo at ganesh.pl line 14.
ganesh.pl syntax OK

radoulov · July 3, 2008, 6:34am

Try changing it like this:

#!/usr/bin/perl

# you should always use these in a script
use warnings;
use strict;

my $report_dir = '/export/home/PREREQUISITS/20080430/REPORTS/';
my ($ext, $outfile, %data, @flds) = ('.csv', 'output.csv');
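# loop over every *.csv file in the report directory
while (<$report_dir*$ext>) {
  open IN, $_ or warn "Error opening $_: $!\n" and next;
  while (<IN>) {
    # first field is the key, second field is the value to add
    @flds[1,2] = split ';';
    $data{$flds[1]} += $flds[2];
  }
  close IN;
}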

open OUT, '>', $report_dir.$outfile or die "Error creating output file: $!\n";
print OUT map {"$_;$data{$_}\n"} sort keys %data;
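
The same logic can also be written with lexical filehandles and three-argument open, which avoids the global IN and OUT handles. This is only a sketch: the sub name summarize and its two parameters are invented here, and the directory and output file are passed in rather than hard-coded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# summarize is a hypothetical helper: it sums the second field of every
# *.csv file in $dir, grouped by the first field, and writes the totals
# to $outfile. Keep $outfile outside $dir (or give it another extension),
# otherwise it matches *.csv and is read as input on the next run.
sub summarize {
    my ($dir, $outfile) = @_;
    my %data;

    for my $file (glob "$dir/*.csv") {
        open my $in, '<', $file
            or do { warn "Error opening $file: $!\n"; next };
        while (my $line = <$in>) {
            chomp $line;
            my ($key, $value) = split /;/, $line;
            next unless defined $value;      # skip blank or malformed lines
            $data{$key} += $value;
        }
        close $in;
    }

    open my $out, '>', $outfile
        or die "Error creating $outfile: $!\n";
    print {$out} map { "$_;$data{$_}\n" } sort keys %data;
    close $out or die "Error closing $outfile: $!\n";
    return \%data;
}

# Example call; both paths are placeholders.
# summarize('/path/to/REPORTS', '/path/to/summary_report.csv');
```

Note that glob splits its pattern on whitespace, so this sketch assumes a directory name without spaces.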

ganapati · July 3, 2008, 7:14am

It worked..

Thanks a lot radoulov:)

Shreedhar_Naik · July 3, 2008, 9:14am

Hi,

Are you able to read the contents of the files?

Sorry, you already have the solution, right? No problem, I'll wait for your next query.

ganapati · July 4, 2008, 4:02am

Hi Radoulov,

There is a small problem with the code. If I run the program below once, it works perfectly. But if I run it two or three times or more, the values keep adding up, even though I've initialized the hash and array at the beginning. This results in big numbers in the output file.

Please have a look and help me...

#!/usr/bin/perl

# you should always use these in a script
use warnings;
use strict;

my %data;
my @flds=();
my $report_dir = '/export/home/L86898/MYDATA/PREREQUISITS/20080430/REPORTS/';
my ($ext, $outfile) = ('.csv', 'Summary_by_TestScript.csv');
while (<$report_dir*$ext>) {
  open IN, $_ or warn "Error opening $_: $!\n" and next;
  while (<IN>) {
    @flds[1,2] = split ';';
    $data{$flds[1]} += $flds[2];
  }
  close IN;
}

open OUT, '>', $report_dir.$outfile or die "Error creating output file: $!\n";
print OUT map {"$_;$data{$_}\n"} sort keys %data;

radoulov · July 4, 2008, 4:21am

Yes, the script reads its own output file as input on the next run :).

Add this line:

next if $_ eq $report_dir.$outfile;

Like this:

#!/usr/bin/perl

# you should always use these in a script
use warnings;
use strict;

my %data;
my @flds=();
my $report_dir = '/export/home/L86898/MYDATA/PREREQUISITS/20080430/REPORTS/';
my ($ext, $outfile) = ('.csv', 'Summary_by_TestScript.csv');
while (<$report_dir*$ext>) {
  # skip the summary file written by a previous run
  next if $_ eq $report_dir.$outfile;
  open IN, $_ or warn "Error opening $_: $!\n" and next;
  while (<IN>) {
    @flds[1,2] = split ';';
    $data{$flds[1]} += $flds[2];
  }
  close IN;
}

open OUT, '>', $report_dir.$outfile or die "Error creating output file: $!\n";
print OUT map {"$_;$data{$_}\n"} sort keys %data;
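
The reason is that Summary_by_TestScript.csv lives in the same directory as the input files and matches *.csv, so every run after the first picks up the previous summary as one more input. An alternative to the next if line, sketched below, is to filter the file list before the loop starts. The sub name input_files is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# input_files is a hypothetical helper: it returns every *.csv file in
# $dir except the summary file itself. $dir is expected to end with '/',
# as $report_dir does in the script above.
sub input_files {
    my ($dir, $outfile) = @_;
    return grep { $_ ne $dir . $outfile } glob "$dir*.csv";
}
```

The outer loop could then read for my $file (input_files($report_dir, $outfile)) { ... } with $file opened in place of $_. Writing the summary to a different directory would avoid the problem as well.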

ganapati · July 4, 2008, 5:17am

Great, that's why I trust you...

Thanks again (please don't get bored of all these thanks), Radoulov.

With Best Regards / Mysore Ganapati

radoulov · July 4, 2008, 5:57am

I'm glad to be of help and to exercise my newly acquired Perl skills

ganapati · July 7, 2008, 7:59am

Hi Radoulov,

Sorry for my stupid question, but I failed to understand the explanation of the "map" function in my Perl study material.

Could you please explain what exactly the line of code below is doing with map()? Are there other ways to do the same thing without map()?
One of my friends asked me this question, but I could not answer him, and I could not work it out from the Perl book either...

print OUT map {"$_;$data{$_}\n"} sort keys %data;

Sorry for the pain again and again...
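
For readers with the same question: map runs its block once for each element of the list on its right, with $_ set to that element, and returns the list of results. In the line above the list is the sorted keys of %data, so the block yields one "key;total\n" string per key and print writes them all to OUT. Below is a sketch of the map form next to an equivalent foreach loop. The sample %data values are taken from the expected output earlier in the thread, and STDOUT stands in for OUT so that the snippet runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %data = (C01 => 60, C02 => 20, C03 => 90);

# With map: build the whole list of lines, then print it in one go.
my @with_map = map {"$_;$data{$_}\n"} sort keys %data;
print @with_map;

# Without map: the same lines, built one at a time in a loop.
my @with_loop;
foreach my $key (sort keys %data) {
    push @with_loop, "$key;$data{$key}\n";
}
print @with_loop;
```

Both arrays hold the same three lines, so either form can feed print OUT.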

radoulov · July 7, 2008, 8:41am

I'm not able to give better explanation than this one,
it answers both questions (also check ghostdog74's code above):

(perldoc -fmap):