Summing column values - using Perl

I'm new to Perl programming. I have a CSV file as below.

20100221, abc_1, 200
20100221, abc_4, 350
20100221, opq_3, 200
20100221, abc_5, 220
20100221, xyz_1, 500
20100221, abc_2, 500
20100221, abc_3, 100
20100221, xyz_2, 700
20100221, opq_2, 350
20100221, xyz_3, 100
20100221, opq_1, 230

I'm trying to get the output below by adding up the values for matching names:

Output:

20100221, abc, 1370
20100221, xyz, 1300
20100221, opq, 780

I've started coding as below, but I'm not able to continue due to lack of experience.

use strict;
use warnings;

my $inputfilename  = $ARGV[0]; ### CSV file name. ###

open( my $in_fh, "<", $inputfilename ) or die "Can't open $inputfilename : $!";

while (my $line = <$in_fh>) {
  chomp($line);
  my @columns = split(',', $line, 2);

Could any of you help me accomplish this requirement?

Thanks and Regards / Lokesha

Try this

use strict;
use warnings;

my $inputfilename = "file1.txt";

open( my $in_fh, "<", $inputfilename ) or die "Can't open $inputfilename : $!";

my %store;
while (my $line = <$in_fh>) {
    chomp($line);
    # Capture the three comma-separated fields: date, name, count.
    $line =~ /^(.*?)\s*,(.*?)\s*,(.*?)\s*$/;
    my $no  = $1;
    my $cut = $2;
    my $cnt = $3;
    # Strip the trailing _<number> so abc_1, abc_2, ... all collapse to the same key.
    $cut =~ s/^(.*?)_[0-9]*$/$1/;
    $store{$no}{$cut} += $cnt;
}

foreach my $no ( keys %store ) {
    foreach my $cut ( keys %{ $store{$no} } ) {
        print "$no $cut $store{$no}{$cut}\n";
    }
}


One using awk:

awk -F, '{ A[$1 substr($2,1,4)] += $3 } END { for (i in A) print i, A[i] }' file
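Run against the sample data, that should print something along these lines (note it assumes three-letter names, since it takes a fixed four-character substring, and the order of awk's for (i in A) loop is not guaranteed):

20100221 abc 1370
20100221 xyz 1300
20100221 opq 780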

One in perl,

while (<DATA>) {
    @A = split /,|_/, $_;       # fields: date, name, index, count
    $h{"$A[0],$A[1]"} += $A[3];
}
while ( ($k, $v) = each %h ) { print "$k,$v\n"; }

__DATA__
20100221, abc_1, 200
20100221, abc_4, 350
20100221, opq_3, 200
20100221, abc_5, 220
20100221, xyz_1, 500
20100221, abc_2, 500
20100221, abc_3, 100
20100221, xyz_2, 700
20100221, opq_2, 350
20100221, xyz_3, 100
20100221, opq_1, 230
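If you want to run the same loop on the real CSV file instead of the __DATA__ block, a minimal variation, assuming the file name is passed as the first argument (as in the starter code at the top of the thread):

open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
while (<$fh>) {
    @A = split /,|_/, $_;       # same split as above: date, name, index, count
    $h{"$A[0],$A[1]"} += $A[3];
}
while ( ($k, $v) = each %h ) { print "$k,$v\n"; }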

Thanks to everyone for the replies and the different solutions provided.

I tried abubacker's solution and got the result below, with warning messages and one unwanted extra line:

Use of uninitialized value in substitution (s///) at GEN_Applications_Duration.pl line 17, <$in_fh> line 12.
Use of uninitialized value in hash element at GEN_Applications_Duration.pl line 18, <$in_fh> line 12.
Use of uninitialized value in hash element at GEN_Applications_Duration.pl line 18, <$in_fh> line 12.
Use of uninitialized value in addition (+) at GEN_Applications_Duration.pl line 18, <$in_fh> line 12.
 opq  0  ## Extra output line, which is unwanted ##
20100221  xyz 1300
20100221  opq 780
20100221  abc 1370

How do I eliminate the warnings and the extra output line? Also, the output should be comma-separated, as below:

20100221,  xyz, 1300

Could you please help me again?

I guess your file has some empty lines.

 
use strict;
use warnings;

my $inputfilename = "file1.txt";

open( my $in_fh, "<", $inputfilename ) or die "Can't open $inputfilename : $!";

my %store;
while (my $line = <$in_fh>) {
    chomp($line);
    if ( $line !~ /^$/ ) {       # skip empty lines
        $line =~ /^(.*?)\s*,(.*?)\s*,(.*?)\s*$/;
        my $no  = $1;
        my $cut = $2;
        my $cnt = $3;
        $cut =~ s/^(.*?)_[0-9]*$/$1/;   # abc_1, abc_2, ... -> abc
        $store{$no}{$cut} += $cnt;
    }
}

foreach my $no ( keys %store ) {
    foreach my $cut ( keys %{ $store{$no} } ) {
        print "$no $cut $store{$no}{$cut}\n";
    }
}

So please try this. If you're sure that your file does not have empty lines, then make sure every line has the proper format, e.g. 20100221, abc_3, 100.
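If you also want the comma-separated output you asked for (20100221, abc, 1370), one way, keeping the same variable names, is to trim the name and change the print inside the inner loop, for example:

foreach my $no ( keys %store ) {
    foreach my $cut ( keys %{ $store{$no} } ) {
        ( my $name = $cut ) =~ s/^\s+//;            # drop the leading space captured by the regex
        print "$no, $name, $store{$no}{$cut}\n";    # e.g. 20100221, abc, 1370
    }
}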

Wah wah!!! I'm thrilled.
Tons of thanks to abubacker, it works exactly as expected.

Adding a feature to this code: is it possible to handle multiple input files with different dates, as below?

input file name: input_20100221.csv
20100221, abc_1, 200
20100221, abc_4, 300
20100221, opq_3, 200
20100221, abc_5, 200
20100221, xyz_1, 500
20100221, abc_2, 500
20100221, abc_3, 100
20100221, xyz_2, 700
20100221, opq_2, 300
20100221, xyz_3, 100
20100221, opq_1, 200

input file name: input_20100222.csv
20100222, abc_1, 100
20100222, abc_4, 200
20100222, opq_3, 200
20100222, abc_5, 200
20100222, xyz_1, 100
20100222, abc_2, 200
20100222, abc_3, 100
20100222, xyz_2, 200
20100222, opq_2, 800
20100222, xyz_3, 600
20100222, opq_1, 700

input file name: input_20100224.csv
20100224, abc_1, 600
20100224, abc_4, 400
20100224, opq_3, 200
20100224, abc_5, 300
20100224, xyz_1, 300
20100224, abc_2, 200
20100224, abc_3, 700
20100224, xyz_2, 200
20100224, opq_2, 200
20100224, xyz_3, 900
20100224, opq_1, 800

Output should be exactly as below:

Rundate,  abc,   opq, xyz  ### header, only one time
20100221, 1300,  700, 1300
20100222,  800, 1700,  900
20100224, 2200, 1200, 1400

Cheers ~~

What if you read from stdin instead of a named file? I mean:

cat file1 file2 | perl -e '.......'
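Along those lines, here is a rough sketch that reads whatever files you name on the command line (or stdin, as above) and prints one row per date. It's untested against your real files; it assumes every non-empty line looks like date, name_n, count, and that the header columns are just the names sorted alphabetically:

use strict;
use warnings;

my %sum;      # $sum{$date}{$name} = running total for that date/name
my %seen;     # every name seen, used to build the header

while ( my $line = <> ) {             # <> reads the files named on the command line, or stdin
    chomp $line;
    next if $line =~ /^\s*$/;         # skip empty lines
    my ( $date, $name, $count ) = split /\s*,\s*/, $line;
    $name =~ s/_[0-9]+$//;            # abc_1, abc_2, ... -> abc
    $sum{$date}{$name} += $count;
    $seen{$name} = 1;
}

my @names = sort keys %seen;
print join( ", ", "Rundate", @names ), "\n";
for my $date ( sort keys %sum ) {
    print join( ", ", $date, map { $sum{$date}{$_} || 0 } @names ), "\n";
}

For example: perl sum_by_date.pl input_20100221.csv input_20100222.csv input_20100224.csv (the script name is just a placeholder). The columns won't be padded to line up like in your sample, but the numbers should come out the same.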

perl:

use strict;
use warnings;

my %hash;
while (<DATA>) {
    chomp;
    # Capture date, name (up to the underscore), and count.
    if ( /([0-9]*),\s*([^_]*)_.*,\s*([0-9]*)$/ ) {
        $hash{$1}->{$2} += $3;
    }
}
foreach my $key ( keys %hash ) {
    my %tmp = %{ $hash{$key} };
    foreach my $inkey ( keys %tmp ) {
        print $key, ", ", $inkey, ", ", $tmp{$inkey}, "\n";
    }
}
__DATA__
20100221, abc_1, 200
20100221, abc_4, 350
20100221, opq_3, 200
20100221, abc_5, 220
20100221, xyz_1, 500
20100221, abc_2, 500
20100221, abc_3, 100
20100221, xyz_2, 700
20100221, opq_2, 350
20100221, xyz_3, 100
20100221, opq_1, 230

Thanks to 'summer cherry' and 'kcoder24' for your solutions.

I've already implemented abubacker's solution. Now, adding a feature to his code: is it possible to handle multiple input files with different dates, using the same sample input files and expected output as in my post above?


Any suggestion would be a great help for me.