Perl script to find particular field and sum it

Learnerabc · March 16, 2010, 2:45am

Hi,
I have a file with this format:

a b c d e
1 1 2 2 2
1 2 2 2 3
1 1 1 1 2
1 1 1 1 4
1 1 1 1 6

In column e I want to find all identical values (with a Perl script) and count how many times each one occurs.
For instance, for the data above:

2 - 2 times
3 - 1 time
4 - 1 time
6 - 1 time

What I use is:

@a=<STDIN>;
foreach $i (@a){
   @element= $i;
   if (@element[4]=~/2/){
    $abc+=@element[4];
    
   }

This is not working.

My other question: in the example I gave there are only 3 or 4 different numbers, so I could repeat the if statement above for each one and get the result. But what if we need to do this for 5000 different words?

Thanks

murugaperumal · March 16, 2010, 3:01am

 

use strict;
use warnings;

# 'new' is the name of the input file
open FH, "<new" or die "Can't Open $!";
my @array1;
my %hash;
<FH>;                          # read and discard the header line (a b c d e)
while(<FH>)
{
  my @array=split(' ',$_);
  push(@array1,$array[4]);     # collect the 5th column (index 4)
}
foreach(@array1)
{
  $hash{$_}++;                 # count how often each value occurs
}

foreach my $key (keys %hash) {
  print " $key => $hash{$key} times\n";
}

thillai_selvan · March 16, 2010, 3:03am

Use the following code

open FH, "<inp" or die "Can't open file : $!\n";
my @data=<FH>;
my @result;
my @count;
foreach (@data )
{
    push(@result,split);
}
for ( my $i=4; $i <= $#result; $i=$i+5 )
{
    push(@count,$result[$i]);
}

my %count_hash;
shift @count; #to remove the e from the array
foreach my $word (@count)
{
    $count_hash{$word}=$count_hash{$word}+1;
}

foreach my $word (keys %count_hash)
{
    print $word," Comes for ",$count_hash{$word}," times\n";
}

The output I am getting is as follows:

6 Comes for 1 times
4 Comes for 1 times
3 Comes for 1 times
2 Comes for 2 times

kiruthika_sri · March 16, 2010, 3:11am

Try the following code:

open FH,"sum" or die $!;    # Open the file 'sum'.
my %sum;
my @lines;
my $key;
my $val;
while(<FH>)                 # Read lines from the opened file.
{
    @lines=split(' ',$_);
    $sum{$lines[4]}++;      # Count each value of the 5th column in the hash
}
while(($key,$val)=each(%sum))
{
    print "$key - $val times \n";    # Print the keys and values in the hash
}

Here 'sum' is a file which contains the following input data.

1 1 2 2 2
1 2 2 2 3
1 1 1 1 2
1 1 1 1 4
1 1 1 1 6

dennis.jacob · March 16, 2010, 3:16am

Try:

perl -lane '$A{$F[-1]}++ if $. > 1; END { while (($k,$v) = each(%A)) { print "$k $v times" } }' file

Learnerabc · March 16, 2010, 4:36am

thillai_selvan:

How do you call this? The script is giving an error.

thillai_selvan · March 16, 2010, 4:40am

What error are you getting?

Learnerabc · March 16, 2010, 4:42am

It keeps saying "Can't open file: No such file" when I call the script on my file,

but both files are there and both have execute permission.

thillai_selvan · March 16, 2010, 4:44am

use strict;
use warnings;

open FH, "<inp" or die "Can't open file : $!\n";
my @data=<FH>;
my @result;
my @count;
my $word;
foreach (@data )
{
    push(@result,split);
}
for ( my $i=4; $i <= $#result; $i=$i+5 )
{
    push(@count,$result[$i]);
}

my %count_hash;
shift @count;
foreach $word (@count)
{
    unless (defined($count_hash{$word}) )
    {
        $count_hash{$word} = 0;
    }
    $count_hash{$word}=$count_hash{$word}+1;
}

foreach my $word (keys %count_hash)
{
    print $word," Comes for ",$count_hash{$word}," times\n";
}

Use the above code.
Here inp is the input file, which should hold your input data.
You are getting that error because your input file probably has a different name.
Change the file name in the open call so that it matches yours.
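To avoid editing the script every time the file name changes, the name could also be taken from the command line. This is only a sketch: count_column is a hypothetical helper and count.pl a hypothetical script name, neither is part of the code above, and it assumes the first line of the file is the header.

```perl
use strict;
use warnings;

# count_column is a hypothetical helper, not part of the script above.
# It counts how often each value appears in the given column (0-based),
# assuming the first line of the file is a header to be skipped.
sub count_column {
    my ($file, $col) = @_;
    open my $fh, '<', $file or die "Can't open '$file': $!\n";
    my $header = <$fh>;                 # read and ignore the header line
    my %count;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        next unless @fields > $col;     # skip blank or short lines
        $count{ $fields[$col] }++;
    }
    close $fh;
    return \%count;
}

# Runs only when a file name is given on the command line.
if (@ARGV) {
    my $count = count_column($ARGV[0], 4);
    print "$_ Comes for $count->{$_} times\n" for sort keys %$count;
}
```

It would be called as: perl count.pl inp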

Learnerabc · March 16, 2010, 4:55am

thillai_selvan:

Yes, thanks. I was using the wrong file extension, .pl instead of .txt.

Now one more question arises. Here we knew we were working with index 4 (the fifth column), but what would you do if there are many columns and you want the last one? In awk I think we just use $NF for that. What would we change in this script to get the last field, no matter which column number it is?

Thanks for teaching

thillai_selvan · March 16, 2010, 4:58am

Simple! $#array holds the last index of the array, so we can use it to find the last column.
Example, assuming @fields holds one row split into its columns:

print $#fields; #this will give 4.

Indexes start from 0, so a last index of 4 tells us there are 5 columns in total,
and $fields[4] is the last column value.
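
The relation between the last index and the number of columns can be checked on a single row. A small sketch, where @fields is just an illustrative name and the row is taken from the sample data:

```perl
use strict;
use warnings;

# One data row from the sample file, split into its columns.
my @fields = split ' ', '1 2 2 2 3';

my $last_index = $#fields;          # index of the last element
my $columns    = scalar @fields;    # number of elements

print "last index: $last_index\n";                # last index: 4
print "columns: $columns\n";                      # columns: 5
print "last value: ", $fields[$#fields], "\n";    # last value: 3
```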

Learnerabc · March 16, 2010, 5:06am

thillai_selvan:

I understand this, but what I am saying is: let's imagine you have more than 2000 columns (just imagine), and you don't want to count all of them from 0 to 1999. What would you do so that the script just takes the last field without specifying the index number?

thillai_selvan · March 16, 2010, 5:11am

print $array_name[-1]

This will give the last element in that array.

print $array_name[-2]

This will give the second-to-last element in that array.
Is this what you were looking for?
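
Applied to the counting problem, a negative index means the number of columns never has to be known. A minimal sketch, assuming whitespace-separated lines; last_field is a hypothetical helper and the three rows are made-up examples with different numbers of columns:

```perl
use strict;
use warnings;

# last_field is a hypothetical helper: it returns the last
# whitespace-separated field of a line, however many fields there are.
sub last_field {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return $fields[-1];    # undef if the line has no fields
}

# Made-up rows with 5, 3 and 7 columns.
my %count;
for my $line ('1 1 2 2 2', '1 2 3', '9 8 7 6 5 4 2') {
    $count{ last_field($line) }++;
}

# Prints "2 - 2 times" and then "3 - 1 times".
print "$_ - $count{$_} times\n" for sort keys %count;
```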

Learnerabc · March 16, 2010, 8:22am

Yes, yes. Thanks for teaching all this, thank you.

---------- Post updated at 08:22 AM ---------- Previous update was at 05:13 AM ----------

thillai_selvan:

Can you please tell me why you used +5 in this line?

for ( my $i=4; $i <= $#result; $i=$i+5 )

Thanks

thillai_selvan · March 16, 2010, 8:32am

Yeah, my pleasure!

If you print the @result array, it will hold the following values:

a b c d e 1 1 2 2 2 1 2 2 2 3 1 1 1 1 2 1 1 1 1 4 1 1 1 1 6

Your actual requirement is to find the occurrences of the values in the e column.

Your input data:

a b c d e
1 1 2 2 2
1 2 2 2 3
1 1 1 1 2
1 1 1 1 4
1 1 1 1 6

Each line has 5 fields, so in the flattened list above every fifth value, starting at index 4, belongs to the e column.
That's why I increment by 5: to get every 5th element.
Got it?
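
The stride can be seen by running the same steps on the sample lines. This is only a sketch: the lines are typed in directly instead of being read from the file, @column_e is an illustrative name, and the approach only works while every line has exactly 5 fields.

```perl
use strict;
use warnings;

# The sample file's lines, as they would be read into @data.
my @data = ("a b c d e\n", "1 1 2 2 2\n", "1 2 2 2 3\n",
            "1 1 1 1 2\n", "1 1 1 1 4\n", "1 1 1 1 6\n");

# Flatten every line into one long list of fields.
my @result = map { split ' ', $_ } @data;

# Indexes 4, 9, 14, ... hold column e, because each line adds 5 fields.
my @column_e;
for (my $i = 4; $i <= $#result; $i += 5) {
    push @column_e, $result[$i];
}

print "@column_e\n";    # e 2 3 2 4 6
```

The first element is the header e itself, which is why the script removes it with shift before counting.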