Sum of column by group wise

jambesh · February 21, 2010, 3:36am

Hello All ,

I have a problem with summing of column by group

Input File -

COL_1,COL_2,COL_3,COL_4,COL_5,COL_6,COL_7,COL_8,COL_9,COL_10,COL_11
3010,21,1923D ,6,0,0.26,0,0.26,-0.26,1,200807
3010,21,192BI ,6,24558.97,1943.94,0,1943.94,22615.03,1,200807
3010,21,192BI ,12,1156.91,62.68,0,62.68,1094.23,1,200807
3010,21,192MED,6,-1062.7,558.79,0,558.79,-1621.49,1,200807
3010,21,192MED,12,101.6,23.11,0,23.11,78.49,1,200807
3010,21,192PD ,6,-26516.61,1295.86,0,1295.86,-27812.47,1,200807
3010,21,211CCD,12,187.07,62.14,0,62.14,124.93,2,200807
3010,21,211COL,6,-4584.9,1531.98,0,1531.98,-6116.88,2,200807
3010,21,211COL,12,118.11,64.02,0,64.02,54.09,2,200807
3010,21,211F  ,6,0,0,0,0,0,2,200807
3010,21,211TOW,6,-64.86,74.12,0,74.12,-138.98,2,200807
3010,21,211TOW,12,19.14,2.66,0,2.66,16.48,2,200807
3010,63,192BI ,6,1043.51,147.32,0,147.32,896.19,3,200807
3010,63,192MED,6,16.25,15.46,0,15.46,0.78,3,200807
3010,63,192PD ,6,-1192.37,98.2,0,98.2,-1290.57,3,200807
3010,63,192UMB,6,18.7,20.6,0,20.6,-1.9,3,200807
3010,63,192UMP,6,-1.03,1.5,0,1.5,-2.53,3,200807

Out Put needed :

The column COL_10 contain the group -- it has value of 1 /2/3/4 -etc
as long as it it group 1 it will list the records and the moment category finished it will print the sum of column col_5,col_6,col_7,col_8,col_9
and then the header again and then category 2 start and so on ..

So i need the category wise sum of mentioned column

COL_1,COL_2,COL_3,COL_4,COL_5,COL_6,COL_7,COL_8,COL_9,COL_10,COL_11
3010,21,1923D ,6,0,0.26,0,0.26,-0.26,1,200807
3010,21,192BI ,6,24558.97,1943.94,0,1943.94,22615.03,1,200807
3010,21,192BI ,12,1156.91,62.68,0,62.68,1094.23,1,200807
3010,21,192MED,6,-1062.7,558.79,0,558.79,-1621.49,1,200807
3010,21,192MED,12,101.6,23.11,0,23.11,78.49,1,200807
3010,21,192PD ,6,-26516.61,1295.86,0,1295.86,-27812.47,1,200807

*SUM -This record should show the sum of the column col_5 through col_9

3010,21,211CCD,12,187.07,62.14,0,62.14,124.93,2,200807
3010,21,211COL,6,-4584.9,1531.98,0,1531.98,-6116.88,2,200807
3010,21,211COL,12,118.11,64.02,0,64.02,54.09,2,200807
3010,21,211F  ,6,0,0,0,0,0,2,200807
3010,21,211TOW,6,-64.86,74.12,0,74.12,-138.98,2,200807
3010,21,211TOW,12,19.14,2.66,0,2.66,16.48,2,200807

*SUM -This record should show the sum of the column col_5 through col_9

3010,63,192BI ,6,1043.51,147.32,0,147.32,896.19,3,200807
3010,63,192MED,6,16.25,15.46,0,15.46,0.78,3,200807
3010,63,192PD ,6,-1192.37,98.2,0,98.2,-1290.57,3,200807
3010,63,192UMB,6,18.7,20.6,0,20.6,-1.9,3,200807
3010,63,192UMP,6,-1.03,1.5,0,1.5,-2.53,3,200807

*SUM -This record should show the sum of the column col_5 through col_9

awk -solution would be disarable

malcomex999 · February 21, 2010, 6:45am

Ccolumn 10 should be in ascending order...

awk -F, 'NR==1{print;next}{
if($10==col10){sum+=sum + $5 + $6 + $7 + $8 + $9}
else{print "\nSum: " sum "\n";sum=0}}
{col10=$10;print $0}
END{print "\nSum: " sum "\n"}' infile

pravin27 · February 21, 2010, 6:51am

Hi,
I have edited the scripts ....

perl sum_column.pl test.txt

#!/usr/bin/perl

$filename=$ARGV[0];
`sort -t',' -nk10 $filename -o $filename`;
open (FH,$filename);
$current=0;
$sum_col4=0;
$sum_col5=0;
$sum_col6=0;
$sum_col7=0;
$sum_col8=0;
while (<FH>) {
chomp;
if ($. > 1 ) {
@array=split /,/;

if ($current != $array[9] ) {
        if ($. > 2 ) {
                print ",,,,$sum_col4,$sum_col5,$sum_col6,$sum_col7,$sum_col8,,,\n";
                $sum_col4=0;
                $sum_col5=0;
                $sum_col6=0;
                $sum_col7=0;
                $sum_col8=0;
        }
        print "Group $array[9] started:-- \n\n";
        print $_,"\n";
        $sum_col4= $array[4] + $sum_col4 ;
        $sum_col5= $array[5] + $sum_col5 ;
        $sum_col6= $array[6] + $sum_col6;
        $sum_col7= $array[7] + $sum_col7;
        $sum_col8= $array[8] + $sum_col8;
      }
else {
        print $_,"\n";
        $sum_col4= $array[4] + $sum_col4 ;
        $sum_col5= $array[5] + $sum_col5;
        $sum_col6= $array[6] + $sum_col6;
        $sum_col7= $array[7] + $sum_col7;
        $sum_col8= $array[8] + $sum_col8;
     }
}
$current = $array[9];
}
print ",,,,$sum_col4,$sum_col5,$sum_col6,$sum_col7,$sum_col8,,,\n";

ahmad.diab · February 21, 2010, 7:28am

nawk -F"," 'NR>1{a[$10]+=a[$10]+$5+$6+$7+$8+$9 ; b[$10]=b[$10]"\n"$0 ; next}
END{for (i in a) print b"\nSum "i" =\t"a}' infile.txt

BR

;);)

jambesh · February 21, 2010, 8:50am

Thank you all for responding me - however the requirement was different .

let me clarify it in more simpler -

Input file -

col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10,col_11
3010,21,1923D ,0,5,10,15,20,25,1,200807
3010,21,192BI ,0,60,65,70,75,80,1,200807
3011,22,192xx ,0,20,30,40,50,60,2,200903
3011,22,192xx ,0,25,30,40,50,60,2,200903
3011,22,192xx ,0,50,55,60,65,70,3,200903
3011,22,192xx ,0,25,30,40,50,60,3,200903

Column 10 contain category 1 , 2, 3
Out put would be-- At the end of each category it would print the
sum of column 5,6,7,8,9 and rest are just blank

Example :

col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10,col_11
3010,21,1923D ,0,5,10,15,20,25,1,200807
3010,21,192BI ,0,60,65,70,75,80,1,200807
,,,,65,75,85,95,105,,,  
col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10,col_11
3011,22,192xx ,0,20,30,40,50,60,2,200903
3011,22,192xx ,0,25,30,40,50,60,2,200903
,,,,45,60,80,100,120,,,
col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10,col_11
3011,22,192xx ,0,50,55,60,65,70,3,200903
3011,22,192xx ,0,25,30,40,50,60,3,200903
,,,,75,85,100,115,130,,

After end of each category it is printing the sum of col4 ,sum of col 5 ,sum of 6,sum of col7,sum of col 8,sum of col9 and then it start the header again then then same process for each category

Hope this helps to formulate an solution

radoulov · February 21, 2010, 9:42am

awk -F, 'END { 
  printf "%s\n%s\n,,,,%s,%s,%s,%s,%s,,,\n", h, r, \
  _5, _6, _7, _8, _9 
   }
NR == 1 { h = $0; next }   
!c[$10]++ && NR > 2 {
  printf "%s\n%s\n,,,,%s,%s,%s,%s,%s,,,\n", h, r, \
  _5, _6, _7, _8, _9
  _5 = _6 = _7 = _8 = _9 = r = x
  }
{  
  _5 += $5; _6 += $6; _7 += $7; _9 += $9
  _8 += $8; r = r ? r RS $0 : $0 
  }' infile

Franklin52 · February 21, 2010, 9:53am

Assuming column 10 is in ascending order:

awk -F, 'NR==1{h=$0;print;next}
g && g != $10 {
  print ",,,,"c5, c6, c7, c8, c9 ",,,"
  c5=c6=c7=c8=c9=""
  print h
}
{c5+=$5; c6+=$6; c7+=$7; c8+=$8; c9+=$9; g=$10}
END{print ",,,," c5, c6, c7, c8, c9 ",,,"}
1' OFS="," file

jambesh · February 21, 2010, 10:11am

Thank You Franklin for your help .. It worked as expected.
Thanks again for your time
Can you tell me what the number 1 would do at the end of the script .

This is a nice piece of code - If you have time can you also pls tell me

g && g != $10 {
  print ",,,,"c5, c6, c7, c8, c9 ",,,"
  c5=c6=c7=c8=c9=""
  print h
}

what this blok will do ?

Franklin52 · February 21, 2010, 3:57pm

jambesh:

Thank You Franklin for your help .. It worked as expected.
Thanks again for your time
Can you tell me what the number 1 would do at the end of the script .

This is a nice piece of code - If you have time can you also pls tell me
g && g != $10 {
  print ",,,,"c5, c6, c7, c8, c9 ",,,"
  c5=c6=c7=c8=c9=""
  print h
}  
what this blok will do ?

awk evaluates the 1 as true and the prints the current line by default, equivalent to {print}.

g && g != $10

If the variable g is set and not equal to $10
This true when the value of the 10th field differs from the previous record.

print ",,,,"c5, c6, c7, c8, c9 ",,,"

Print the line with the sums

c5=c6=c7=c8=c9=""

Reset the variables

print h

Print the header