Sum up the column values group by using some field

manas_ranjan · November 13, 2012, 5:53am

12-11-2012,PNL,158406
12-11-2012,RISK,4564
12-11-2012,VAR_1D,310101
12-11-2012,VAR_10D,310101
12-11-2012,CB,866
12-11-2012,STR_VAR_1D,298494
12-11-2012,STR_VAR_10D,309623
09-11-2012,PNL,1024106
09-11-2012,RISK,4565
09-11-2012,VAR_1D,317211
09-11-2012,VAR_10D,317211
09-11-2012,CB,985
09-11-2012,JOB,640
09-11-2012,STR_VAR_1D,305364
09-11-2012,STR_VAR_10D,317211

I have a flat file like above,
So all I need to sum up column 3 values if column 2 matches word VAR for coulmn 1 date

So foe 12 Nov I summed up VAR_1D, VAR_10D, STR_VAR_1D, STR_VAR_10D
same goes for 09 Nov
So my final file looks like

12-11-2012,PNL,158406
12-11-2012,RISK,4564
12-11-2012,TOTAL_VAR,1228319----->(which is sum of 310101(VAR_1D)+310101(VAR_10D)+298494(STR_VAR_1D)+309623(STR_VAR_10D) for date 12Nov)
12-11-2012,CB,866
09-11-2012,PNL,1024106
09-11-2012,RISK,4565
09-11-2012,TOTAL_VAR,1256997---->(317211(VAR_1D)+317211(VAR_10D)+305364(STR_VAR_1D)+317211(STR_VAR_10D) for date 09Nov)
09-11-2012,CB,985
09-11-2012,JOB,640

How to achieve this, summing up using group by as Date.

Skrynesaver · November 13, 2012, 6:07am

~$ perl -ne '@fields=split/,/;
if ($date==$fields[0]){
  if ($fields[1]=~/VAR/){
    $var_total+=$fields[2];
  }
  else {
    print 
  }
}
else{
  print "$date,TOTAL_VAR,$var_total\n" if $date;
  $date=$fields[0]
}
END{
  print "$date,TOTAL_VAR,$var_total\n";
}
' tmp/tmp.dat
12-11-2012,RISK,4564
12-11-2012,CB,866
12-11-2012,TOTAL_VAR,1228319
09-11-2012,RISK,4565
09-11-2012,JOB_AUDIT_DATA_TYPE,640
09-11-2012,TOTAL_VAR,2486301

manas_ranjan · November 13, 2012, 6:45am

thanks sky, but issue is I can't use perl, can I same suggestion in awk or something like that...
as well as O/p doesn't align with the requested o/p, total var sum for 2nd date 9 Nov is not summed up as well as some columns for 09th nov are missing

pamu · November 13, 2012, 7:31am

awk -F, 'NR==FNR{if($2 ~ /VAR/){X[$1]+=$3};next}{P=1;if($2 ~ /VAR/ ){if(! arr[$1]){$2="TOTAL_VAR";$3=X[$1];P=1;arr[$1]++}else{P=0}}}P' OFS="," file file

rdrtx1 · November 13, 2012, 10:16am

also try:

awk -F, '
$2 ~ /VAR/ {$2="TOTAL_VAR"};
{ a[$1","$2]+=$3; }
END { for (i in a) print i, a; }
' OFS=, infile | sort

manas_ranjan · November 13, 2012, 10:20am

Hi Pamu,

just in curious, what is the purpose of passing two times the file name at the end of command?

and if we want to format the output as below,

printf ("%10s %25s %10d\n",$1,$2,$3)

how to achieve ?

rdrtx1 · November 13, 2012, 10:28am

try:

awk -F, '
$2 ~ /VAR/ {$2="TOTAL_VAR"};
{ a[$1","$2]+=$3; }
END {
  for (i in a) {s=i "," a;
    c=split(s,o,",");
    printf ("%10s %25s %10d\n",o[1],o[2],o[3]);
  }
}
' infile | sort

pamu · November 13, 2012, 10:34am

manas_ranjan:

Hi Pamu,

just in curious, what is the purpose of passing two times the file name at the end of command?

and if we want to format the output as below,
printf ("%10s %25s %10d\n",$1,$2,$3) 
how to achieve ?

Here we are passing two times file name because we are performing operations twice on a file.

for your format try.

awk -F, 'NR==FNR{if($2 ~ /VAR/){X[$1]+=$3};next}
 {if($2 ~ /VAR/ ){if(! arr[$1]){$2="TOTAL_VAR";$3=X[$1];printf ("%10s %25s %10d\n",$1,$2,$3);arr[$1]++}else{P=0}}else{printf ("%10s %25s %10d\n",$1,$2,$3)}}'  file file