Sum up the column values group by using some field

12-11-2012,PNL,158406
12-11-2012,RISK,4564
12-11-2012,VAR_1D,310101
12-11-2012,VAR_10D,310101
12-11-2012,CB,866
12-11-2012,STR_VAR_1D,298494
12-11-2012,STR_VAR_10D,309623
09-11-2012,PNL,1024106
09-11-2012,RISK,4565
09-11-2012,VAR_1D,317211
09-11-2012,VAR_10D,317211
09-11-2012,CB,985
09-11-2012,JOB,640
09-11-2012,STR_VAR_1D,305364
09-11-2012,STR_VAR_10D,317211

I have a flat file like above,
So all I need to sum up column 3 values if column 2 matches word VAR for coulmn 1 date

So foe 12 Nov I summed up VAR_1D, VAR_10D, STR_VAR_1D, STR_VAR_10D
same goes for 09 Nov
So my final file looks like

12-11-2012,PNL,158406
12-11-2012,RISK,4564
12-11-2012,TOTAL_VAR,1228319----->(which is sum of 310101(VAR_1D)+310101(VAR_10D)+298494(STR_VAR_1D)+309623(STR_VAR_10D) for date 12Nov)
12-11-2012,CB,866
09-11-2012,PNL,1024106
09-11-2012,RISK,4565
09-11-2012,TOTAL_VAR,1256997---->(317211(VAR_1D)+317211(VAR_10D)+305364(STR_VAR_1D)+317211(STR_VAR_10D) for date 09Nov)
09-11-2012,CB,985
09-11-2012,JOB,640

How to achieve this, summing up using group by as Date.

~$ perl -ne '@fields=split/,/;
if ($date==$fields[0]){
  if ($fields[1]=~/VAR/){
    $var_total+=$fields[2];
  }
  else {
    print 
  }
}
else{
  print "$date,TOTAL_VAR,$var_total\n" if $date;
  $date=$fields[0]
}
END{
  print "$date,TOTAL_VAR,$var_total\n";
}
' tmp/tmp.dat
12-11-2012,RISK,4564
12-11-2012,CB,866
12-11-2012,TOTAL_VAR,1228319
09-11-2012,RISK,4565
09-11-2012,JOB_AUDIT_DATA_TYPE,640
09-11-2012,TOTAL_VAR,2486301

thanks sky, but issue is I can't use perl, can I same suggestion in awk or something like that...
as well as O/p doesn't align with the requested o/p, total var sum for 2nd date 9 Nov is not summed up as well as some columns for 09th nov are missing

awk -F, 'NR==FNR{if($2 ~ /VAR/){X[$1]+=$3};next}{P=1;if($2 ~ /VAR/ ){if(! arr[$1]){$2="TOTAL_VAR";$3=X[$1];P=1;arr[$1]++}else{P=0}}}P' OFS="," file file
1 Like

also try:

awk -F, '
$2 ~ /VAR/ {$2="TOTAL_VAR"};
{ a[$1","$2]+=$3; }
END { for (i in a) print i, a; }
' OFS=, infile | sort

Hi Pamu,

just in curious, what is the purpose of passing two times the file name at the end of command?

and if we want to format the output as below,

printf ("%10s %25s %10d\n",$1,$2,$3) 

how to achieve ?

try:

awk -F, '
$2 ~ /VAR/ {$2="TOTAL_VAR"};
{ a[$1","$2]+=$3; }
END {
  for (i in a) {s=i "," a;
    c=split(s,o,",");
    printf ("%10s %25s %10d\n",o[1],o[2],o[3]);
  }
}
' infile | sort

Here we are passing two times file name because we are performing operations twice on a file.

for your format try.

awk -F, 'NR==FNR{if($2 ~ /VAR/){X[$1]+=$3};next}
 {if($2 ~ /VAR/ ){if(! arr[$1]){$2="TOTAL_VAR";$3=X[$1];printf ("%10s %25s %10d\n",$1,$2,$3);arr[$1]++}else{P=0}}else{printf ("%10s %25s %10d\n",$1,$2,$3)}}'  file file