Sum of multiple columns based on few conditions

Input file:

0014,A,2020,K,011115000000,xyxxxxxxxxxxxxxx ,B,00001,0003200,0000111
0014,A,2020,K,011115000000,xyxxxxxxxxxxxxxx ,B,00001,0003300,0000100
0014,A,2020,K,011116000000,xyxxxxxxxxxxxxxx ,B,00001,0003400,0000100
0014,A,2020,K,011116000000,xyxxxxxxxxxxxxxx ,B,00007,0003100,0000200
0012,A,2020,K,047225000000,xyxxxxxxxxxxxxxx ,B,00001,0003000,0000100
0012,A,2020,K,047225000000,xyxxxxxxxxxxxxxx ,B,00001,0003100,0000100
0013,A,2020,K,047225000000,xyxxxxxxxxxxxxxx ,B,00001,0003300,0000100

i need to sum of multiple columns(8th,9th,10th) based on few conditions on the above file.
1) Sort by column 1
2)Sum of columns(8th,9th,10th) based on unique column 1 and column 5.
3)print the original record and print the output in a separate line(second column value should be T
and 6th column should be blank in the output line)

Output:

0012,A,2020,K,047225000000,xyxxxxxxxxxxxxxx ,B,00001,0003000,0000100
0012,A,2020,K,047225000000,xyxxxxxxxxxxxxxx ,B,00001,0003100,0000100
0012,T,2020,K,047225000000,                        ,B,00002,0006100,0000200
0013,A,2020,K,047225000000,xyxxxxxxxxxxxxxx ,B,00001,0003300,0000100
0013,T,2020,K,047225000000,                       ,B,00001,0003300,0000100
0014,A,2020,K,011115000000,xyxxxxxxxxxxxxxx ,B,00001,0003200,0000111
0014,A,2020,K,011115000000,xyxxxxxxxxxxxxxx ,B,00001,0003300,0000100
0014,A,2020,K,011116000000,xyxxxxxxxxxxxxxx ,B,00001,0003400,0000100
0014,A,2020,K,011116000000,xyxxxxxxxxxxxxxx ,B,00007,0003100,0000200
0014,T,2020,K,011115000000,                        ,B,00002,0006500,0000211
0014,T,2020,K,011116000000,                        ,B,00008,0006500,0000300

I tried the below command, the total is coming correct. The formatting is not correct and need the output mentioned above

awk -F',' '{a[$1"."$5]+=$8;b[$1"."$5]+=$9;c[$1"."$5]+=$10;f[$1"."$5]=$2;
g[$1"."$5]=$1;h[$1"."$5]=$3;
j[$1"."$5]=$4;k[$1"."$5]=$5;
r[$1"."$5]=$7
l[$1"."$5]=$6;m[$1"."$5]=$7}END{for(i in a)print g",",f",",h",",j",",k",",l",",r",",a",",b",",c |"sort -t ',' -n -k1,1"}' file.txt

Can anyone please assist me?

assuming you have GNU awk (gawk - preferably the latest 4.1.1): gawk -f vin.awk myFile where vin.awk is:

#!/bin/gawk -f
BEGIN {
  FS=OFS=","
  sc="8,9,10"
  sn=split(sc,scA,FS)
}
{
  idx=($1 SUBSEP $5)
  block[idx]=(idx in block)?block[idx] ORS $0:$0
  line[idx]=$0
  for(i=1; i in scA;i++)
     sumA[idx][scA]=sprintf("%07d", sumA[idx][scA]+$scA)
}
END {
  for(i in block) {
    print block
    lineN=split(line,lineA,FS)
    lineA[2]="T"
    lineA[6]=""
    for(j=1; j in scA;j++)
       lineA[scA[j]]=sumA[scA[j]]
    for(j=1;j<=lineN;j++)
       printf("%s%s%s", (j==1)?"":OFS, lineA[j],(j==lineN)?ORS:"")
  }
}

Thanks for the response.
gawk is not installed on our server , so this code is not working.
Can you try using awk ?

ok, try this with awk (use 'nawk' if on Solaris):

#!/bin/awk -f
BEGIN {
  FS=OFS=","
  sc="8,9,10"
  sn=split(sc,scA,FS)
}
{
  idx=($1 SUBSEP $5)
  block[idx]=(idx in block)?block[idx] ORS $0:$0
  line[idx]=$0
  for(i=1; i in scA;i++)
     sumA[idx,scA]=sprintf("%07d", sumA[idx,scA]+$scA)
}
END {
  for(i in block) {
    print block
    lineN=split(line,lineA,FS)
    lineA[2]="T"
    lineA[6]=""
    for(j=1; j in scA;j++)
       lineA[scA[j]]=sumA[i,scA[j]]
    for(j=1;j<=lineN;j++)
       printf("%s%s%s", (j==1)?"":OFS, lineA[j],(j==lineN)?ORS:"")
  }
}

Thanks a lot for ur help!
The code is working fine. One small issue is that the length of column 8 is 5 and its coming as 7 for column 8, column 9 and column 10 in the T record. How to fix this?

0012,T,2020,K,047225000000,                   ,B,0000002,0006100,0000200

It should be like this

0012,T,2020,K,047225000000,                   ,B,00002,0006100,0000200

Also can u please explain me the code?

this should be a bit prettier:

#!/bin/awk -f
BEGIN {
  FS=OFS=","
  sc="8,9,10"
  sn=split(sc,scA,FS)
}
{
  w6=length($6)
  idx=($1 SUBSEP $5)
  block[idx]=(idx in block)?block[idx] ORS $0:$0
  line[idx]=$0
  for(i=1; i in scA;i++)
     sumA[idx,scA]=sprintf("%0*d", length($scA),sumA[idx,scA]+$scA)
}
END {
  for(i in block) {
    print block
    lineN=split(line,lineA,FS)
    lineA[2]="T"
    lineA[6]=sprintf("%*s", w6, "")
    for(j=1; j in scA;j++)
       lineA[scA[j]]=sumA[i,scA[j]]
    for(j=1;j<=lineN;j++)
       printf("%s%s%s", (j==1)?"":OFS, lineA[j],(j==lineN)?ORS:"")
  }
}
1 Like

Thanks you so much for ur help. The code is working fine.

If you have the time, can you please explain me the code.