Input data
COL_1,COL_2,COL_3,COL_4,COL_5,COL_6,COL_7,COL_8,COL_9,COL_10,COL_11,COL_12,COL_13
C,ABC,ABCD,3,ZZ,WLOA,2015-12-01,2016-12-01,975.73,ZZZ,P,111111.00,Y1 **GROUP1**
C,ABC,ABCD,3,ZZ,WLOA,2015-12-01,2016-12-01,975.73,ZZZ,P,222222.00,Y1 **GROUP1**
C,ABC,ABCD,3,ZZ,WLOA,2015-12-01,2016-12-01,975.73,ZZZ,P,333333.00,Y1 **GROUP1**
C,ABC,ABCD,3,ZZ,WLOA,2016-12-01,2017-12-01,898.86,ZZZ,P,444444.00,Y2 **GROUP2**
C,ABC,ABCD,3,ZZ,WLOA,2016-12-01,2017-12-01,898.86,ZZZ,P,555555.00,Y2 **GROUP2**
C,ABC,ABCD,3,ZZ,WLOA,2016-12-01,2017-12-01,898.86,ZZZ,P,666666.00,Y2 **GROUP2**
C,ABC,ABCD,3,ZZ,WLOA,2017-12-01,2018-12-01,788.81,ZZZ,P,777777.10,Y3 **GROUP3**
C,ABC,ABCD,3,ZZ,WLOA,2019-12-01,2020-12-01,788.81,ZZZ,P,888888.10,Y5 **GROUP4**
C,ABC,ABCD,3,ZZ,WLOA,2020-12-31,2021-10-29,788.81,ZZZ,P,999999.10,Y5+ **GROUP5**
C,ABC,ABCD,4,ZZ,WLOA,2015-12-01,2016-12-01,788.81,ZZZ,P,101010.10,Y1 **GROUP6**
C,ABC,ABCD,4,ZZ,WLOA,2015-12-01,2016-12-01,788.81,ZZZ,P,123123.10,Y1 **GROUP6**
I am using the following command to print out one record where COL_4, COL_7, COL_8 and COL_13 match - It then sums up all the values from COL_9 and prints into the one record
Code and output;
awk 'BEGIN{FS=OFS=","}
NR==1{print;next}
{a[$4";"$7";"$8";"$13]+=$9}
END{for(i in a)print $1, $2, $3, substr(i,0,1), $5, $6, substr(i,3,10), substr(i,14,10), a, $10, $11, $12}' BUCKETED.csv > COL_TOTAL.csv
COL_1,COL_2,COL_3,COL_4,COL_5,COL_6,COL_7,COL_8,COL_9,COL_10,COL_11,COL_12,COL_13
C,ABC,ABCD,3,ZZ,WLOA,2019-12-01,2020-12-01,788.81,ZZZ,P,123123.10,Y4 **GROUP4 Record**
C,ABC,ABCD,3,ZZ,WLOA,2017-12-01,2018-12-01,788.81,ZZZ,P,123123.10,Y3 **GROUP3 Record**
C,ABC,ABCD,3,ZZ,WLOA,2016-12-01,2017-12-01,2696.58,ZZZ,P,123123.10,Y2 **GROUP2 Record**
C,ABC,ABCD,3,ZZ,WLOA,2015-12-01,2016-12-01,2927.19,ZZZ,P,123123.10,Y1 **GROUP1 Record**
C,ABC,ABCD,3,ZZ,WLOA,2020-12-31,2021-10-29,788.81,ZZZ,P,123123.10,Y5 **GROUP5 Record**
C,ABC,ABCD,4,ZZ,WLOA,2015-12-01,2016-12-01,1577.62,ZZZ,P,123123.10,Y6 **GROUP6 Record**
The issue I have is with COL_12 and bringing back the same value everytime (123123.10) - The desired output would be to bring back only the final record from each group, so for example we would bring back the following records from the groups
GROUP1=333333.00
GROUP2=666666.00
GROUP3=777777.10
GROUP4=888888.10
GROUP5=999999.10
GROUP6=123123.10
The correct output therefore would show as;
C,ABC,ABCD,3,ZZ,WLOA,2019-12-01,2020-12-01,788.81,ZZZ,P,888888.10,,Y4
C,ABC,ABCD,3,ZZ,WLOA,2017-12-01,2018-12-01,788.81,ZZZ,P,777777.10,Y3
C,ABC,ABCD,3,ZZ,WLOA,2016-12-01,2017-12-01,2696.58,ZZZ,P,666666.00,Y2
C,ABC,ABCD,3,ZZ,WLOA,2015-12-01,2016-12-01,2927.19,ZZZ,P,333333.00,Y1
C,ABC,ABCD,3,ZZ,WLOA,2020-12-31,2021-10-29,788.81,ZZZ,P,999999.10,Y5
C,ABC,ABCD,4,ZZ,WLOA,2015-12-01,2016-12-01,1577.62,ZZZ,P,123123.10,Y6
Is there any way of bringing back just this record to be printed in the awk command?
Thanks for your help.