I have some test data that is seperated out into annual records, each record has a start date (COL7), an end date (COL8) and a maturity date (COL18) - What I need to do is ensure that there is one record to cover each year right up until Maturity date (COL18).
In the first group of the below data for example the start date and end dates for the final record are 2018-12-01 and 2019-11-30, however the maturity date isnt until 2020-11-30 so i would need an extra record to cover that time period i.e. 2019-12-01 to 2020-11-30 -- The outputted record we need to carry all the same field values of the last record, except for COL9 which would be defaulted to 0 and the start and end dates would change the reflect the year that it is covering
Input
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,5678,TEST,2,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2016-12-01,2017-11-30,210365.77,AAA,P,4095224.45,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
Desired Output
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,5678,TEST,2,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2016-12-01,2017-11-30,210365.77,AAA,P,4095224.45,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2019-12-01,2020-11-30,0,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
I could also have an instance for example, where there are missing records in between the test data, and again this gaps would need filling with a record, taking the values from the previous record and again, setting COL9 to 0 and the start and end dates would change the reflect the year that it is covering
Input
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2019-11-30
Desired output
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2016-12-01,2017-11-30,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2019-11-30
Note: COL2,COL3 and COL4 are the keys and esentially whilever these 3 values are the same, those records are grouped together and the above logic should be performed on each group of records
For example the below test data contains 3 'groups' of data
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,5678,TEST,2,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2016-12-01,2017-11-30,210365.77,AAA,P,4095224.45,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,9999,TEST,3,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,9999,TEST,3,AA,AAAA,2016-12-01,2017-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2018-10-01
C,9999,TEST,3,AA,AAAA,2017-12-01,2018-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2018-10-01
I have tried writting the above using a loop in linux, but only managed to get that working partially, however the partial solution I came up with is very slow and isnt performant on large numbers of records.
I was hoping that maybe someone could help me write some of the above logic using awk? My knowledge of awk is very limited so any help would be much appreciated.