awk - printing new lines based of 2 dates

I have some test data that is seperated out into annual records, each record has a start date (COL7), an end date (COL8) and a maturity date (COL18) - What I need to do is ensure that there is one record to cover each year right up until Maturity date (COL18).

In the first group of the below data for example the start date and end dates for the final record are 2018-12-01 and 2019-11-30, however the maturity date isnt until 2020-11-30 so i would need an extra record to cover that time period i.e. 2019-12-01 to 2020-11-30 -- The outputted record we need to carry all the same field values of the last record, except for COL9 which would be defaulted to 0 and the start and end dates would change the reflect the year that it is covering

Input

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,5678,TEST,2,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2016-12-01,2017-11-30,210365.77,AAA,P,4095224.45,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30

Desired Output

 
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,5678,TEST,2,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2016-12-01,2017-11-30,210365.77,AAA,P,4095224.45,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2019-12-01,2020-11-30,0,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
 

I could also have an instance for example, where there are missing records in between the test data, and again this gaps would need filling with a record, taking the values from the previous record and again, setting COL9 to 0 and the start and end dates would change the reflect the year that it is covering

Input

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2019-11-30

Desired output

 
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2016-12-01,2017-11-30,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2019-11-30
 

Note: COL2,COL3 and COL4 are the keys and esentially whilever these 3 values are the same, those records are grouped together and the above logic should be performed on each group of records

For example the below test data contains 3 'groups' of data

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,5678,TEST,2,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2016-12-01,2017-11-30,210365.77,AAA,P,4095224.45,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,9999,TEST,3,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,9999,TEST,3,AA,AAAA,2016-12-01,2017-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2018-10-01
C,9999,TEST,3,AA,AAAA,2017-12-01,2018-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2018-10-01

I have tried writting the above using a loop in linux, but only managed to get that working partially, however the partial solution I came up with is very slow and isnt performant on large numbers of records.

I was hoping that maybe someone could help me write some of the above logic using awk? My knowledge of awk is very limited so any help would be much appreciated.

Not sure if this is the most elegant one, esp. as my awk doesn't have date manipulation capabilities, but try

awk -F, '
function GTM(T,D)       {cmd = "date +%Y-%m-%d -d\"" T D "days\""
                         cmd | getline X
                         close (cmd)
                         return X
                        }

NR == 1         {print
                 next
                }

$2 != LAST &&
R               {T = $0
                 $0 = LL
                 $7 = L
                 $8 = $18
                 $9 = 0
                 print
                 $0 = T
                }

$7 > L && R     {T = $0
                 $8 = GTM($7, " -1")
                 $7 = L
                 $9 = 0
                 print
                 $0 = T
                }

1
                {L = GTM($8," +1")
                 R = ($8 != $18)
                 LAST = $2
                 LL = $0
                }

END             {if (R) {$7 = L
                         $8 = $18
                         $9 = 0
                         print
                        }
                }
' OFS="," file
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2016-12-01,2017-11-30,0,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2019-12-01,2020-11-30,0,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2016-12-01,2017-11-30,210365.77,AAA,P,4095224.45,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,5678,TEST,2,AA,AAAA,2019-12-01,2020-11-30,0,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,9999,TEST,3,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2019-11-30
C,9999,TEST,3,AA,AAAA,2016-12-01,2017-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2018-10-01
C,9999,TEST,3,AA,AAAA,2017-12-01,2018-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2023-10-01
C,9999,TEST,3,AA,AAAA,2018-12-01,2023-10-01,0,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2023-10-01

I added / postponed some maturity dates for testing purposes...

Hi Rudi,

Thank you so much for the above, that's brilliant!

There is a couple of things I dont think I was clear on to begin with so apologies for that.

In the above output that you have provided, the last record where COL2=9999 the start and end dates run from 2018-12-01 to 2023-10-01 - What I would actually require would be one seperate record for each year period.

I have provided a couple of example outputs below that might be a little clearer - I have made the new lines that are to be created in bold;

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2016-12-01,2017-11-30,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30 -- COL9 set the 0 and COL12 takes the value from the line above
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30 -- COL9 set the 0 and COL12 takes the value from the line above
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2019-12-01,2020-11-30,256728.49,AAA,P,3241002.14,NULL,NULL,NULL,NULL,NULL,2020-11-30
COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,5678,TEST,2,AA,AAAA,2015-12-01,2016-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2021-10-10
C,5678,TEST,2,AA,AAAA,2016-12-01,2017-11-30,210365.77,AAA,P,4095224.45,NULL,NULL,NULL,NULL,NULL,2021-10-10
C,5678,TEST,2,AA,AAAA,2017-12-01,2018-11-30,232393.82,AAA,P,3862830.63,NULL,NULL,NULL,NULL,NULL,2021-10-10
C,5678,TEST,2,AA,AAAA,2018-12-01,2019-11-30,256728.49,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2021-10-10
C,5678,TEST,2,AA,AAAA,2019-12-01,2020-11-30,0,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2021-10-10 -- Again COL9 is set to 0 and COL12 takes the last known value i.e the line above
C,5678,TEST,2,AA,AAAA,2020-12-01,2021-11-30,0,AAA,P,3606102.14,NULL,NULL,NULL,NULL,NULL,2021-10-10 -- Again COL9 is set to 0 and COL12 takes the last known value i.e the line above

Also worth noting that for each 'group' of data, the Maturity date will be the same. The end date (COL8) could be after the Maturity date, as we need to ensure that the start and end date cover 1 year i.e. 2018-12-01 TO 2019-11-30

I am taking a look at the code myself to see if I can try and figure out what is happening, but any help would be appreciated

Hi Adam,

Watching this post carefully as I have a similar requirement. Hopefully someone will be able to assist you! :slight_smile:

I'm not sure that I fully understand what you're trying to do, but this seems to produce the output you want for each of the sample inputs you have shown us:

awk '
NR == 1 {
	print
	FS = OFS = ","
	next
}
function addlines(start1, end1, count) {
	if(NR < 3) return
	for(i = 0; i < count; i++) {
		for(j = 1; j < 7; j++)
			printf("%s%s", fields[j], OFS)
		printf("%4d%s%s%4d%s%s0%s", start1 + i, startmd, OFS, end1 + i,
		    endmd, OFS, OFS)
		for(j = 10; j <= 18; j++)
			printf("%s%s", fields[j], (j < 18) ? OFS : ORS)
	}
}
$2 != last {
	# $2 has changed, add any needed entries from previous line up to and
	# including the maturity year.
	addlines(startyear + 1, endyear + 1, maturityyear - startyear)
	# Gather year and month & day from fields 7, 8 and, 18.
	split($0, fields)
	last = $2
	startyear = substr($7, 1, 4)
	startmd = substr($7, 5)
	endyear = substr($8, 1, 4)
	endmd = substr($8, 5)
	maturityyear = substr($18, 1, 4)
	maturitymd = substr($18, 5)
	# If start month & day comes after maturity month & day decrement
	# maturity year.
	if(startmd > maturitymd)
		maturityyear--
	# Print current entry.
	print
	next
}
{	# $2 has not changed since the previous line.  
	# Get new start and end years from fields 7 & 8.
	nstartyear = substr($7, 1, 4)
	nendyear = substr($8, 1, 4)
	# Add any needed entries from previous line to this line.
	addlines(startyear + 1, endyear + 1, nstartyear - startyear - 1)
	# Reset startyear, endyear, and fields[] for next line.
	startyear = nstartyear
	endyear = nendyear
	split($0, fields)
	# Print current entry.
	print
}
END {	addlines(startyear + 1, endyear + 1, maturityyear - startyear)
}' file

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

1 Like

Hi Don,

Thanks a lot for the above - I have tested that and it works really nicely for what I need - there is just one scenario which I don't think I explained too well before;

There could be a scenario where we don't have the very first record i.e. the reporting day record, so we would have to create the records before based on a Reporting date parameter.

For example

Input record - Note that Reporting date is 2015-12-01;

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2019-12-01,2020-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30

As we don't have the reporting date record, we would need to create that as below, right up until Maturity date, or in this case, the final record;

What we are trying to say in the below example is that up until the final record, no units were used in the previous years. Therefore we need to create records that reflect this - units used (COL9) will be 0 as nothing has used, and the outstanding balance should be calculated by adding units (COL9) to outstanding balance (COL12) of the record we know.

Note:
COL9 - Units used that year
COL12 - Outstanding Balance i.e. Units remaining

Desired output;

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2016-12-01,2017-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30 
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30 
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2019-12-01,2020-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30  -- Original input record

On the same hand, we could also have a scenario where we have used x amount of units half way through the term, and then again, nothing right up until maturity date;

For example

Input record - Note again reporting date is 2015-12-01;

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30

In this scenario we would have to add in records before (starting from reporting date) again using the same logic as above (COL9 set to 0 and COL12 being worked out as COL9+COL12) - We would also have to add records in after the record that we know of i.e. up until Maturity date. In this case, as you have kindly done in your previous bit of code COL9 would be set to 0 and COL12 would take the value of the previous record.

Desired output;

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2016-12-01,2017-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30 
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30 -- Original input record
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2019-12-01,2020-11-30,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30

In a nut shell, what we're trying to work out here is for each reporting year up until maturity date, how many units have been used, and how many do we have remaining.

Really appreciate your help on this one Don, it's certainly a lot more complicated that we first envisaged

That seems to be an understatement. I don't see that you said anything at all about this scenario.

However, reading back through your first post, I did find that my suggestion is incorrect. It only uses COL2 as the key; not COL2, COL3, and COL4. I assume that you can easily modify the code I suggested to fix that deficiency.

Are you now saying that the start year (I assume that you noticed that the code I suggested doesn't care about the month and day values other than to determine whether or not a record needs to be created for the maturity year) is the same for every account in each file you'll be processing? Or, do you have a database somewhere that has to be consulted to find the start date for each different key (i.e., each set of COL2, COL3, and COL4 values)?

Hi Don,

Yes you're right, the key is COL2, COL3 and COL4 - Am I right in thinking that this is the bit that needs to be changed to reflect that?

$2$3$4 != last {
	# $2 has changed, add any needed entries from previous line up to and
	# including the maturity year.
	addlines(startyear + 1, endyear + 1, maturityyear - startyear)
	# Gather year and month & day from fields 7, 8 and, 18.
	split($0, fields)
	last = $2$3$4

There will be one file and the start date will be the same for each key - they will all use reporting day parameter as the start date.

For example for this record the reporting date would be the start date i.e. 2015-12-01 and the end date would be +1 year and -1 day to make sure we cover the whole year

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2019-12-01,2020-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30

Therefore the output would look like;

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30 -- First record with start date = reporting date
C,1234,TEST,1,AA,AAAA,2016-12-01,2017-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30 
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30 
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2019-12-01,2020-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30  -- Original input record

Thanks for your assistance

If the data in COL2 is the same number of characters in every record and the data in COL3 is the same number of characters in every record, just concatenating those three fields in the awk variable last would be fine. Since I didn't know if those fields are constant width, I included the field separators in the comparison. And, with a few minor modifications to what I suggested before, it seems to work with your new requirements:

#!/bin/ksh
Report_Year='2015'
awk -v reportyear="$Report_Year" '
NR == 1 {
	print
	FS = OFS = ","
	next
}
function addlines(start1, end1, count) {
	if(last == "") return
	for(i = 0; i < count; i++) {
		for(j = 1; j < 7; j++)
			printf("%s%s", fields[j], OFS)
		printf("%4d%s%s%4d%s%s0%s", start1 + i, startmd, OFS, end1 + i,
		    endmd, OFS, OFS)
		for(j = 10; j <= 18; j++)
			printf("%s%s", fields[j], (j < 18) ? OFS : ORS)
	}
}
$2 FS $3 FS $4 != last {
	# $2 has changed, add any needed entries from previous line up to and
	# including the maturity year.
	addlines(startyear + 1, endyear + 1, maturityyear - startyear)
	# Gather year, month, & day from fields 7 & 8 and year from field 16.
	split($0, fields)
	last = $2 FS $3 FS $4
	startyear = substr($7, 1, 4)
	startmd = substr($7, 5)
	endyear = substr($8, 1, 4)
	endmd = substr($8, 5)
	maturityyear = substr($18, 1, 4)
	maturitymd = substr($18, 5)
	if(startmd > maturitymd)
		maturityyear--
	if(reportyear < startyear) {
		# Add records for missing years before the current record.
		save12 = $12
		fields[12] = sprintf("%.2f", $12 + $9)
		addlines(reportyear, reportyear + endyear - startyear,
		    startyear - reportyear)
		fields[12] = save12
	}
	# Print current record.
	print
	next
}
{	# $2 has not changed since the previous line.  
	# Get new start and end years from fields 7 & 8.
	nstartyear = substr($7, 1, 4)
	nendyear = substr($8, 1, 4)
	# Add any needed entries from previous line to this line.
	addlines(startyear + 1, endyear + 1, nstartyear - startyear - 1)
	# Reset startyear, endyear, and fields[] for next line.
	startyear = nstartyear
	endyear = nendyear
	split($0, fields)
	# Print current record.
	print
}
END {	addlines(startyear + 1, endyear + 1, maturityyear - startyear)
}' file

Additions and changes to the previous script are shown in red.
If file contains:

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,2,AA,AAAA,2017-01-01,2017-12-31,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-12-31

it produces the output:

COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9,COL10,COL11,COL12,COL13,COL14,COL15,COL16,COL17,COL18
C,1234,TEST,1,AA,AAAA,2015-12-01,2016-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2016-12-01,2017-11-30,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2017-12-01,2018-11-30,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2018-12-01,2019-11-30,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,1,AA,AAAA,2019-12-01,2020-11-30,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-11-30
C,1234,TEST,2,AA,AAAA,2015-01-01,2015-12-31,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-12-31
C,1234,TEST,2,AA,AAAA,2016-01-01,2016-12-31,0,AAA,P,4496015.93,NULL,NULL,NULL,NULL,NULL,2020-12-31
C,1234,TEST,2,AA,AAAA,2017-01-01,2017-12-31,190425.71,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-12-31
C,1234,TEST,2,AA,AAAA,2018-01-01,2018-12-31,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-12-31
C,1234,TEST,2,AA,AAAA,2019-01-01,2019-12-31,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-12-31
C,1234,TEST,2,AA,AAAA,2020-01-01,2020-12-31,0,AAA,P,4305590.22,NULL,NULL,NULL,NULL,NULL,2020-12-31

Which I hope is what you expect. (Although I assume that you won't have any input files where there are multiple maturity months.)

1 Like

Don,

Thank you so much for the above - working perfectly for what I need.

Thanks for helping with this one, it's been a life saver!

Hi Ads89,
I'm glad it is working for you.

There are two other changes that I should have made, but missed. If you're going to continue using this script, please change the lines containing:

	# $2 has changed, add any needed entries from previous line up to and
	# including the maturity year.
			and
{	# $2 has not changed since the previous line.

to:

	# The key has changed, add any needed entries from previous line up to
	# and including the maturity year.
			and
{	# The key has not changed since the previous line.

respectively.

Hopefully, with what you have seen here, you'll have a better chance of being able to do something like this on your own next time.