I need some logic that would help to group up some records that fall between two dates:
Input Data
COL_1 COL_2 COL_3 COL_4 COL_5 COL_6 COL_7 COL_8 COL_9 COL_10 COL_11 COL_12
C ABC ABCD 3 ZZ WLOA 2015-12-01 2015-12-15 975.73 ZZZ P 147018.64
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-01-31 898.86 ZZZ P 129018.66
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-02-29 788.81 ZZZ P 110912.18
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-02-29 18106.48 ZZZ P 110912.18
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-05-31 1652.2 ZZZ P 55947.43
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-08-31 650.05 ZZZ P 45500.00
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-09-20 500.15 ZZZ P 37525.00
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-10-01 357.05 ZZZ P 12385.00
We will pass VAR_DATE as a parameter to the script, which we will then use to work out the grouping from this value.
For each value in COL_4 (Grouping Column) we need to group up the records that are within a certain date range.
In the above example therefore, we would like to group records with the below logic:
COL_7 > VAR_DATE AND COL_8 <= VAR_DATE + 1 Year
Example Output (Year 1)
C ABC ABCD 3 ZZ WLOA VAR_DATE VAR_DATE + 1 YEAR SUM ALL VALUES ZZZ P FINAL RECORD IN COL_12 BALANCE FOR GIVEN ID (COL_4)
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-12-01 2663.40 ZZZ P 110912.18
We then need to group up the values for the next year (Year 2)
We would then like to repeat the same, but for year 2, using the below logic:
COL_7 > VAR_DATE + 1 Year AND COL_8 <= VAR_DATE + 2 Year
Example Output (Year 2)
C ABC ABCD 3 ZZ WLOA VAR_DATE + 1 YEAR VAR_DATE + 2 YEAR SUM ALL VALUES ZZZ P FINAL RECORD IN COL_12 BALANCE FOR GIVEN ID (COL_4)
C ABC ABCD 3 ZZ WLOA 2016-12-01 2017-12-01 21265.93 ZZZ P 12385.00
We will need to keep doing this logic up to 5 years / groups but this amount of years could change so ideally the loop/amount of groups required needs to be dynamic/parameterised.
The final output would look like below, with the two new records generated from the above logic, appended to the end of the file:
COL_1 COL_2 COL_3 COL_4 COL_5 COL_6 COL_7 COL_8 COL_9 COL_10 COL_11 COL_12
C ABC ABCD 3 ZZ WLOA 2015-12-01 2015-12-15 975.73 ZZZ P 147018.64
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-01-31 898.86 ZZZ P 129018.66
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-02-29 788.81 ZZZ P 110912.18
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-02-29 18106.48 ZZZ P 110912.18
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-05-31 1652.2 ZZZ P 55947.43
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-08-31 650.05 ZZZ P 45500.00
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-09-20 500.15 ZZZ P 37525.00
C ABC ABCD 3 ZZ WLOA 2016-12-10 2017-10-01 357.05 ZZZ P 12385.00
C ABC ABCD 3 ZZ WLOA 2015-12-01 2016-12-01 2663.40 ZZZ P 110912.18 ** (New record from above logic - Year 1 ) **
C ABC ABCD 3 ZZ WLOA 2016-12-01 2017-12-01 21265.93 ZZZ P 12385.00 ** (New record from above logic - Year 2 ) **
As the above logic will be running over a large amount of records, I would assume AWK will be the most efficient solution to the above, however my experience of AWK is extremely limited, therefore I am unsure as to how to proceed with starting the above logic.