Dear all,
I have basic knowledge of Unix script and her I am trying to process variable length and variable format CSV file.
The file length will depend on the numbers of Earnings/Deductions/Direct Deposits.
And
The format will depend on whether it is Earnings/Deductions or Direct Deposits
For example, if employee has 2 earning, 2 deductions, and 2 direct deposits accounts, The record after EmployeeID will repeat twice for Earnings, repeat twice for Deductions, and repeat twice for direct deposits.
For earnings and deposits, the format are the same, they need to have earnings/deposit numbers then earnings/deposit amounts.
For direct deposits, the format is more complicated, it will require Bank Number, Account Type, Account Number, Amount,
Here is the sample data. The first line is header.
EmployeeID,Total numbers of Earnings (E), Total numbers of Deductions (D), Total numbers of Direct Deposit (DD), E/D #1 Number, E/D #1 Amount, E/D #2 Number, E/D #2 Amount,....,E/D #n Number, E/D #n Amount, DD #1 Bank Number, DD #1 Account Type, DD #1 Account Number, DD #1 Amount, DD #2 Bank Number, DD #2 Account Type, DD #2 Account Number, DD #2 Amount,......,DD #n Bank Number, DD #n Account Type, DD #n Account Number, DD #n Amount,
1,1,2,1,01,12000,13,1000,14,1000,010001001,ABC,123456,10000
2,2,2,2,01,10000,02,2000,11,500,12, 500,010001002,ABC,1212121,5000,010001003,CDE,123123,6000
For Employee #1, he/she has 1 earning, 2 deductions, and 1 direct deposit, so his/her record will have one earning (01), two deductions (13 and 14), and one direct deposit account (010001001,ABC,123456)
For Employee #2, he/she has 2 earnings, 2 deduction,s and 2 direct deposits, so his/her record will have one earning (01 and 02), two deductions (11 and 12), and two direct deposit accounts (010001002,ABC,1212121 and 010001003,CDE,123123)
Since the numbers of Earnings/Deductions or Direct Deposits could vary, the length and format of the records in CSV file are different.
I have several options.
- I can seperate the file into three files. One for Earning, one for deduction, and one for direct deposit. Each file will have employee id in the beginning of the record.
so for sample data will look like
Earning File
EmployeeID,Earning #1 Number, Earning #1 Amount,....,Earning #n Number, Earning #n Amount
1,01,12000
2,01,10000,02,2000
Deduction File
EmployeeID,Deduction #1 Number, Deduction #1 Amount,....,Deduction #n Number, Deduction #n Amount
1, 13,1000,14,1000
2,11,500,12, 500
Direct Deposit File
EmployeeID,#1 Bank Number, #1 Account Type, #1 Account Number, #1 Amount,..... #n Bank Number, #n Account Type, #n Account Number, #n Amount
1,010001001,ABC,123456,10000
2,010001002,ABC,1212121,5000,010001003,CDE,123123,6000
or
2. I can further "normalize" each record into sepeate records in each files.
Earning File
EmployeeID,Earning Number, Earning Amount
1,01,12000
2,01,10000
2,02,2000
Deduction File
EmployeeID,Deduction Number, Deduction Amount
1,13,1000
1,14,1000
2,11,500
2,12,500
Direct Deposit File
EmployeeID,Bank Number, Account Type, Account Number, Amount
1,010001001,ABC,123456,10000
2,010001002,ABC,1212121,5000,010001003,CDE,123123,6000
Thank you very much for your time to help me to resolve this issue.
chechun