Hi I have a very large csv file with some hundreds of thousands of rows of data.
The data is in the following format:
Up to four alpha numeric characters for the first word. This is either set as 2 characters followed by 2 spaces, or as a single 4character word. This is then followed by an 8 character alphanumeric word making a total of 12 characters in total. The number of characters and positioning is the same for every line. ie, 4 and 8.
"AB 324578DE"
"ABCDAAA87958"
"ZF 4R85T79B"
I need a way to reformat the code by taking the first 4 characters and then padding them to the end of the field as an 18 character field. So there will be 6 spaces in between the 8 character word and the original first 4 characters. ie. 8, 6, and 4.
For example the data above will become:
"324578DE AB "
"AAA87958 ABCD"
"4R85T79B ZF "
This will then be prefixed with a static text of "CODE," and then appended with 18 commas to make it a proper csv file ready for database import.
See final result below(minus the double quotes):
"CODE,324578DE AB ,,,,,,,,,,,,,,,,,,"
"CODE,AAA87958 ABCD,,,,,,,,,,,,,,,,,,"
"CODE,4R85T79B ZF ,,,,,,,,,,,,,,,,,,"
I can do the last step of prefixing and appending by using the following code where input.csv will be the results of the first text swap and padding:
sed 's/^.*/CODE,&/' input.csv | sed -e 's/$/,,,,,,,,,,,,,,,,,,/' > output.csv
however I am stuggling to perform the first step of the text swapping and padding. I was thinking of the following logic to get this to work:
- Append 6 blank spaces to end of field.
- Copy first 4 characters and append to end of field.
- Delete first 4 characters.
I can't seem to get step 2 though.
I would appreciate any hints on getting this whole thing working.