Hi,
I have a requirement involving 50-60 million records in which we need to split a delimited string (the delimiter is a newline) into rows.
Source Data:
SerialID UnidID GENRE
100 A11 AAAchar(10)BBB
200 B11 CCCchar(10)DDDchar(10)ZZZZ
Field 'GENRE' is a string with newline as the delimiter, and I am not sure how many values it may contain.
Please advise!
Thanks
RudiC
2
Please use code tags as required by forum rules!
I guess this is from MS Excel, where <NL> (0x0A, \n) is used as a marker to split a string into several lines within a cell?
Taking your sample into *nix makes it look like
SerialID UnidID GENRE
100 A11 AAA
BBB
200 B11 CCC
DDD
ZZZZ
WHAT exactly do you want to split into rows?
Hi,
I have a requirement involving 50-60 million records in which we need to split a delimited string (the delimiter is a newline) into rows.
Source Data:
SerialID UnidID GENRE
100 A11 AAAchar(10)BBB
200 B11 CCCchar(10)DDDchar(10)ZZZZ
Expected Output:
SerialID UnidID GENRE
100 A11 AAA
100 A11 BBB
200 B11 CCC
200 B11 DDD
200 B11 ZZZZ
Field 'GENRE' is a string with newline as the delimiter, and I am not sure how many values it may contain.
Please advise!
Thanks
RudiC
4
Are you sure the input data looks like you posted, and, if yes, are you sure you're on *nix?
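If the file really does contain the literal text char(10) between the values, exactly as posted, then the splitting could be done on that text instead. What follows is only a rough sketch under that assumption; the function name split_literal is made up for illustration, and it also assumes the three columns are whitespace-separated and that GENRE values contain no spaces.

```shell
# Sketch: split GENRE on the literal text "char(10)" (an assumption about the data).
# split_literal is a hypothetical helper; it reads the named file(s) or stdin.
split_literal() {
    awk '
        NR == 1 { print; next }                    # keep the header line as it is
        {
            n = split($3, genre, /char\(10\)/)     # break GENRE at each literal marker
            for (i = 1; i <= n; i++)
                print $1, $2, genre[i]             # repeat SerialID and UnidID per value
        }
    ' "$@"
}

# Usage with the sample from the post:
printf '%s\n' 'SerialID UnidID GENRE' \
              '100 A11 AAAchar(10)BBB' \
              '200 B11 CCCchar(10)DDDchar(10)ZZZZ' | split_literal
```

Because awk handles one input line at a time, this should cope with any number of values per record.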
Field GENRE can have any number of values separated by a newline delimiter.
RudiC
6
As I said, newline has a special meaning on *nix.
Assuming my suspicion (see post #2) is correct, try:
awk 'NR==1 {print; next} NF==3 {TMP=$1 OFS $2} {print TMP OFS $NF}' file3
SerialID UnidID GENRE
100 A11 AAA
100 A11 BBB
200 B11 CCC
200 B11 DDD
200 B11 ZZZZ
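For readers who want to see why the one-liner works, here is the same logic spread over several lines with comments. It is a sketch only: the function name split_genre and the variable name key are made up for this example, and it assumes that GENRE values contain no whitespace, so that a continuation line never has exactly three fields.

```shell
# Sketch: same logic as the one-liner above, commented.
# split_genre is a hypothetical helper; it reads the named file(s) or stdin.
split_genre() {
    awk '
        NR == 1 { print; next }          # header line: print unchanged and move on
        NF == 3 { key = $1 OFS $2 }      # full record: remember SerialID and UnidID
                { print key OFS $NF }    # every line: remembered key plus the last field
    ' "$@"
}

# Usage with the sample as it looks on *nix (real newlines inside GENRE):
printf '%s\n' 'SerialID UnidID GENRE' \
              '100 A11 AAA' 'BBB' \
              '200 B11 CCC' 'DDD' 'ZZZZ' | split_genre
```

A line with three fields starts a new record, while a line with fewer fields is treated as a further GENRE value of the record before it. Only the current key is kept in memory, so the approach should also be workable for 50-60 million records.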