Splitting delimited string into rows

Hi,

I have a requirement involving 50-60 million records, where we need to split a delimited string (the delimiter is a newline) into rows.

Source Data:

SerialID UnidID GENRE
100 A11 AAAchar(10)BBB
200 B11 CCCchar(10)DDDchar(10)ZZZZ

Field 'GENRE' is a string with newline as the delimiter, and I am not sure how many values it may contain.

Please advise!

Thanks

Please use code tags as required by forum rules!

I guess this is from MS EXCEL where <NL> (0x0A, \n) is used as a marker to split strings into rows within a cell?

Taking your sample into *nix makes it look like

SerialID UnidID GENRE
100 A11 AAA           
BBB
200 B11 CCC           
DDD 
ZZZZ

WHAT exactly do you want to split into rows?

Hi,

I have a requirement involving 50-60 million records, where we need to split a delimited string (the delimiter is a newline) into rows.

Source Data

SerialID UnidID GENRE
100 A11 AAAchar(10)BBB
200 B11 CCCchar(10)DDDchar(10)ZZZZ

Expected Output

SerialID UnidID GENRE
100 A11 AAA
100 A11 BBB
200 B11 CCC
200 B11 DDD
200 B11 ZZZZ

Field 'GENRE' is a string with newline as the delimiter, and I am not sure how many values it may contain.

Please advise!

Thanks

Are you sure the input data looks like you posted, and, if yes, are you sure you're on *nix?

Field GENRE can have any number of values separated by a newline delimiter.

As I said, newline has a special meaning on *nix.
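To illustrate what that means in practice, here is a minimal sketch. It assumes the GENRE field really contains LF bytes (0x0A); the file name sample.txt and its contents are made up for the demo, modelled on the sample posted above.

```shell
# Build a hypothetical sample file; printf turns each \n into a real LF byte.
printf 'SerialID UnidID GENRE\n100 A11 AAA\nBBB\n200 B11 CCC\nDDD\nZZZZ\n' > sample.txt

# One header plus two logical records, yet line-oriented tools see six lines.
lines=$(awk 'END {print NR}' sample.txt)
echo "physical lines: $lines"

# Field count per physical line: 3 for the header and for the first line of
# each record, 1 for every continuation line of the GENRE field.
awk '{print NF}' sample.txt
```

So any line-based tool treats each embedded newline as the end of a record, which is why the continuation lines show up with a single field.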

Assuming my suspicion (see post #2) is true, try:

awk 'NR==1 {print; next} NF==3 {TMP=$1 OFS $2} {print TMP OFS $NF}' file3
SerialID UnidID GENRE
100 A11 AAA
100 A11 BBB
200 B11 CCC
200 B11 DDD
200 B11 ZZZZ
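
For readers who want to follow the one-liner step by step, here is the same logic spread over several lines with comments. It is only a sketch under the same assumption as above (real LF bytes inside GENRE); the contents of file3 are recreated from the posted sample, and the variable name key is my own choice.

```shell
# Hypothetical sample input: header plus two records with real LF bytes in GENRE.
printf 'SerialID UnidID GENRE\n100 A11 AAA\nBBB\n200 B11 CCC\nDDD\nZZZZ\n' > file3

result=$(awk '
NR == 1 { print; next }        # pass the header line through unchanged
NF == 3 { key = $1 OFS $2 }    # three fields start a new record: save SerialID and UnidID
        { print key OFS $NF }  # every data line: saved key plus the last field (one GENRE value)
' file3)

printf '%s\n' "$result"
```

Note that this relies on GENRE values containing no whitespace: a continuation line that happens to have exactly three fields would be mistaken for the start of a new record. It reads the file in a single pass and keeps only one key in memory, so the record count should not be a problem in itself.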