Split column into rows

Hi,

I have input dataset as below:

Cl.jenn,1051,ABCD
JEN.HEA,9740|1517|8119|2145,ZZZZ,REPEAT
Rich.Sm,
Ann.Car,3972|4051|1064|4323|4122|2394|2574|4507
Sta.for,7777,ABCD,UUUU
Sm.Ric,
Ch.LRD, 
Eh.ab,
Gr.sh,

Expected output:
-------------------

Cl.jenn,1051,ABCD
JEN.HEA,9740,ZZZZ,REPEAT
JEN.HEA,1517,ZZZZ,REPEAT
JEN.HEA,8119,ZZZZ,REPEAT
JEN.HEA,2145,ZZZZ,REPEAT
Rich.Sm,
Ann.Car,3972
Ann.Car,4051
Ann.Car,1064
Ann.Car,4323
Ann.Car,4122
Ann.Car,2394
Ann.Car,2574
Ann.Car,4507
Sta.for,7777,ABCD,UUUU
Sm.Ric,
Ch.LRD,
Eh.ab,
Gr.sh,

I am using below command to achieve this:
--------------------------------------------------

awk -F',' -v OFS=',' '{n=split($2,s,"|"); for(i=1;i<=n;i++) {$2=s;print}}' test.txt

Actual output is :
-------------------

Cl.jenn,1051,ABCD
JEN.HEA,9740,ZZZZ,REPEAT
JEN.HEA,1517,ZZZZ,REPEAT
JEN.HEA,8119,ZZZZ,REPEAT
JEN.HEA,2145,ZZZZ,REPEAT
Ann.Car,3972
Ann.Car,4051
Ann.Car,1064
Ann.Car,4323
Ann.Car,4122
Ann.Car,2394
Ann.Car,2574
Ann.Car,4507
Sta.for,7777,ABCD,UUUU
Ch.LRD,

Some records are filtered to display. Any suggestions pls???

Splitting a zero length field produces an array with no elements. I would guess that the lines that are printing data in what look like empty field 2 values actually contain one or more <space>s or <tab>s while the lines that have been "filtered" actually have no characters in the 2nd field. Try:

awk -F',' -v OFS=',' '{n=split($2,s,"|"); if(n)for(i=1;i<=n;i++) {$2=s;print} else print}' test.txt

or, preferably (and more readable):

awk -F',' -v OFS=',' '
{	n = split($2, s, "|")
	if(n)	for(i = 1; i <= n; i++) {
			$2 = s
			print
		}
	else	print
}' test.txt
1 Like

Awesomely working...Great Sir!!!Thank you...
By the way, do you have any reference on how to read double-quoted data in AWK??
Like for example, above scenario

"Ric.Sm","1234|1234|1234","UUUU"
"Sm.Ric","4564","UUUU"

What do you want for output? With quotes or without quotes?

1 Like

Would like the Output with double quotes...

Maybe something like:

awk -F',' -v dq='"' -v OFS=',' '
{	f2 = $2
	gsub(/"/, "", f2)
	n = split(f2, s, "|")
	if(n)	for(i = 1; i <= n; i++) {
			$2 = dq s dq
			print
		}
	else	print
}' test.txt
1 Like

Thank you Don...:b: