Converting multiple lines to a single line

Bobby_2000 · September 18, 2015, 6:51am

Hi all,

I have a requirement to convert multiple lines in a comma-delimited file into a single line through shell scripting. We should compare the data in the first column of each line. If it is the same, then the remaining data should be put on the same line. Below is the sample input and expected output:

Input:
 
ABC,AD,AS,ER
ABC,YT,YU,ER
ABC,GT
BVT,SD,WER,YUI,GHY,TRE
BVT,G,DRT,FE,GHW
MN,FR,ER,YU,WS,FH,YU,IO,DER
ABC,SE

Output:
 
ABC,AD,AS,ER,YT,YU,ER,GT
BVT,SD,WER,YUI,GHY,TRE,G,DRT,FE,GHW
MN,FR,ER,YU,WS,FH,YU,IO,DER
ABC,SE

I tried this using awk, but only managed to convert a single line into multiple lines, not vice versa.

Please help.

RavinderSingh13 · September 18, 2015, 7:07am

Hello Bobby,

If you need the 1st column as the index for the output, then I assume that showing ABC twice in your expected output (ABC,AD,AS,ER,YT,YU,ER,GT and ABC,SE) is a typo. In that case the following may help, provided the order of the lines in Input_file doesn't matter to you.

awk -F, '{for(i=2;i<=NF;i++){A[$1]=A[$1]?A[$1] FS $i:$i}} END{for(i in A){print i FS A[i]}}' Input_file

Output will be as follows.

BVT,SD,WER,YUI,GHY,TRE,G,DRT,FE,GHW
ABC,AD,AS,ER,YT,YU,ER,GT,SE
MN,FR,ER,YU,WS,FH,YU,IO,DER

Thanks,
R. Singh

RudiC · September 18, 2015, 9:03am

If you need to stick to the sequence of occurrences of values in column 1, try

awk -F, '
$1 in RES       {X=$1
                 sub (X FS, "")
                 RES[X]=RES[X] FS $0
                 next
                }
                {RES[$1]=$0
                 SEQ[++CNT]=$1
                }
END             {for (i=1; i<=CNT; i++) print RES[SEQ[i]]}
' file
ABC,AD,AS,ER,YT,YU,ER,GT,SE
BVT,SD,WER,YUI,GHY,TRE,G,DRT,FE,GHW
MN,FR,ER,YU,WS,FH,YU,IO,DER

---------- Post updated at 15:03 ---------- Previous update was at 14:54 ----------

or, slightly simplified,

awk -F, '
!($1 in RES)    {SEQ[++CNT]=$1
                }
                {X=$1
                 sub (X FS, "")
                 RES[X]=RES[X] FS $0
                }
END             {for (i=1; i<=CNT; i++) print SEQ[i] RES[SEQ[i]]}
' file

Bobby_2000 · September 20, 2015, 8:46am

Ravindersingh,

The sample output I gave is correct. We shouldn't sort or reorder the records on the first column. The comparison of the first field should happen between two consecutive lines only.

Hope this is clear.

Regards,
bobby

---------- Post updated at 01:46 PM ---------- Previous update was at 01:42 PM ----------

Hi rudi,

Thanks for this. But the records are getting reordered in the output.

ABC,SE

is getting appended with the

ABC,AD,AS,ER,YT,YU,ER,GT

line.

Thanks,
Bobby

RudiC · September 20, 2015, 9:01am

Wasn't too clear in the spec. Try

awk -F, '
RES ~ "^" $1 FS {X=$1
                 sub (X FS, "")
                 RES=RES FS $0
                 next
                }
                {if (NR > 1) print RES
                 RES=$0
                }
END             {print RES}
' file
ABC,AD,AS,ER,YT,YU,ER,GT
BVT,SD,WER,YUI,GHY,TRE,G,DRT,FE,GHW
MN,FR,ER,YU,WS,FH,YU,IO,DER
ABC,SE
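
A note for later readers: the last script tests the key with a regular expression built from $1, so a first field containing characters such as . or * could match more than intended. Below is a minimal sketch of the same consecutive-line merge that compares the first field as a literal string instead. The names merge_consecutive, KEY and rest are illustrative and do not come from the thread, and it assumes that a repeated key with no further fields should add nothing to the record.

```shell
# Merge consecutive lines that share the same first field (literal comparison).
merge_consecutive() {
    awk -F, '
    NR > 1 && ($1 "") == (KEY "") {             # same key as previous line (string compare)
        rest = substr($0, length($1) + 2)       # drop "key," from the front
        if (rest != "") RES = RES FS rest       # append the remaining fields
        next
    }
    {
        if (NR > 1) print RES                   # key changed: flush the finished record
        KEY = $1
        RES = $0
    }
    END { if (NR > 0) print RES }               # flush the last record, if any input was read
    ' "$@"
}

# Usage with the sample input from the first post
merge_consecutive <<'EOF'
ABC,AD,AS,ER
ABC,YT,YU,ER
ABC,GT
BVT,SD,WER,YUI,GHY,TRE
BVT,G,DRT,FE,GHW
MN,FR,ER,YU,WS,FH,YU,IO,DER
ABC,SE
EOF
```

With the sample input this should print the same four lines as the expected output in the first post. The function also accepts file names as arguments, e.g. merge_consecutive file.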