Comma separated to rows based on field

aec · April 22, 2013, 2:12pm

Hi to all,

I have a file like:

chr1 a1 a2 a3 a4 a5 a6,a7,a8,a9
chr1 b1 b2 b3 b4 b5 b6,b7
chr2 c1 c2 c3 c4 c5 c6,c7,c8,c9,c10
...

I would like an output like this:

chr1 a6
chr1 a7
chr1 a8
chr1 a9
chr1 b6
chr1 b7
chr2 c6
chr2 c7
chr2 c8
chr2 c9
chr2 c10
...

For each line, split the comma-separated values in the last field into separate rows, each prefixed with field 1.
Thanks,
Anna

hanson44 · April 22, 2013, 2:21pm

What happened to chr1 a1?

I don't see any commas.

zozoo · April 22, 2013, 2:31pm

Hi hanson44, I think he wants something like this from his input (the comma-separated part, which I highlighted in red to make it more visible, is the last field of each line):

chr1 a1 a2 a3 a4 a5 a6,a7,a8,a9
chr1 b1 b2 b3 b4 b5 b6,b7
chr2 c1 c2 c3 c4 c5 c6,c7,c8,c9,c10

and he wants output like this:

chr1 a6
chr1 a7
chr1 a8
chr1 a9
chr1 b6
chr1 b7
chr2 c6
chr2 c7
chr2 c8
chr2 c9
chr2 c10
...

Yoda · April 22, 2013, 2:34pm

An awk solution:

awk '{ n=split($NF,A,","); while (++i<=n) { print $1, A[i] } i=0 }' file
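For anyone who wants to try it, here is a minimal sketch that wraps the one-liner in a shell function and runs it on the sample from the question. The function name `split_last_field` is only illustrative. Two small changes: each element is printed as `A[i]` (printing the bare array name `A` does not work in awk), and `i` is reset before the loop instead of after it.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner above.
# For each line: split the last field on commas into array A (n elements),
# then print field 1 next to each element, one row per element.
split_last_field() {
    awk '{ n = split($NF, A, ","); i = 0; while (++i <= n) print $1, A[i] }' "$@"
}

# Sample input taken from the question.
split_last_field <<'EOF'
chr1 a1 a2 a3 a4 a5 a6,a7,a8,a9
chr1 b1 b2 b3 b4 b5 b6,b7
chr2 c1 c2 c3 c4 c5 c6,c7,c8,c9,c10
EOF
```

With the three sample lines it should print the 11 rows from the question, ending in `chr2 c10`.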

zozoo · April 22, 2013, 2:38pm

Hi yoda,

Can you please explain your code? It would be a great learning experience for me.

Thanks:)

Scrutinizer · April 22, 2013, 2:44pm

Another one:

awk '{gsub(/,/,RS $1 FS,$NF); print $1,$NF}' file
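A sketch of how this one works, again wrapped in an illustrative function (`split_with_gsub` is not from the thread). `gsub` rewrites every comma in the last field as a newline (`RS`), then field 1, then a space (`FS`), so a single `print $1, $NF` emits all the rows at once. This relies on the default `RS` and `FS` values.

```shell
#!/bin/sh
# Hypothetical wrapper around the gsub one-liner above.
# gsub(/,/, RS $1 FS, $NF) turns each comma in the last field into
# "<newline><field 1><space>", so one print produces every row.
# Assumes default RS (newline) and FS (single space, blank-separated fields).
split_with_gsub() {
    awk '{ gsub(/,/, RS $1 FS, $NF); print $1, $NF }' "$@"
}

# One sample line from the question.
printf '%s\n' 'chr1 b1 b2 b3 b4 b5 b6,b7' | split_with_gsub
```

Because `$1` goes into the replacement string, an `&` in the first field would be replaced by the matched comma; names like `chr1` are unaffected.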

Yoda · April 22, 2013, 2:45pm

Sure. Note that this code works only if the values in the last field are separated by commas with no spaces between them:

awk '
        {
                n = split ($NF, A, ",")         # Split the last field on commas into array A; n = number of elements
                while ( ++i <= n )              # Pre-increment i, then loop while i <= n
                {
                        print $1, A[i]          # Print field 1 and the i-th element of array A
                }
                i = 0                           # Reset i to 0 before the next line is read
        }
' file
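The "no spaces" limitation can be relaxed under one extra assumption. The sketch below (hypothetical function name `split_from_field7`) assumes the comma-separated list always starts at field 7, as in the sample, and that the values themselves contain no spaces. It glues fields 7 through NF back together before splitting on commas, so stray blanks next to the commas no longer matter. It also uses a `for` loop, which avoids resetting `i` by hand.

```shell
#!/bin/sh
# Hypothetical variant of the explained script above.
# ASSUMPTION: the comma-separated list always starts at field 7 (as in the
# sample) and individual values contain no spaces.
split_from_field7() {
    awk '{
        list = ""
        for (j = 7; j <= NF; j++)        # rejoin the pieces awk split on blanks
            list = list $j
        n = split(list, A, ",")          # n = number of comma-separated values
        for (i = 1; i <= n; i++)         # for loop: no manual reset of i needed
            if (A[i] != "")              # skip empties, e.g. from a trailing comma
                print $1, A[i]
    }' "$@"
}

printf '%s\n' 'chr1 b1 b2 b3 b4 b5 b6, b7' 'chr2 c1 c2 c3 c4 c5 c6,c7 ,c8,c9,c10' |
    split_from_field7
```

If the list can start at a different column, the `7` in the loop is the value to change.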