convert columns into rows with respect to first column

mira · April 20, 2011, 4:55pm

Hello All,

Please help me with this file.

My input file (Tab separated) is like:

 
Abc-01    pc1    -0.69
Abc-01    E2cR    0.459666666666667
Abc-01    5ez.2   1.2265625
Xyz-01    pc1     -0.153
Xyz-01    E2cR    1.7358
Xyz-01    5ez.2   2.0254
Ced-02    pc1    -0.5714
Ced-02    E2cR    0.815
Ced-02    5ez.2  -0.94222222
Tac-01    pc1     4.5556677
Tac-01    E2cR   0.74532
Tac-01    5ez.2  -0.65432233

where order of second col repeats itself when the id(Col 1) changes.

I want an output file (Tab separated) like:

 
            pc1      E2cR      5ez.2
Abc-01    -0.69    0.459666666666667     1.2265625
Xyz-01    -0.153   1.7358      2.0254
Ced-02    -0.5714    0.815     -0.94222222
Tac-01    4.5556677     0.74532     -0.65432233

where each row present all the data for each Id(Col 1).

I'll be really thankful for your help.

DGPickett · April 20, 2011, 5:00pm

Select UNION Select?

vgersh99 · April 20, 2011, 5:05pm

Assuming the values of the second field repeat in the same order as $1 changes...

nawk '{a[$1]=($1 in a)?a[$1] OFS $3:$3}END{for(i in a) print i,a}' OFS='\t' myFile

mira · April 21, 2011, 4:12pm

Thanks! vgersh for your post, its working fine but it is not displaying the new headings of the columns which are the values of second col of the input file.

I'll be grateful to you if you can explain " a[$1]=($1 in a)?a[$1] OFS $3:$3 " this part of the code. Thanks again.

vgersh99 · April 21, 2011, 5:12pm

adjust the 'w=15' (width of the column) to your liking/needs.

nawk -v w=15 '{a[$1]=($1 in a)?a[$1] OFS sprintf("%-*s",w,$3):sprintf("%-*s",w,$3);h[$2]}END{for(i in h) printf("\t%-*s", w,i);print "";for(i in a) print i,a}' OFS='\t' myFile

mirni · April 22, 2011, 7:02pm

a[$1]=($1 in a)?a[$1] OFS $3:$3

checks whether first field ($1) is present in array 'a'; if yes, it appends the third column to it; if not, assign third column.
It's a ternary operator, just like in C, combined with assignment:

a=(condition)?pass:fail

that can be re-written like this:

if ($1 in a) 
  a[$1]=a[$1] OFS $3
else
  a[$1]=$3

mira · April 24, 2011, 6:05am

Thanks a lot mirni for such a nice explanation. If I have understood clearly then first it checks if $1 is there or not, in a way to check when it reaches end of file. But, I am not clear what is the purpose of assigning third column to $1?

and also its not clear to me which part of the code actually checks when the value of $1 changes?

I'll really thankful if you or vgersh can explain this.

Thanks! again.

mirni · April 25, 2011, 6:09am

This is a filter, so it processes one line at a time. Let's take the first line of input:

Abc-01    pc1    -0.69

We make an assignment

a[$1]=$3; e.i. a["Abc-01"]=-0.69

Next line of input:

Abc-01    E2cR    0.459666666666667

Is $1 in array a? Yes, we do have a["Abc-01"] defined. Append 3rd column. Now we have

a["Abc-01"]=-0.69       0.459666666666667

Next line of input is gonna make a["Abc-01"] grow into '-0.69\t0.459666666666667\t1.2265625', which is the desired output.
Next line will set array element a["Xyz-01"], etc.
You are constructing array 'a' so that the elements of it are your desired output; each array element has one line of output stored.
After you have all lines processed, you have your array 'a' populated, and you are ready to print it; easy loop (although in random order!) will print what's stored in 'a'.