Transpose large data in UNIX

Hi

I have the following sample of data; the full data set has dimensions 900,000 * 1119:

rs987435        C       G       1       1       1       0       2
rs345783        C       G       0       0       1       0       0
rs955894        G       T       1       1       2       2       1
rs6088791       A       G       1       2       0       0       1
rs11180435      C       T       1       0       1       1       1
rs17571465      A       T       1       2       2       2       2
rs17011450      C       T       2       2       2       2       2
rs6919430       A       C       2       1       2       2       2
rs2342723       C       T       0       2       0       0       0
rs11992567      C       T       2       2       2       2       2

I would like to transpose this data so that the data looks like the following

rs345783    rs955894    
C                  G
G                  T 
1                   0
1                   0
1                   1
0                   0
2                  0

and so one

I appreciate your help

I don't understand why the 1st row of your input isn't wanted as the first column of your output, why you only want two columns of output from ten rows of input, or what you mean by "and so one". Are you trying to transpose pairs of rows or are you trying to transpose the entire table?

What operating system are you using? Different systems have different limitations that might cause you problems.

Have you searched this forum for ways to transpose data (for example, looking at the 1st three related threads listed at the bottom of this page)?

What have you tried to solve this problem on your own? Where are you getting stuck? (Is it just because of excessive line lengths on input lines that are over 1.8Mb long, or are there other problems?)

I'm sorry, there was a typo in the columns. I want to transpose the whole data set; I wrote just two columns as an example of the output. Of course it will have more columns.

I'm using Terminal on a Mac. I don't know how to do this, which is why I don't have any code for it.

As I suggested before, please look at the 1st three threads under the heading below, More UNIX and Linux Forum Topics You Might Find Helpful, and see if the suggestions provided there help you do what you're trying to do.

If none of the suggestions provided there help you do what you're trying to do, please show us what you have tried and explain to us what is not working when you use those suggestions.

Hi,
An example with perl (note that it loads the entire file into memory):

$ perl -e 'while (<>){my $i;map {$t_a[$i++] .="$_;";} split;};print map { s/;*$/\n/ ; $_} @t_a;' file
rs987435;rs345783;rs955894;rs6088791;rs11180435;rs17571465;rs17011450;rs6919430;rs2342723;rs11992567
C;C;G;A;C;A;C;A;C;C
G;G;T;G;T;T;T;C;T;T
1;0;1;1;1;1;2;2;0;2
1;0;1;2;0;2;2;1;2;2
1;1;2;0;1;2;2;2;0;2
0;0;2;0;1;2;2;2;0;2
2;0;1;1;1;2;2;2;0;2

Regards.
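
Because the one-liner above keeps the whole file in memory, a lower-memory alternative is to make one pass over the file per output row. This is only a minimal sketch, assuming whitespace-separated fields and that every row has as many fields as the first one; the function name transpose_by_column is made up for illustration:

```shell
# transpose_by_column FILE: hypothetical helper, one pass over FILE per input column.
# Assumes whitespace-separated fields and the same field count on every row.
transpose_by_column() {
    file=$1
    # Take the column count from the first line only.
    ncols=$(awk 'NR == 1 { print NF; exit }' "$file")
    ncols=${ncols:-0}    # an empty file yields no output rows
    col=1
    while [ "$col" -le "$ncols" ]; do
        # Emit column $col of every input row as one tab-separated output row.
        awk -v c="$col" '{ printf "%s%s", (NR > 1 ? "\t" : ""), $c } END { print "" }' "$file"
        col=$((col + 1))
    done
}
```

Called as transpose_by_column file > transposed, this trades time for memory: it reads the input once per column, but awk only ever holds one input line at a time.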

Hi.

Numerous solutions in many languages/applications at:

parsing - An efficient way to transpose a file in Bash - Stack Overflow

How do I efficiently transpose a matrix in R? - Stack Overflow

Best wishes ... cheers, drl

Here is a simple awk approach that I did not find in the suggested threads:

awk '{for(i=1; i<=NF; i++) A[i]=A[i] (NR>1?OFS:x) $i} END{for(i=1; i<=NR; i++) print A[i]}' OFS=\;  file

@Scrutinizer: That works fine for square matrices, but it runs into difficulties if the row count and column count differ. If the input has more rows than columns, you'll get empty lines at the end of the result; if it has fewer, the last lines will be missing.
In the END section, you need to print max(NF) lines, not NR lines. Check this slightly amended version of your script:

awk 'NF > MX {MX = NF}; {for(i=1; i<=NF; i++) A[i]=A[i] (NR>1?OFS:x) $i} END{for(i=1; i<=MX; i++) print A[i]}' OFS="\t"  file
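
To see the difference on a non-square input, the same logic can be run on a small made-up sample with two rows and three columns. The sketch below wraps it in a function (the name transpose is only for illustration), keeps one string per input column in the array A, and passes OFS with -v so that it can read from a pipe:

```shell
# transpose: illustrative wrapper; reads whitespace-separated rows on stdin,
# writes the tab-separated transpose on stdout.
transpose() {
    awk -v OFS='\t' '
        NF > MX { MX = NF }    # remember the widest row seen so far
        { for (i = 1; i <= NF; i++) A[i] = A[i] (NR > 1 ? OFS : "") $i }
        END { for (i = 1; i <= MX; i++) print A[i] }
    '
}

# Made-up 2 x 3 sample; the result should have 3 rows of 2 fields each.
printf 'r1 A C\nr2 G T\n' | transpose
```

This prints three lines (r1 r2, A G, C T), whereas a loop that stops at NR would print only the first two.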