Row blocks to column blocks

yifangt · May 14, 2013, 1:01pm

Hello,
Searched for a while and found some "line-to-column" script. My case is similar but with multiple fields each row:

S02     Length    Per
S02     7043      3.864
S02     54477    29.89
S02     104841    57.52
S03     Length    Per
S03     1150    0.835
S03     1321    0.96
S03     6432    4.675
S03     6922    5.031
S03     11705    8.507
S03     13793    10.02
S03     34275    24.91
S03     38238    27.79
S04    Length    Per
S04     11398    8.887
S04     17396    13.56
S04     94014    73.3

And I want the output as:

S02    Length    Per      S03    Length    Per   S04  Length    Per
S02    7043      3.864    S03    1150    0.835   S04 11398    8.887
S02    54477     29.89    S03    1321    0.96    S04 17396    13.56
S02    104841    57.52    S03    6432    4.675   S04 94014    73.3
                          S03    6922    5.031            
                          S03    11705    8.507            
                          S03    13793    10.02            
                          S03    34275    24.91            
                          S03    38238    27.79

What I tried is to cut each block (by field 1) to separate file and then use "paste" to combine the individual files. I feel there must be an easier way with awk.
Any help will be appreciated.

Jotne · May 14, 2013, 2:31pm

It will not be simple since the column are not equal size.
If its not to large, I would do cut and past in excel
Can I ask why you need this output?

yifangt · May 14, 2013, 2:55pm

Thanks Jotne!
Two reasons for this output.
1) I need this output for other statistics like min, max, mean, 5number etc of each block to compare the other 96 blocks in the output format;
2) There are many posts about "line-to-column" script in awk to do the similar conversion, I thought my case may be resolved with awk too, but could not figure out by myself.
I am aware of the line numbers are now equal for each block, so that it is more challenging to me.
Does this make any sense? Please give me any suggestion or comments. Thanks a lot!

MadeInGermany · May 14, 2013, 5:36pm

awk '{print >$1".tmp"}' file
paste *.tmp | expand -30

Yoda · May 14, 2013, 5:40pm

Here is an awk code that might help:

awk '
        NR == FNR {
                if (!($1 in C))
                {
                        C[$1]
                        V[++k] = $1
                }
                next
        }
        p != $1 {
                c = 0
        }
        {
                A[$1","++c] = $0
                p = $1
                if ( m < c )
                        m = c
        }
        END {
                for ( i = 1; i <= m; i++ )
                {
                        for ( j = 1; j <= k; j++ )
                        {
                                printf "%s\t", A[V[j]","i] ? A[V[j]","i] : "\t\t"
                        }
                        printf "\n"
                }
        }
' file file

yifangt · May 14, 2013, 9:05pm

Thanks MIG and Yoda!
MIG, your code is what I was looking for.
How about if I have same header for each block as "ID Length Per"?

ID    Length    Per 
1     7043      3.864 
2     54477    29.89 
3     104841    57.52 
ID     Length    Per 
1     1150    0.835 
2     1321    0.96 
3     6432    4.675 
4     6922    5.031 
5     11705    8.507 
6     13793    10.02 
7     34275    24.91 
8     38238    27.79 
ID    Length    Per 
1     11398    8.887 
2     17396    13.56 
3     94014    73.3

which is the original data structure. I first struggled with RS, but could not figure out the way. Then I added the first field as the group ID.
Is it possible to use the RS to handle the original situation? Thanks!

shamrock · May 17, 2013, 1:41pm

In that case try the following awk snippet...

awk '/^Id/ {s = $0; l = length() + 6; ws = sprintf("%*s", l, " ")}
    !/^Id/ {
      a[$1]++
      max = (max > a[$1] ? max : a[$1])
      if ($1 != prev) b[++j] = $1
      c[$1,a[$1]] = $0
      prev = $1
} END {
      for (i = 1; i <=j; i++) {
          t = sprintf("%-*s", l, s)
          h = (h ? h "" t : t)
      }
      printf("%s\n", h)
      for (k = 1; k <= max; k++) {
          for (i = 1; i <= j; i++) {
              if ((f = c[b,k])) {
                 m = l - length(f)
                 r = sprintf("%*s", m, " ")
                 str = sprintf("%s", str ? str "" f "" r : f "" r)
              }
              else str = sprintf("%*s", l, str ? str "" ws : ws)
          }
          printf("%s\n", str) 
          str = ""
      }
}' file

where file looks like...

Id     Length    Per
S02    7043      3.864
S02    54477     29.89
S02    104841    57.52
Id     Length    Per
S03    1150      0.835
S03    1321      0.96
S03    6432      4.675
S03    6922      5.031
S03    11705     8.507
S03    13793     10.02
S03    34275     24.91
S03    38238     27.79
Id     Length    Per
S04    11398     8.887
S04    17396     13.56
S04    94014     73.3

yifangt · May 17, 2013, 2:17pm

Thanks Shamrock!
This is really complicated, and I probably need use MadeIGermany's answer which make the parsing easier for me. Again, thank you guys for all your input.

MadeInGermany · May 17, 2013, 2:37pm

If there is not a common column like the $1 before, then awk must generate the filenames:

awk '$1=="ID" {count++} {print >count".tmp"}' file
paste *.tmp | expand -30

Whenever $1 is ID the count increases. So the filenames are 1.tmp 2.tmp etc.

If you prefer you can write the awk code as

awk '{if ($1=="ID") {count++}; print >count".tmp"}' file

yifangt · May 17, 2013, 4:08pm

I never thought of

print >count".tmp"

to concatenate the output to a file! Amazing trick!
Only when I saw your code did I realized your trick to make use of the functions of awk. Can you recommend any good reference book(s) about awk, besides the GNU manual,and the classic SED &AWK? Thanks a lot!