Convert rows into column along with header

Hi,

I have a requirement to format the data in a new order. Here is my source format :

ppp ***Wed Dec 16 10:32:30 GMT 2015
header1 header2 header3 header4 header5

server1  0.00    0.02     0.07   0.98
server2  0.01    0.00     0.08  0.79 
server3  0.05    0.82     0.77  0.86

I need to change the above data snippet to the following format.

ppp ***Wed Dec 16 10:32:30 GMT 2015 server1-header2 server1-header3 server1-header4 server1-header5 server2-header2 server2-header3 server2-header4 server2-header5 server3-header2 server3-header3 server3-header4 server3-header5

0.00    0.02      0.07  0.98 0.01  0.00  0.08 0.79 0.05    0.82   0.77    0.86

The intent is to create multiple tables of timestamp while converting rows into columns.

I am not sure from to start, any help. I tried awk to grab the column, and not sure how to merge rows into column.

Thanks in advance.

Not trying to be unhelpful: but the answer is already at the lower left hand corner.
These are search results so you don't have to wait for someone to answer.

This is one of the most common questions, you may have to read a few answers to get one you like.

Hi Jim,

I looked at those results, but do not find any similarities.

First thing seems complex is how to merge headers, while having data mapped.

Try

awk '
NR == 1         {printf "%s ", $0
                 next
                }
NR == 2         {for (i=2; i<=NF; i++) HD = $i
                 MXHD = NF
                 next
                }

                {MXSV     = NR - 3
                 SV[MXSV] = $1
                 for (i=2; i<=NF; i++) DT[i, MXSV] = $i
                }
END             {for (i=1; i<=MXSV; i++)
                   for (j=2; j<=MXHD; j++) printf " %s-%s", SV, HD[j]
                 printf RS 
                 for (i=1; i<=MXSV; i++)
                   for (j=2; j<=MXHD; j++) printf "%s\t", DT[j,i]
                 printf RS 


                }
' file
ppp ***Wed Dec 16 10:32:30 GMT 2015  server1-header2 server1-header3 server1-header4 server1-header5 server2-header2 server2-header3 server2-header4 server2-header5 server3-header2 server3-header3 server3-header4 server3-header5
0.00    0.02    0.07    0.98    0.01    0.00    0.08    0.79    0.05    0.82    0.77    0.86    

Thanks Rudic.

This seems to work. Appreciate if you can elaborate a bit. Being a biologist I do not use awk much, and to me this look pretty complex.

"seems to work" is not quite the expected result...

awk '
NR == 1         {printf "%s ", $0                                               # print time stamp
                 next                                                           # continue with next record
                }
NR == 2         {for (i=2; i<=NF; i++) HD = $i                               # collect header names from second field
                 MXHD = NF                                                      # collect header count
                 next                                                           # continue with next record
                }
                {MXSV     = NR - 3                                              # determine & keep maximum server No.
                 SV[MXSV] = $1                                                  # collect server name
                 for (i=2; i<=NF; i++) DT[i, MXSV] = $i                         # collect server data
                }
END             {for (i=1; i<=MXSV; i++)                                        # use nested loops (i & j)
                   for (j=2; j<=MXHD; j++) printf " %s-%s", SV, HD[j]        # to print server-header combinations
                 printf RS                                                      # print line feed
                 for (i=1; i<=MXSV; i++)                                        # use nested loops (i & j)
                   for (j=2; j<=MXHD; j++) printf "%s\t", DT[j,i]               # to print server data
                 printf RS                                                      # print line feed
                }
' file

I mentioned seems to work because now I am trying with real data, and I see that it misses the first header ( ie vda ).

zzz ***Wed Dec 16 10:32:30 GMT 2015
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.23    0.00    7.45   76.06    0.00    4.26
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    2.00     0.00     4.00     4.00     0.24  128.00    0.00  128.00 121.00  24.20
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdd               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.11   36.67   33.00   44.00  36.67  11.00
vde               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.04   14.67    0.00   44.00  14.67   4.40
vdf               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.10   31.67   25.50   44.00  31.67   9.50
vdg               0.00     0.00   43.00   23.00   400.00    97.00    15.06     3.83   57.05   62.37   47.09  15.15 100.00
vdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.24    0.00    0.00    0.00   0.00  24.20
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdi               0.00     0.00   39.00   23.00   320.00    78.50    12.85     1.11   17.40    2.85   42.09  12.39  76.80

No surprise. The structure of the file above doesn't meet the one in post#1.

Yes, indeed. I planned to remove the two extra lines using sed, but that does not seem to work post removal.

Any suggestion how I can modify the code.

Try

awk '
NR == 1         {printf "%s ", $0
                 next
                }
!NF             {getline
                 for (i=2; i<=NF; i++) HD = $i
                 MXHD = NF
                 START = 1 
                 STNR  = NR 
                 next
                }
START           {MXSV     = NR - STNR
                 SV[MXSV] = $1
                 for (i=2; i<=NF; i++) DT[i, MXSV] = $i
                }
END             {for (i=1; i<=MXSV; i++)
                   for (j=2; j<=MXHD; j++) printf " %s-%s", SV, HD[j]
                 printf RS 
                 for (i=1; i<=MXSV; i++)
                   for (j=2; j<=MXHD; j++) printf "%s\t", DT[j,i]
                 printf RS 
                }
' file

This could fail if the "empty" line would contain spaces, i.e. is not empty.

Thanks a lot.

One last query, in my file, I would have multiple set of data like this, ie. starting with timestamp. Is it possible to loop through the data?

Again appreciate your help.

Yes. By what logics can each new data set be identified? And, a sample input always helps.

Each new set of data will start using timestamp beginning with zzz.

zzz ***Wed Dec 16 10:32:30 GMT 2015

---------- Post updated at 11:48 PM ---------- Previous update was at 02:41 AM ----------

Hi Rudic,

Data will be of following format :


zzz ***Wed Dec 16 10:32:30 GMT 2015
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.23    0.00    7.45   76.06    0.00    4.26
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    2.00     0.00     4.00     4.00     0.24  128.00    0.00  128.00 121.00  24.20
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdd               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.11   36.67   33.00   44.00  36.67  11.00
vde               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.04   14.67    0.00   44.00  14.67   4.40
vdf               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.10   31.67   25.50   44.00  31.67   9.50
vdg               0.00     0.00   43.00   23.00   400.00    97.00    15.06     3.83   57.05   62.37   47.09  15.15 100.00
vdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.24    0.00    0.00    0.00   0.00  24.20
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdi               0.00     0.00   39.00   23.00   320.00    78.50    12.85     1.11   17.40    2.85   42.09  12.39  76.80

zzz ***Wed Dec 16 10:33:30 GMT 2015
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.23    0.00    7.45   76.06    0.00    4.26
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    2.00     0.00     4.00     4.00     0.24  128.00    0.00  128.00 121.00  24.20
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdd               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.11   36.67   33.00   44.00  36.67  11.00
vde               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.04   14.67    0.00   44.00  14.67   4.40
vdf               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.10   31.67   25.50   44.00  31.67   9.50
vdg               0.00     0.00   43.00   23.00   400.00    97.00    15.06     3.83   57.05   62.37   47.09  15.15 100.00
vdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.24    0.00    0.00    0.00   0.00  24.20
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdi               0.00     0.00   39.00   23.00   320.00    78.50    12.85     1.11   17.40    2.85   42.09  12.39  76.80

Try

awk '
function PRT()  {for (i=1; i<=MXSV; i++)
                   for (j=2; j<=MXHD; j++) printf " %s-%s", SV, HD[j]
                 printf RS 
                 for (i=1; i<=MXSV; i++)
                   for (j=2; j<=MXHD; j++) printf "%s\t", DT[j,i]
                 printf RS 
                    }

/^zzz/          {if (NR > 1) PRT()

                 split ("", HD)
                 split ("", SV)
                 split ("", DT)
                 START = 0

                 printf "%s ", $0
                 next
                }

!NF             {if (START) next

                 getline
                 for (i=2; i<=NF; i++) HD = $i
                 MXHD = NF
                 START = 1 
                 STNR  = NR 
                 next
                }
START           {MXSV     = NR - STNR
                 SV[MXSV] = $1
                  for (i=2; i<=NF; i++) DT[i, MXSV] = $i
                }
END             {PRT()
                }
' file

Hi Rudic,

This looks excellent.

I have two minor changes.

  1. I need to print header only once.
  2. Also, there should be a new first column "Timestamp" where timestamp value will be place like this "*Wed Dec 16 10:33:30 GMT 2015" , I mean no zzz *** only timestamp.

Here is the sample format :


Timestamp vda-rrqm/s vda-wrqm/s vda-r/s vda-w/s vda-rkB/s vda-wkB/s vda-avgrq-sz vda-avgqu-sz vda-await vda-r_await vda-w_await vda-svctm vda-%util vdb-rrqm/s vdb-wrqm/s vdb-r/s vdb-w/s vdb-rkB/s vdb-wkB/s vdb-avgrq-sz vdb-avgqu-sz vdb-await vdb-r_await vdb-w_await vdb-svctm vdb-%util vdc-rrqm/s vdc-wrqm/s vdc-r/s vdc-w/s vdc-rkB/s vdc-wkB/s vdc-avgrq-sz vdc-avgqu-sz vdc-await vdc-r_await vdc-w_await vdc-svctm vdc-%util vdd-rrqm/s vdd-wrqm/s vdd-r/s vdd-w/s vdd-rkB/s vdd-wkB/s vdd-avgrq-sz vdd-avgqu-sz vdd-await vdd-r_await vdd-w_await vdd-svctm vdd-%util vde-rrqm/s vde-wrqm/s vde-r/s vde-w/s vde-rkB/s vde-wkB/s vde-avgrq-sz vde-avgqu-sz vde-await vde-r_await vde-w_await vde-svctm vde-%util vdf-rrqm/s vdf-wrqm/s vdf-r/s vdf-w/s vdf-rkB/s vdf-wkB/s vdf-avgrq-sz vdf-avgqu-sz vdf-await vdf-r_await vdf-w_await vdf-svctm vdf-%util vdg-rrqm/s vdg-wrqm/s vdg-r/s vdg-w/s vdg-rkB/s vdg-wkB/s vdg-avgrq-sz vdg-avgqu-sz vdg-await vdg-r_await vdg-w_await vdg-svctm vdg-%util vdh-rrqm/s vdh-wrqm/s vdh-r/s vdh-w/s vdh-rkB/s vdh-wkB/s vdh-avgrq-sz vdh-avgqu-sz vdh-await vdh-r_await vdh-w_await vdh-svctm vdh-%util dm-0-rrqm/s dm-0-wrqm/s dm-0-r/s dm-0-w/s dm-0-rkB/s dm-0-wkB/s dm-0-avgrq-sz dm-0-avgqu-sz dm-0-await dm-0-r_await dm-0-w_await dm-0-svctm dm-0-%util dm-1-rrqm/s dm-1-wrqm/s dm-1-r/s dm-1-w/s dm-1-rkB/s dm-1-wkB/s dm-1-avgrq-sz dm-1-avgqu-sz dm-1-await dm-1-r_await dm-1-w_await dm-1-svctm dm-1-%util dm-2-rrqm/s dm-2-wrqm/s dm-2-r/s dm-2-w/s dm-2-rkB/s dm-2-wkB/s dm-2-avgrq-sz dm-2-avgqu-sz dm-2-await dm-2-r_await dm-2-w_await dm-2-svctm dm-2-%util vdi-rrqm/s vdi-wrqm/s vdi-r/s vdi-w/s vdi-rkB/s vdi-wkB/s vdi-avgrq-sz vdi-avgqu-sz vdi-await vdi-r_await vdi-w_await vdi-svctm vdi-%util
Wed Dec 16 10:32:30 GMT 2015 0.00    0.00    0.00    2.00    0.00    4.00    4.00    0.24    128.00  0.00    128.00  121.00  24.20   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    2.00    1.00    1.00    0.50    1.00    0.11    36.67   33.00   44.00 36.67    11.00   0.00    0.00    2.00    1.00    1.00    0.50    1.00    0.04    14.67   0.00    44.00   14.67   4.40    0.00    0.00    2.00    1.00    1.00    0.50    1.00    0.10    31.67   25.50 44.00    31.67   9.50    0.00    0.00    43.00   23.00   400.00  97.00   15.06   3.83    57.05   62.37   47.09   15.15   100.00  0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.24  0.00     0.00    0.00    0.00    24.20   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    39.00   23.00   320.00  78.50   12.85 1.11     17.40   2.85    42.09   12.39   76.80

I regret for re-work as I misunderstood the requirement.

Thanks in advance.

How about applying what you learned so far trying to bring up a solution of your own. People in here will be glad to help when you get stuck.

Hi Rudic,

Being a biologist, I do get to work on awk very often, since this is typically a complex requirement, I thought of giving it a try.

I tried with headers part of it, but frankly not able to follow the entire code.

Appreciate your help so far.

Thanks.

Try

awk '
function PRT()  {if (!HDDONE)   {printf "TimeStamp"
                                 for (i=1; i<=MXSV; i++)
                                   for (j=2; j<=MXHD; j++) printf "\t%s-%s", SV, HD[j]
                                 printf RS
                                 HDDONE = 1
                                }
                 printf "%s", TS
                 for (i=1; i<=MXSV; i++)
                   for (j=2; j<=MXHD; j++) printf "\t%s", DT[j,i]
                 printf RS 
                }

/^zzz/          {if (NR > 1) PRT()

                 split ("", HD)
                 split ("", SV)
                 split ("", DT)
                 START = 0

                 sub (/^zzz *\** */, "")
                 TS    = $0
                 next
                }

!NF             {if (START) next

                 getline
                 for (i=2; i<=NF; i++) HD = $i
                 MXHD = NF
                 START = 1 
                 STNR  = NR 
                 next
                }
START           {MXSV     = NR - STNR
                 SV[MXSV] = $1
                 for (i=2; i<=NF; i++) DT[i, MXSV] = $i
                }
END             {PRT()
                }
' file
TimeStamp       vda-rrqm/s      vda-wrqm/s      vda-r/s vda-w/s vda-rkB/s       vda-wkB/s       vda-avgrq-sz    vda-avgqu-sz      ...
Wed Dec 16 10:32:30 GMT 2015    0.00    0.00    0.00    2.00    0.00    4.00    4.00    0.24    128.00  0.00    128.00  121.00  24.20   ...
Wed Dec 16 10:33:30 GMT 2015    0.00    0.00    0.00    2.00    0.00    4.00    4.00  ...

Please be aware that this is NOT rock solid, but fragile, as no serious error checking is done. For instance, the order of servers is not checked - if it changes, values in columns will be wrong. Nor is their count. The count of empty lines is not checked - it starts collecting data on every other empty line.

1 Like

Thanks a lot!