Hi,
I have a requirement to format the data in a new order. Here is my source format :
ppp ***Wed Dec 16 10:32:30 GMT 2015
header1 header2 header3 header4 header5
server1 0.00 0.02 0.07 0.98
server2 0.01 0.00 0.08 0.79
server3 0.05 0.82 0.77 0.86
I need to change the above data snippet to the following format.
ppp ***Wed Dec 16 10:32:30 GMT 2015 server1-header2 server1-header3 server1-header4 server1-header5 server2-header2 server2-header3 server2-header4 server2-header5 server3-header2 server3-header3 server3-header4 server3-header5
0.00 0.02 0.07 0.98 0.01 0.00 0.08 0.79 0.05 0.82 0.77 0.86
The intent is to create multiple tables of timestamp while converting rows into columns.
I am not sure from to start, any help. I tried awk to grab the column, and not sure how to merge rows into column.
Thanks in advance.
Not trying to be unhelpful: but the answer is already at the lower left hand corner.
These are search results so you don't have to wait for someone to answer.
This is one of the most common questions, you may have to read a few answers to get one you like.
Hi Jim,
I looked at those results, but do not find any similarities.
First thing seems complex is how to merge headers, while having data mapped.
RudiC
May 1, 2017, 4:50pm
4
Try
awk '
NR == 1 {printf "%s ", $0
next
}
NR == 2 {for (i=2; i<=NF; i++) HD = $i
MXHD = NF
next
}
{MXSV = NR - 3
SV[MXSV] = $1
for (i=2; i<=NF; i++) DT[i, MXSV] = $i
}
END {for (i=1; i<=MXSV; i++)
for (j=2; j<=MXHD; j++) printf " %s-%s", SV, HD[j]
printf RS
for (i=1; i<=MXSV; i++)
for (j=2; j<=MXHD; j++) printf "%s\t", DT[j,i]
printf RS
}
' file
ppp ***Wed Dec 16 10:32:30 GMT 2015 server1-header2 server1-header3 server1-header4 server1-header5 server2-header2 server2-header3 server2-header4 server2-header5 server3-header2 server3-header3 server3-header4 server3-header5
0.00 0.02 0.07 0.98 0.01 0.00 0.08 0.79 0.05 0.82 0.77 0.86
Thanks Rudic.
This seems to work. Appreciate if you can elaborate a bit. Being a biologist I do not use awk much, and to me this look pretty complex.
RudiC
May 2, 2017, 3:20am
6
"seems to work" is not quite the expected result...
awk '
NR == 1 {printf "%s ", $0 # print time stamp
next # continue with next record
}
NR == 2 {for (i=2; i<=NF; i++) HD = $i # collect header names from second field
MXHD = NF # collect header count
next # continue with next record
}
{MXSV = NR - 3 # determine & keep maximum server No.
SV[MXSV] = $1 # collect server name
for (i=2; i<=NF; i++) DT[i, MXSV] = $i # collect server data
}
END {for (i=1; i<=MXSV; i++) # use nested loops (i & j)
for (j=2; j<=MXHD; j++) printf " %s-%s", SV, HD[j] # to print server-header combinations
printf RS # print line feed
for (i=1; i<=MXSV; i++) # use nested loops (i & j)
for (j=2; j<=MXHD; j++) printf "%s\t", DT[j,i] # to print server data
printf RS # print line feed
}
' file
I mentioned seems to work because now I am trying with real data, and I see that it misses the first header ( ie vda ).
zzz ***Wed Dec 16 10:32:30 GMT 2015
avg-cpu: %user %nice %system %iowait %steal %idle
12.23 0.00 7.45 76.06 0.00 4.26
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 2.00 0.00 4.00 4.00 0.24 128.00 0.00 128.00 121.00 24.20
vdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdd 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.11 36.67 33.00 44.00 36.67 11.00
vde 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.04 14.67 0.00 44.00 14.67 4.40
vdf 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.10 31.67 25.50 44.00 31.67 9.50
vdg 0.00 0.00 43.00 23.00 400.00 97.00 15.06 3.83 57.05 62.37 47.09 15.15 100.00
vdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 0.00 24.20
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdi 0.00 0.00 39.00 23.00 320.00 78.50 12.85 1.11 17.40 2.85 42.09 12.39 76.80
RudiC
May 2, 2017, 4:13am
8
No surprise. The structure of the file above doesn't meet the one in post#1.
Yes, indeed. I planned to remove the two extra lines using sed, but that does not seem to work post removal.
Any suggestion how I can modify the code.
RudiC
May 2, 2017, 6:07am
10
Try
awk '
NR == 1 {printf "%s ", $0
next
}
!NF {getline
for (i=2; i<=NF; i++) HD = $i
MXHD = NF
START = 1
STNR = NR
next
}
START {MXSV = NR - STNR
SV[MXSV] = $1
for (i=2; i<=NF; i++) DT[i, MXSV] = $i
}
END {for (i=1; i<=MXSV; i++)
for (j=2; j<=MXHD; j++) printf " %s-%s", SV, HD[j]
printf RS
for (i=1; i<=MXSV; i++)
for (j=2; j<=MXHD; j++) printf "%s\t", DT[j,i]
printf RS
}
' file
This could fail if the "empty" line would contain spaces, i.e. is not empty.
Thanks a lot.
One last query, in my file, I would have multiple set of data like this, ie. starting with timestamp. Is it possible to loop through the data?
Again appreciate your help.
RudiC
May 2, 2017, 6:24am
12
Yes. By what logics can each new data set be identified? And, a sample input always helps.
Each new set of data will start using timestamp beginning with zzz.
zzz ***Wed Dec 16 10:32:30 GMT 2015
---------- Post updated at 11:48 PM ---------- Previous update was at 02:41 AM ----------
Hi Rudic,
Data will be of following format :
zzz ***Wed Dec 16 10:32:30 GMT 2015
avg-cpu: %user %nice %system %iowait %steal %idle
12.23 0.00 7.45 76.06 0.00 4.26
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 2.00 0.00 4.00 4.00 0.24 128.00 0.00 128.00 121.00 24.20
vdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdd 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.11 36.67 33.00 44.00 36.67 11.00
vde 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.04 14.67 0.00 44.00 14.67 4.40
vdf 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.10 31.67 25.50 44.00 31.67 9.50
vdg 0.00 0.00 43.00 23.00 400.00 97.00 15.06 3.83 57.05 62.37 47.09 15.15 100.00
vdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 0.00 24.20
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdi 0.00 0.00 39.00 23.00 320.00 78.50 12.85 1.11 17.40 2.85 42.09 12.39 76.80
zzz ***Wed Dec 16 10:33:30 GMT 2015
avg-cpu: %user %nice %system %iowait %steal %idle
12.23 0.00 7.45 76.06 0.00 4.26
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 2.00 0.00 4.00 4.00 0.24 128.00 0.00 128.00 121.00 24.20
vdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdd 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.11 36.67 33.00 44.00 36.67 11.00
vde 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.04 14.67 0.00 44.00 14.67 4.40
vdf 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.10 31.67 25.50 44.00 31.67 9.50
vdg 0.00 0.00 43.00 23.00 400.00 97.00 15.06 3.83 57.05 62.37 47.09 15.15 100.00
vdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 0.00 24.20
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdi 0.00 0.00 39.00 23.00 320.00 78.50 12.85 1.11 17.40 2.85 42.09 12.39 76.80
RudiC
May 3, 2017, 4:02am
14
Try
awk '
function PRT() {for (i=1; i<=MXSV; i++)
for (j=2; j<=MXHD; j++) printf " %s-%s", SV, HD[j]
printf RS
for (i=1; i<=MXSV; i++)
for (j=2; j<=MXHD; j++) printf "%s\t", DT[j,i]
printf RS
}
/^zzz/ {if (NR > 1) PRT()
split ("", HD)
split ("", SV)
split ("", DT)
START = 0
printf "%s ", $0
next
}
!NF {if (START) next
getline
for (i=2; i<=NF; i++) HD = $i
MXHD = NF
START = 1
STNR = NR
next
}
START {MXSV = NR - STNR
SV[MXSV] = $1
for (i=2; i<=NF; i++) DT[i, MXSV] = $i
}
END {PRT()
}
' file
Hi Rudic,
This looks excellent.
I have two minor changes.
I need to print header only once.
Also, there should be a new first column "Timestamp" where timestamp value will be place like this "*Wed Dec 16 10:33:30 GMT 2015" , I mean no zzz *** only timestamp.
Here is the sample format :
Timestamp vda-rrqm/s vda-wrqm/s vda-r/s vda-w/s vda-rkB/s vda-wkB/s vda-avgrq-sz vda-avgqu-sz vda-await vda-r_await vda-w_await vda-svctm vda-%util vdb-rrqm/s vdb-wrqm/s vdb-r/s vdb-w/s vdb-rkB/s vdb-wkB/s vdb-avgrq-sz vdb-avgqu-sz vdb-await vdb-r_await vdb-w_await vdb-svctm vdb-%util vdc-rrqm/s vdc-wrqm/s vdc-r/s vdc-w/s vdc-rkB/s vdc-wkB/s vdc-avgrq-sz vdc-avgqu-sz vdc-await vdc-r_await vdc-w_await vdc-svctm vdc-%util vdd-rrqm/s vdd-wrqm/s vdd-r/s vdd-w/s vdd-rkB/s vdd-wkB/s vdd-avgrq-sz vdd-avgqu-sz vdd-await vdd-r_await vdd-w_await vdd-svctm vdd-%util vde-rrqm/s vde-wrqm/s vde-r/s vde-w/s vde-rkB/s vde-wkB/s vde-avgrq-sz vde-avgqu-sz vde-await vde-r_await vde-w_await vde-svctm vde-%util vdf-rrqm/s vdf-wrqm/s vdf-r/s vdf-w/s vdf-rkB/s vdf-wkB/s vdf-avgrq-sz vdf-avgqu-sz vdf-await vdf-r_await vdf-w_await vdf-svctm vdf-%util vdg-rrqm/s vdg-wrqm/s vdg-r/s vdg-w/s vdg-rkB/s vdg-wkB/s vdg-avgrq-sz vdg-avgqu-sz vdg-await vdg-r_await vdg-w_await vdg-svctm vdg-%util vdh-rrqm/s vdh-wrqm/s vdh-r/s vdh-w/s vdh-rkB/s vdh-wkB/s vdh-avgrq-sz vdh-avgqu-sz vdh-await vdh-r_await vdh-w_await vdh-svctm vdh-%util dm-0-rrqm/s dm-0-wrqm/s dm-0-r/s dm-0-w/s dm-0-rkB/s dm-0-wkB/s dm-0-avgrq-sz dm-0-avgqu-sz dm-0-await dm-0-r_await dm-0-w_await dm-0-svctm dm-0-%util dm-1-rrqm/s dm-1-wrqm/s dm-1-r/s dm-1-w/s dm-1-rkB/s dm-1-wkB/s dm-1-avgrq-sz dm-1-avgqu-sz dm-1-await dm-1-r_await dm-1-w_await dm-1-svctm dm-1-%util dm-2-rrqm/s dm-2-wrqm/s dm-2-r/s dm-2-w/s dm-2-rkB/s dm-2-wkB/s dm-2-avgrq-sz dm-2-avgqu-sz dm-2-await dm-2-r_await dm-2-w_await dm-2-svctm dm-2-%util vdi-rrqm/s vdi-wrqm/s vdi-r/s vdi-w/s vdi-rkB/s vdi-wkB/s vdi-avgrq-sz vdi-avgqu-sz vdi-await vdi-r_await vdi-w_await vdi-svctm vdi-%util
Wed Dec 16 10:32:30 GMT 2015 0.00 0.00 0.00 2.00 0.00 4.00 4.00 0.24 128.00 0.00 128.00 121.00 24.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.11 36.67 33.00 44.00 36.67 11.00 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.04 14.67 0.00 44.00 14.67 4.40 0.00 0.00 2.00 1.00 1.00 0.50 1.00 0.10 31.67 25.50 44.00 31.67 9.50 0.00 0.00 43.00 23.00 400.00 97.00 15.06 3.83 57.05 62.37 47.09 15.15 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 0.00 24.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 39.00 23.00 320.00 78.50 12.85 1.11 17.40 2.85 42.09 12.39 76.80
I regret for re-work as I misunderstood the requirement.
Thanks in advance.
RudiC
May 3, 2017, 10:59am
16
How about applying what you learned so far trying to bring up a solution of your own. People in here will be glad to help when you get stuck.
Hi Rudic,
Being a biologist, I do get to work on awk very often, since this is typically a complex requirement, I thought of giving it a try.
I tried with headers part of it, but frankly not able to follow the entire code.
Appreciate your help so far.
Thanks.
RudiC
May 4, 2017, 6:54am
18
Try
awk '
function PRT() {if (!HDDONE) {printf "TimeStamp"
for (i=1; i<=MXSV; i++)
for (j=2; j<=MXHD; j++) printf "\t%s-%s", SV, HD[j]
printf RS
HDDONE = 1
}
printf "%s", TS
for (i=1; i<=MXSV; i++)
for (j=2; j<=MXHD; j++) printf "\t%s", DT[j,i]
printf RS
}
/^zzz/ {if (NR > 1) PRT()
split ("", HD)
split ("", SV)
split ("", DT)
START = 0
sub (/^zzz *\** */, "")
TS = $0
next
}
!NF {if (START) next
getline
for (i=2; i<=NF; i++) HD = $i
MXHD = NF
START = 1
STNR = NR
next
}
START {MXSV = NR - STNR
SV[MXSV] = $1
for (i=2; i<=NF; i++) DT[i, MXSV] = $i
}
END {PRT()
}
' file
TimeStamp vda-rrqm/s vda-wrqm/s vda-r/s vda-w/s vda-rkB/s vda-wkB/s vda-avgrq-sz vda-avgqu-sz ...
Wed Dec 16 10:32:30 GMT 2015 0.00 0.00 0.00 2.00 0.00 4.00 4.00 0.24 128.00 0.00 128.00 121.00 24.20 ...
Wed Dec 16 10:33:30 GMT 2015 0.00 0.00 0.00 2.00 0.00 4.00 4.00 ...
Please be aware that this is NOT rock solid, but fragile, as no serious error checking is done. For instance, the order of servers is not checked - if it changes, values in columns will be wrong. Nor is their count. The count of empty lines is not checked - it starts collecting data on every other empty line.
1 Like