Hello dear friends, I have an LPAR running AIX 6.1 with DB2 installed. I have 8 virtual CPUs configured on my LPAR, and when I run nmon the CPU wait time is always high. I will provide a screenshot for a better picture. My question is: what could produce such a high wait time? Thanks in advance!
Maybe the wait process means that the CPU is idle and waiting for other jobs?
Have you checked what processes are running currently? There can be several different reasons for that. Do you have a disk bottleneck (iostat) or network? Does DB2 have some internal trouble like locks that can't be resolved etc.?
Maybe this is interesting for you too:
Demystifying I/O Wait:
http://unix.derkeiler.com/Mailing-Lists/AIX-L/2004-04/att-0096/Demystifying_IOWAIT.pdf
---------- Post updated at 12:21 PM ---------- Previous update was at 12:16 PM ----------
Just saw your assumption - you can easily check whether this is just the wait from the kernel wait processes by running, for example, vmstat -w 1 10. If the I/O wait (wa) column is zero, then what you are seeing is the wait processes.
I am not that familiar with that nmon view of the CPUs, but I think the kernel wait processes are not displayed there - just real I/O wait, as in vmstat for example.
vmstat -w 1 10
System configuration: lcpu=16 mem=24576MB ent=4.00
kthr memory page faults cpu
------- ------------------------------------ ------------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
2 0 5235358 868064 0 0 0 0 0 0 2999 10933 6223 5 3 88 4 0.40 10.1
0 0 5235359 868063 0 0 0 0 0 0 2629 9390 5547 4 3 89 4 0.34 8.6
0 0 5235357 868065 0 0 0 0 0 0 2547 13373 5302 5 4 88 4 0.40 10.0
0 0 5235357 868065 0 0 0 0 0 0 3038 11937 6316 6 3 88 4 0.43 10.7
0 0 5235357 868064 0 0 0 0 0 0 2421 9710 5455 4 3 89 4 0.35 8.8
0 0 5235358 868063 0 0 0 0 0 0 2604 9692 5381 4 3 89 4 0.33 8.3
0 0 5235358 868063 0 0 0 0 0 0 3722 11819 7843 5 4 87 4 0.45 11.3
1 0 5235356 868065 0 0 0 0 0 0 3055 12577 6214 5 3 88 3 0.43 10.6
0 0 5235357 868064 0 0 0 0 0 0 3354 20234 6494 6 4 86 3 0.51 12.8
0 0 5235357 868064 0 0 0 0 0 0 2444 8784 5198 4 3 89 4 0.32 8.1
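To avoid eyeballing the wa column across a run like the one above, a small awk helper can average it. This is my own sketch, not something from the thread, and the field position (wa = field 17) assumes the AIX vmstat layout shown above:

```shell
# Hypothetical helper: average the wa (I/O wait) column of a saved
# "vmstat -w 1 10" run. Data rows have 19 fields and start with a digit;
# header and banner lines are filtered out.
vmstat_wa_avg() {
    awk 'NF == 19 && $1 ~ /^[0-9]+$/ { sum += $17; n++ }
         END { if (n) printf "avg wa: %.1f%%\n", sum / n }' "$1"
}

# Usage: vmstat -w 1 10 > vmstat.out && vmstat_wa_avg vmstat.out
```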
and here is iostat - I guess the only problem is that the disks are heavily loaded:
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk3 99.8 17080.0 897.1 75544 95384
hdisk2 94.6 16653.1 905.1 76600 90056
hdisk4 20.2 880.5 152.6 16 8796
hdisk0 1.4 28.4 2.8 4 280
hdisk1 1.3 28.0 2.7 0 280
Yes, there is some traffic on hdisk2 and hdisk3, with a high busy time and lots of transactions while using some bandwidth.
Otherwise, CPU- and memory-wise the box looks good, so those "wwwww"s in nmon will be the kernel wait processes - nothing to worry about.
If you experience any slow behaviour though, you might want to consider whether the disk layout can be improved, i.e. splitting tablespaces/containers across more than 2 volumes. Maybe db2expln also shows that more efficiency can be gained by implementing an index in the right place, etc.
I guess you have some SAN storage with a cache behind this - if that is the case, you might want to check whether those SAN disks are fine with that load or not.
A vmstat -v might be interesting in that case, too.
Yes, these heavily loaded disks are located on the DS8000 storage, but I have no good understanding of how to work with that storage, so I can't check anything on it.
And here is the vmstat -v result - I want to thank you for your help...
vmstat -v
6291456 memory pages
6091921 lruable pages
733113 free pages
3 memory pools
483310 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
4.5 numperm percentage
278340 file pages
0.0 compressed percentage
0 compressed pages
4.5 numclient percentage
90.0 maxclient percentage
278340 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
54484 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
CDR /home #
---------- Post updated at 06:37 AM ---------- Previous update was at 06:34 AM ----------
And we are running out of storage space; in September we are going to do an upgrade... We think, like you, that splitting tablespaces/containers will give us some performance...
54484 client filesystem I/Os blocked with no fsbuf
This is not pretty and is worth trying to tune.
You can try:
ioo -p -o j2_dynamicBufferPreallocation=32
The default is 16. This tunable is dynamic, so no remount is needed. If the counter still increases rather quickly, you can additionally do:
ioo -p -o j2_nBufferPerPagerDevice=1024
The default is 512; you can also start with 2048. For this one you will have to remount the filesystems.
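The sequence above can be sketched as follows. These are AIX-specific commands; treat this as an illustration of the steps zaxxon describes (the mount point is a made-up example), not a tested recipe:

```shell
# Show the current JFS2 buffer tunables before changing anything
ioo -a | grep -E 'j2_dynamicBufferPreallocation|j2_nBufferPerPagerDevice'

# Raise the dynamic preallocation; takes effect immediately,
# -p also makes the change persistent across reboots
ioo -p -o j2_dynamicBufferPreallocation=32

# If the fsbuf counter keeps climbing, raise the per-pager-device buffers.
# This one only takes effect after the affected filesystems are remounted.
ioo -p -o j2_nBufferPerPagerDevice=1024
# umount /db2data && mount /db2data   # hypothetical mount point, adjust to yours
```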
It could be that you will not notice this in terms of performance when using the application on top.
I will do the configuration tomorrow... you are very clever, thanks a lot!
---------- Post updated at 08:39 AM ---------- Previous update was at 08:02 AM ----------
Well, I changed these parameters, and until we started DB2 I ran the command:
vmstat -v
6291456 memory pages
6091921 lruable pages
5077989 free pages
3 memory pools
370646 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
1.2 numperm percentage
76542 file pages
0.0 compressed percentage
0 compressed pages
1.2 numclient percentage
90.0 maxclient percentage
76542 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
650 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
As we can see, the "2484 filesystem I/Os blocked with no fsbuf" is still present... I will try to read up on fsbuf and find out why I have these I/O blocks.
I am not sure whether this is not just an initial impact that might occur because of the intense traffic when starting everything up after a boot. I would tend to monitor whether those counters increase during normal daily work, especially after peak times.
Anyway, it is good to read tuning guides, IBM Redbooks about performance, Jaqui Lynch's articles about tuning, etc. The man page on ioo is also very interesting.
Excuse me, but I am still learning.
How do you know that Vit0_Corleone uses JFS2, and that the problem is with it and not another filesystem, maybe JFS?
And can j2_dynamicBufferPreallocation be used for JFS too?
As he said he is using AIX 6.1, I strongly doubt that he is using JFS. JFS is somewhat out of date and very limited. If he is using JFS, I strongly recommend changing that.
john1212, zaxxon
Of course I am using JFS2. We use 100 GB files for DB2, and these files are all allocated in one partition - that's why we have such a big load. We are waiting for the storage upgrade, and then we will allocate the DB2 containers to separate filesystems. I will post the result then.
Thanks in advance for posting the result then.
I think when Vit0_Corleone wrote "untill we started DB2" he meant "before we started DB2".
Hence my conclusion that the problem is not with DB2, and not with the filesystems that hold DB2.
I had not written it yet, but that would have been my question.
I'm sorry, Vit0_Corleone, if I misread your message.
Seriously though, I think vmstat -v, at the line
filesystem I/Os blocked with no fsbuf
displays a summary of the statistics since boot.
If you are interested in the filesystems on SAN, you ought to watch the statistic:
client filesystem I/Os blocked with no fsbuf
zaxxon was of course correct to point this counter out, but I would like to explain it a bit more, for completeness:
This counter (and similar ones, like "paging space I/Os blocked with no psbuf", etc.) is collected from boot time. Therefore the value itself is not that interesting, but its change over time is.
A high value which doesn't change (or hardly changes) over time hints at a problem in the past which is not occurring right now. That doesn't necessarily mean that the problem is solved, just that its symptom is not occurring. Always take the number one advice of performance monitoring/tuning to heart: a problem is not solved when its symptom cannot be observed, but only when the symptom doesn't occur any more and you understand why this is so!
If you have to trace this or a similar counter always take snapshots over some extended period (every hour, every day, ...) and compare. Notice how much the counter increases - that is: the order of magnitude of the increment. If the counter increases in the tens over a day everything is ok, if it increases in the hundreds over an hour you have to investigate, if the increment is even higher you have to act immediately.
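The snapshot-and-compare routine described above could be scripted along these lines. This is my own sketch (the function name, file paths, and cron idea are all made up for illustration); it assumes snapshots of vmstat -v output have been saved to files:

```shell
# Compare one counter between two saved "vmstat -v" snapshots and print
# its increment.  Pass a description that uniquely matches one line.
# Usage: fsbuf_delta OLDFILE NEWFILE 'counter description'
fsbuf_delta() {
    old=$(awk -v k="$3" '$0 ~ k { print $1 }' "$1")
    new=$(awk -v k="$3" '$0 ~ k { print $1 }' "$2")
    echo "$3: $old -> $new (increment: $((new - old)))"
}

# Example, with hourly snapshots taken by something like a cron job
# running: vmstat -v > /var/tmp/vmstat.$(date +%H)
# fsbuf_delta /var/tmp/vmstat.08 /var/tmp/vmstat.09 'client filesystem'
```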
I hope this helps.
bakunin
Yesterday and today I have often run vmstat -v, and to tell the truth the values are not changing. Here is the output:
vmstat -v
6291456 memory pages
6091921 lruable pages
1024402 free pages
3 memory pools
407822 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
2.9 numperm percentage
182505 file pages
0.0 compressed percentage
0 compressed pages
2.9 numclient percentage
90.0 maxclient percentage
182505 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
650 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
CDR / #
Looks like a good sign. As bakunin said - just have a look at it maybe twice a week, or have a script send you a report every morning. Anyway, it might still be a good idea to split your storage for smoother performance.
zaxxon, yes - and thanks again to you and to everybody who tried to help me. As soon as we do the storage upgrade, I will allocate the DB2 containers to separate filesystems. Let's see then what happens.
My comments:
vmstat -v | grep 'filesystem I/Os blocked with no fsbuf'
- these are statistics about JFS filesystems.
vmstat -v | grep 'client filesystem I/Os blocked with no fsbuf'
- these are statistics about NFS, Veritas, ... filesystems.
vmstat -v | grep 'external pager filesystem I/Os blocked with no fsbuf'
- these are statistics about JFS2 filesystems.
The conclusions are obvious.
Vit0_Corleone,
you have disks on SAN, attached via NFS or Fibre Channel (or Ethernet). It may be a problem with TCP. When the problem occurs, run:
netstat -s
You can test the connection by making a big transfer from the server to the disks, preferably to the same disks.
Well, we have finished upgrading the DS8000 and I placed the DB2 containers on separate filesystems. The performance is just perfect! Thanks to everybody.