Hello dear friends, I have an LPAR running AIX 6.1 with DB2 installed. I have 8 virtual CPUs configured on my LPAR, and when I run nmon the CPU wait time is always high. I will provide a screenshot for a better picture. My question is: what could produce such a high wait time? Thanks in advance!
Maybe the wait process means that the CPU is idle and waiting for other jobs?
Have you checked what processes are running currently? There can be several different reasons for that. Do you have a disk bottleneck (iostat) or network? Does DB2 have some internal trouble like locks that can't be resolved etc.?
Maybe this is interesting for you too:
Demystifying I/O Wait:
http://unix.derkeiler.com/Mailing-Lists/AIX-L/2004-04/att-0096/Demystifying_IOWAIT.pdf
---------- Post updated at 12:21 PM ---------- Previous update was at 12:16 PM ----------
Just saw your assumption - you can easily check whether this is just the wait from the kernel wait processes by running, for example, vmstat -w 1 10. If the I/O wait (wa) column is zero, then what you are seeing is the wait processes.
I am not that familiar with that nmon view of the CPUs, but I think the kernel wait processes are not displayed there - just real I/O wait, as in vmstat for example.
vmstat -w 1 10
System configuration: lcpu=16 mem=24576MB ent=4.00
kthr memory page faults cpu
------- ------------------------------------ ------------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
2 0 5235358 868064 0 0 0 0 0 0 2999 10933 6223 5 3 88 4 0.40 10.1
0 0 5235359 868063 0 0 0 0 0 0 2629 9390 5547 4 3 89 4 0.34 8.6
0 0 5235357 868065 0 0 0 0 0 0 2547 13373 5302 5 4 88 4 0.40 10.0
0 0 5235357 868065 0 0 0 0 0 0 3038 11937 6316 6 3 88 4 0.43 10.7
0 0 5235357 868064 0 0 0 0 0 0 2421 9710 5455 4 3 89 4 0.35 8.8
0 0 5235358 868063 0 0 0 0 0 0 2604 9692 5381 4 3 89 4 0.33 8.3
0 0 5235358 868063 0 0 0 0 0 0 3722 11819 7843 5 4 87 4 0.45 11.3
1 0 5235356 868065 0 0 0 0 0 0 3055 12577 6214 5 3 88 3 0.43 10.6
0 0 5235357 868064 0 0 0 0 0 0 3354 20234 6494 6 4 86 3 0.51 12.8
0 0 5235357 868064 0 0 0 0 0 0 2444 8784 5198 4 3 89 4 0.32 8.1
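To avoid eyeballing the wa column across a run like the one above, a small awk helper can average it. This is my own sketch, not something from the thread, and the field position (wa = field 17) assumes the AIX vmstat layout shown above:

```shell
# Hypothetical helper: average the wa (I/O wait) column of a saved
# "vmstat -w 1 10" run. Data rows have 19 fields and start with a digit;
# header and banner lines are filtered out.
vmstat_wa_avg() {
    awk 'NF == 19 && $1 ~ /^[0-9]+$/ { sum += $17; n++ }
         END { if (n) printf "avg wa: %.1f%%\n", sum / n }' "$1"
}

# Usage: vmstat -w 1 10 > vmstat.out && vmstat_wa_avg vmstat.out
```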
and here is iostat - I guess the only problem is that the disks are heavily loaded:
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk3 99.8 17080.0 897.1 75544 95384
hdisk2 94.6 16653.1 905.1 76600 90056
hdisk4 20.2 880.5 152.6 16 8796
hdisk0 1.4 28.4 2.8 4 280
hdisk1 1.3 28.0 2.7 0 280
Yes, there is some traffic on hdisk2 and hdisk3, with a high busy time and lots of transactions while using some bandwidth.
Otherwise, CPU- and memory-wise the box looks good, so those "wwwww"s in nmon will be the kernel wait processes - nothing to worry about.
If you experience any slow behaviour though, you might want to consider whether the disk layout can be improved, i.e. splitting tablespaces/containers across more than 2 volumes. Maybe db2expln also shows that more efficiency can be gained by implementing an index in the right place, etc.
I guess you have some SAN storage with a cache behind this - if that is the case, you might want to check whether those SAN disks are fine with that load or not.
A vmstat -v might be interesting in that case, too.
Yes, these heavily loaded disks are located on the DS8000 storage, but I have no good understanding of how to work with that storage, so I can't check anything on it.
And here is the vmstat -v result - I want to thank you for your help...
vmstat -v
6291456 memory pages
6091921 lruable pages
733113 free pages
3 memory pools
483310 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
4.5 numperm percentage
278340 file pages
0.0 compressed percentage
0 compressed pages
4.5 numclient percentage
90.0 maxclient percentage
278340 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
54484 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
CDR /home #
---------- Post updated at 06:37 AM ---------- Previous update was at 06:34 AM ----------
And we are running out of storage space; in September we are going to do an upgrade... We think, like you, that splitting tablespaces/containers will give us some performance...
54484 client filesystem I/Os blocked with no fsbuf
This is not pretty and is worth trying to tune.
You can try:
ioo -p -o j2_dynamicBufferPreallocation=32
The default is 16. This tunable is dynamic, so no remount is needed. If the counter still increases rather quickly, you can additionally do:
ioo -p -o j2_nBufferPerPagerDevice=1024
The default is 512; you can also start with 2048. For this one you will have to remount the filesystems.
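The sequence above can be sketched as follows. These are AIX-specific commands; treat this as an illustration of the steps zaxxon describes (the mount point is a made-up example), not a tested recipe:

```shell
# Show the current JFS2 buffer tunables before changing anything
ioo -a | grep -E 'j2_dynamicBufferPreallocation|j2_nBufferPerPagerDevice'

# Raise the dynamic preallocation; takes effect immediately,
# -p also makes the change persistent across reboots
ioo -p -o j2_dynamicBufferPreallocation=32

# If the fsbuf counter keeps climbing, raise the per-pager-device buffers.
# This one only takes effect after the affected filesystems are remounted.
ioo -p -o j2_nBufferPerPagerDevice=1024
# umount /db2data && mount /db2data   # hypothetical mount point, adjust to yours
```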
It could be that you will not notice this in terms of performance when using the application on top.
I will do the configuration tomorrow... you are very clever, thanks a lot!
---------- Post updated at 08:39 AM ---------- Previous update was at 08:02 AM ----------
Well, I changed these parameters, and until we started DB2 I ran the command:
vmstat -v
6291456 memory pages
6091921 lruable pages
5077989 free pages
3 memory pools
370646 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
1.2 numperm percentage
76542 file pages
0.0 compressed percentage
0 compressed pages
1.2 numclient percentage
90.0 maxclient percentage
76542 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
650 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
As we can see, the "2484 filesystem I/Os blocked with no fsbuf" is still present... I will try to read up on fsbuf and find out why I have these I/O blocks.
I am not sure whether this is not just an initial impact that might occur because of the intense traffic when starting everything up after a boot. I would tend to monitor whether those counters increase during normal daily work, especially after peak times.
Anyway, it is good to read tuning guides, IBM Redbooks about performance, Jaqui Lynch's articles about tuning, etc. The man page on ioo is also very interesting.
Excuse me, but I am still learning.
How do you know that Vit0_Corleone uses JFS2, and that the problem is with it and not another filesystem, maybe JFS?
And can j2_dynamicBufferPreallocation be used for JFS too?
As he said he is using AIX 6.1, I strongly doubt that he is using JFS. JFS is somewhat out of date and very limited. If he is using JFS, I strongly recommend changing that.
john1212, zaxxon
Of course I am using JFS2. We use 100 GB files for DB2, and these files are all allocated in one partition - that's why we have such a big load. We are waiting for the storage upgrade, and then we will allocate the DB2 containers to separate filesystems. I will post the result then.
Thanks in advance for posting the result then.
I think when Vit0_Corleone wrote "untill we started DB2" he meant "before we started DB2".
Hence my conclusion that the problem is not with DB2, and not with the filesystems that hold DB2.
I had not written it yet, but that would have been my question.
I'm sorry, Vit0_Corleone, if I misread your message.
Seriously though, I think vmstat -v, at the line
filesystem I/Os blocked with no fsbuf
displays a summary of the statistics since boot.
If you are interested in the filesystems on SAN, you ought to watch the statistic:
client filesystem I/Os blocked with no fsbuf
zaxxon was of course correct to point this counter out, but I would like to explain it a bit more, for completeness:
This counter (and similar ones, like "paging space I/Os blocked with no psbuf", etc.) is collected from boot time. Therefore the value itself is not that interesting, but its change over time is.
A high value which doesn't change (or hardly changes) over time hints at a problem in the past which is not occurring right now. That doesn't necessarily mean that the problem is solved, just that its symptom is not occurring. Always take the number one advice of performance monitoring/tuning to heart: a problem is not solved when its symptom cannot be observed, but only when the symptom doesn't occur any more and you understand why this is so!
If you have to trace this or a similar counter always take snapshots over some extended period (every hour, every day, ...) and compare. Notice how much the counter increases - that is: the order of magnitude of the increment. If the counter increases in the tens over a day everything is ok, if it increases in the hundreds over an hour you have to investigate, if the increment is even higher you have to act immediately.
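The snapshot-and-compare routine described above could be scripted along these lines. This is my own sketch (the function name, file paths, and cron idea are all made up for illustration); it assumes snapshots of vmstat -v output have been saved to files:

```shell
# Compare one counter between two saved "vmstat -v" snapshots and print
# its increment.  Pass a description that uniquely matches one line.
# Usage: fsbuf_delta OLDFILE NEWFILE 'counter description'
fsbuf_delta() {
    old=$(awk -v k="$3" '$0 ~ k { print $1 }' "$1")
    new=$(awk -v k="$3" '$0 ~ k { print $1 }' "$2")
    echo "$3: $old -> $new (increment: $((new - old)))"
}

# Example, with hourly snapshots taken by something like a cron job
# running: vmstat -v > /var/tmp/vmstat.$(date +%H)
# fsbuf_delta /var/tmp/vmstat.08 /var/tmp/vmstat.09 'client filesystem'
```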
I hope this helps.
bakunin
Yesterday and today I have often run vmstat -v, and to tell the truth the values are not changing. Here is the output:
vmstat -v
6291456 memory pages
6091921 lruable pages
1024402 free pages
3 memory pools
407822 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
2.9 numperm percentage
182505 file pages
0.0 compressed percentage
0 compressed pages
2.9 numclient percentage
90.0 maxclient percentage
182505 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
650 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
CDR / #
Looks like a good sign. As bakunin said - just have a look at it maybe twice a week, or have a script send you a report every morning. Anyway, it might still be a good idea to split your storage for smoother performance.
zaxxon, yes - and thanks again to you and to everybody who tried to help me. As soon as we do the storage upgrade, I will allocate the DB2 containers to separate filesystems. Let's see then what happens.
My comments:
vmstat -v | grep 'filesystem I/Os blocked with no fsbuf'
- these are statistics about JFS filesystems.
vmstat -v | grep 'client filesystem I/Os blocked with no fsbuf'
- these are statistics about NFS, Veritas, ... filesystems.
vmstat -v | grep 'external pager filesystem I/Os blocked with no fsbuf'
- these are statistics about JFS2 filesystems.
The conclusions are obvious.
Vit0_Corleone,
you have disks on SAN, attached via NFS or Fibre Channel (or Ethernet). It may be a problem with TCP. When the problem occurs, run:
netstat -s
You can test the connection by making a big transfer from the server to the disks, preferably to the same disks.
Well, we have finished upgrading the DS8000 and I placed the DB2 containers on separate filesystems. The performance is just perfect! Thanks to everybody.