Performance issue / tuning advice

Please take a look at this system and give me your analysis and advice. Can it be tuned for better performance?
We are not getting more hardware resources at the moment.
We have to live with what we have. The application running on the system is SAS. The OS is AIX 6.1.
Let me know if you need the output of more commands...

# vmstat -w 1 20

System configuration: lcpu=6 mem=20480MB ent=3.00

 kthr          memory                         page                       faults                 cpu
------- --------------------- ------------------------------------ ------------------ -----------------------
  r   b        avm        fre    re    pi    po    fr     sr    cy    in     sy    cs us sy id wa    pc    ec
 22  19    3579818      86874     0     0     0 67844 123310     0 10913  32142 85599 49 49  0  2  2.94  97.9
 26  18    3580751      87123     0     0     0 67666  93373     0  8490  26159 132536 48 45  0  7  2.80  93.5
 28  15    3582827      87089     0     0     0 76540 144681     0 11421  34167 101260 48 50  0  2  2.96  98.5
 26  19    3584824      87106     0     0     0 75337 127281     0 10684  33874 119796 49 50  0  1  2.98  99.2
 50   8    3585697      87109     0     0     0 92728 140676     0  9537  33579 85175 50 49  0  2  2.95  98.4
 35   7    3586172      88186     0     0     0 77249 136529     0 12591  31836 89203 46 53  0  1  2.97  99.0
 24  20    3588964      87137     0     0     0 73329 136712     0 11062  28124 76413 50 48  0  2  2.98  99.2
 42  13    3589926      87094     0     0     0 78307 123497     0  9240  29060 106892 52 48  0  1  2.99  99.7
 35  13    3585537      87208     0     0     0 69645 141586     0 10055  24548 87894 42 52  0  6  2.83  94.5
 35   9    3596513      87645     0     0     0 28480  44099     0  6482  41033 45360 51 49  0  0  3.00  99.9
 56   5    3567954      87653     0     0     0 56711 124429     0  7532  40724 13891 48 52  0  0  3.00 100.0
 51   9    3566138      86282     0     0     0 83624 126816     0  7582  48011 14971 46 54  0  0  3.00 100.1
 29  12    3527596      87942     0     0     0 59565  86520     0 10428  35858 22459 45 55  0  0  3.00  99.9
 43   9    3519915      89237     0     0     0 74838 130716     0  7460  30086 23766 51 49  0  0  3.00 100.2
 47   4    3519288      86889     0     0     0 65930  78800     1  8244  36199 18878 51 49  0  0  3.00 100.1
 38   5    3498940      87741     0     0     0 59209 136282     0 10774  34718 66311 47 50  0  2  2.93  97.7
 55   5    3509379      87351     0     0     0 89311 141676     0  8963  37806 42990 52 48  0  0  3.00 100.1
 38   8    3518601      87748     0     0     0 89609 119495     0  9635  38512 106600 52 47  0  1  2.98  99.4
 39   6    3525575      88857     0     0     0 84893 151599     0  8268  29233 118665 52 47  0  1  2.98  99.2
 36   8    3533680      87521     0     0     0 89532 145853     0  9447  35574 79352 51 48  0  0  2.99  99.8

#vmstat -v
              5242880 memory pages
              5060608 lruable pages
                89262 free pages
                    1 memory pools
              1534573 pinned pages
                 80.0 maxpin percentage
                  3.0 minperm percentage
                 90.0 maxperm percentage
                 28.6 numperm percentage
              1448701 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 28.6 numclient percentage
                 90.0 maxclient percentage
              1448701 client pages
                    0 remote pageouts scheduled
                  260 pending disk I/Os blocked with no pbuf
               731746 paging space I/Os blocked with no psbuf
                 2484 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
            596244370 external pager filesystem I/Os blocked with no fsbuf

If you want help, you should provide complete system details, including hardware, software (OS and applications), etc.

HW: P6 595
OS: 6100-01-05-0920
Appl: SAS

What else do you need?

I do not know SAS. As far as I can see it is business intelligence software, which I guess can consist of different kinds of applications. More information would be interesting, for example whether any type of database, web servers, etc. are involved.

Has anybody tuned this system already? There is a guide I just found which might be interesting for you:

What really does not look good:

  1. kthr/r (run queue): Did you try exporting AIXTHREAD_SCOPE=S? It looks to me as if there are simply far too few CPU resources to handle all the ready-to-run threads. It would be really interesting to see how the system behaves if you could add 3 more physical CPU units. It would also be interesting to see whether your application benefits from spreading your 3 physical CPUs over 6 virtual CPUs, which, with SMT turned on (as is the case on your system), would give you 12 logical CPUs instead of just 6.
  2. For memory page scanning the ratio is OK; for example fr: 83624 and sr: 126816 is about 1:1.5. Still, the system has to free a lot of pages and therefore has to scan a lot, even though it nearly always has about 80k pages on the free list. Could you please post the output of vmo -x minfree -x maxfree? Best add the output of vmo -x lru_file_repage too, just in case. The output of svmon -G might also be interesting.
  3. What is definitely tunable is the very high counter for
    596244370 external pager filesystem I/Os blocked with no fsbuf. Check your current value with ioo -x j2_dynamicBufferPreallocation and double it, or, if it is still at the default of 16, try 128 instead and check over time whether the counter is still increasing by large amounts. You can also raise j2_nBufferPerPagerDevice with ioo to, let's say, 2048 and check whether the counter still increases rapidly. It will take some time to monitor this. Check the man page for ioo to see which of the 2 parameters can be changed dynamically and which needs a remount of the filesystems.
  4. Currently your box is not paging in or out to paging space, but you have this: 731746 paging space I/Os blocked with no psbuf
    which shows that there has been such activity. I bet lsps -a will show that some of your paging space is in use. If the output of vmo -x lru_file_repage turns out to be 0 already, then I would add some more memory to this box.
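
To put numbers on points 1 and 2, here is a minimal Python sketch that parses vmstat -w data rows and computes the average run queue per logical CPU and the overall sr:fr ratio. The column names and the helper functions are my own (hypothetical) choices; the column order is assumed to match the vmstat -w output posted above, and the two sample rows are copied from it.

```python
# Column order as in the `vmstat -w` output above (shared-processor LPAR).
# "sy" appears twice in vmstat (faults and cpu); the second one is renamed here.
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "sy", "cs", "us", "cpu_sy", "id", "wa", "pc", "ec"]


def parse_vmstat_line(line):
    """Turn one vmstat data row into a dict of floats keyed by column name."""
    values = [float(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def summarize(lines, lcpu):
    """Average run queue (absolute and per logical CPU) and pages scanned per page freed."""
    rows = [parse_vmstat_line(line) for line in lines if line.strip()]
    avg_runq = sum(row["r"] for row in rows) / len(rows)
    total_fr = sum(row["fr"] for row in rows)
    total_sr = sum(row["sr"] for row in rows)
    return {
        "avg_runq": avg_runq,
        "runq_per_lcpu": avg_runq / lcpu,
        "sr_per_fr": total_sr / total_fr if total_fr else float("inf"),
    }


# Two rows copied from the posted output; lcpu=6 comes from the vmstat header.
sample = [
    " 22  19    3579818      86874     0     0     0 67844 123310     0 10913  32142 85599 49 49  0  2  2.94  97.9",
    " 51   9    3566138      86282     0     0     0 83624 126816     0  7582  48011 14971 46 54  0  0  3.00 100.1",
]
stats = summarize(sample, lcpu=6)
print(stats)
```

For these two rows the run queue works out to roughly 6 runnable threads per logical CPU, and a bit more than 1.6 pages are scanned for every page freed. Feeding in all 20 rows gives a better picture than two samples.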

Those are my first ideas for your box. It would be nice to hear back whether any of it helped.


What databases are you using?

Thanks for the reply. I don't know much about the application, but there is NO DB running on the system.
As a first step I will try to change the number of virtual CPUs and AIXTHREAD_SCOPE, but this will take time, as every change requires a long process...

Here is the information you requested:

#vmo -x minfree -x maxfree
maxfree,1088,1088,1088,16,4194304,4KB pages,D,minfree memory_frames
minfree,960,960,960,8,4194304,4KB pages,D,maxfree memory_frames

#vmo -x lru_file_repage
lru_file_repage,0,0,0,0,1,boolean,D,

 #svmon -G
               size       inuse        free         pin     virtual
memory      5242880     5223281       89231     1534929     3406525
pg space    2883584      101634

               work        pers        clnt       other
pin         1294398           0        2691      237840
in use      3345877           0     1877404

PageSize   PoolSize       inuse        pgsp         pin     virtual
s    4 KB         -     3831041      101634      424625     2014285
m   64 KB         -       87015           0       69394       87015

#ioo -x j2_dynamicBufferPreallocation
j2_dynamicBufferPreallocation,16,16,16,0,256,16K slabs,D,

# ioo -x j2_nBufferPerPagerDevice
j2_nBufferPerPagerDevice,512,512,512,0,262144,,M,

#lsps -a
Page Space      Physical Volume   Volume Group Size %Used Active  Auto  Type Chksum
hd6             hdisk598          rootvg       11264MB     4   yes   yes    lv     0




Besides the CPU changes, I would just set

j2_dynamicBufferPreallocation=128

and, if that doesn't help, additionally

j2_nBufferPerPagerDevice=2048

Monitor the increase of

    596244370 external pager filesystem I/Os blocked with no fsbuf

per day for some days and check whether the daily increase shrinks after making the recommended changes.
Let us know what vmstat looks like when you have 6 virtual CPUs. Setting up long-term monitoring might reveal when the paging in/out to paging space occurs.
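
The daily comparison could be scripted along these lines. This is only a sketch: it assumes you save the output of vmstat -v once a day, and the function names are made up for illustration. The first counter value below is the one from the posted output; the two later values are invented placeholders, not measurements.

```python
import re

COUNTER = "external pager filesystem I/Os blocked with no fsbuf"


def read_counter(vmstat_v_output, label=COUNTER):
    """Return the integer counter whose description matches `label` exactly."""
    for line in vmstat_v_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
        if match and match.group(2) == label:
            return int(match.group(1))
    raise KeyError(label)


def daily_deltas(snapshots):
    """snapshots: chronological list of (day_label, vmstat -v text) pairs.

    Returns (day_label, increase since the previous snapshot) pairs.
    """
    counts = [(day, read_counter(text)) for day, text in snapshots]
    return [(day2, c2 - c1) for (_, c1), (day2, c2) in zip(counts, counts[1:])]


# Day 1 uses the value from the thread; days 2 and 3 are made-up example values.
snapshots = [
    ("day1", "     2484 filesystem I/Os blocked with no fsbuf\n"
             "596244370 external pager filesystem I/Os blocked with no fsbuf\n"),
    ("day2", "     2484 filesystem I/Os blocked with no fsbuf\n"
             "596844370 external pager filesystem I/Os blocked with no fsbuf\n"),
    ("day3", "     2484 filesystem I/Os blocked with no fsbuf\n"
             "596944370 external pager filesystem I/Os blocked with no fsbuf\n"),
]
deltas = daily_deltas(snapshots)
print(deltas)
```

The counters in vmstat -v are cumulative since boot, so a negative delta simply means the LPAR was rebooted between two snapshots.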

---------- Post updated at 03:15 PM ---------- Previous update was at 02:11 PM ----------

I did not mention the pc column of vmstat so far; it almost constantly shows a pc (physical CPU consumption) of about 3.0, which is all you have (ent=3.00). Even changing to more virtual CPUs might not really take load off your 3 physical CPUs. I would strongly go for more CPU units for that LPAR, or check whether any optimizations are possible in the way the application works (code, config, ...).
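
As a rough check of that statement, the pc values from the vmstat sample can be compared with the entitlement. The sketch below is only an illustration; the function name and the 95% threshold for "close to the entitlement" are arbitrary assumptions of mine, while the pc values and ent=3.00 are taken from the posted output.

```python
def entitlement_summary(pc_samples, ent, near=0.95):
    """Average physical CPU consumed and how many samples sit close to the entitlement."""
    avg_pc = sum(pc_samples) / len(pc_samples)
    saturated = sum(1 for pc in pc_samples if pc >= near * ent)
    return {
        "avg_pc": avg_pc,
        "avg_ec_percent": 100.0 * avg_pc / ent,
        "saturated_samples": saturated,
        "samples": len(pc_samples),
    }


# pc column of the 20 vmstat samples posted above; ent comes from the vmstat header.
pc = [2.94, 2.80, 2.96, 2.98, 2.95, 2.97, 2.98, 2.99, 2.83, 3.00,
      3.00, 3.00, 3.00, 3.00, 3.00, 2.93, 3.00, 2.98, 2.98, 2.99]
summary = entitlement_summary(pc, ent=3.00)
print(summary)
```

Over these 20 seconds the LPAR averages about 2.96 of its 3.00 entitled physical CPUs, with 18 of the 20 samples at or above 95% of the entitlement.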


As already mentioned, you have too many threads for your CPUs, and you have pretty high kernel utilization, which I would not expect to see.
You can try mounting your filesystems with the noatime option, which usually brings some improvement. Depending on your underlying DB you should also try to reduce file buffering: give cio or, if you have Oracle, SETALL a go. You are constantly scanning to free pages even though you seemingly have a rather large free list, but I cannot see any I/O justifying this behaviour. What kind of storage do you use? And does your application / DB give you the opportunity to use async I/O?

Regards
zxmaus