AIX 5.3, TL6 problem??

Hi people,

I have an IBM server (with HACMP) running AIX 5.3 with Oracle 9.2.
Last weekend I installed TL6...

root@srv_node1:/> oslevel -s
5300-06-03-0732

but now the Oracle shared pool is very low, which makes all processes very slow.

I already tried to find an answer on Google, but found nothing that explains the problem!

Has anyone had the same problem?
Or can someone point me in a direction to find my problem?

Thanks a lot

I am no Oracle admin and it might sound dumb, but can't you just set the shared pool parameter back to what it was?
We have the same ML running on our AIX 5.3 boxes with lots of Oracle DBs on them, mainly Oracle 10.2 though, and I haven't heard of any complaints.

An ML update doesn't change kernel parameters and I doubt it is the reason, but let's check your VMM settings:

vmo -x| grep -iE "file_rep|minperm|maxperm|maxclient"

And add a

vmstat 1 20

taken at a time when it is running very slow, ty.

Thanks!
Also, I'm not an Oracle admin, nor an AIX one... Before, I was only a network administrator, but the previous sysadmin no longer works at this company, so I have to do his job! Well... good for increasing my knowledge! :slight_smile:

I will do that and post the results here.

Thanks

Hi again!

root@srv_node1:/export> vmo -x| grep -iE "file_rep|minperm|maxperm|maxclient"
lru_file_repage,1,1,1,0,1,boolean,D,
maxclient%,20,80,20,1,100,% memory,D,maxperm% minperm%
maxperm,2001277,,2001277,,,,S,
maxperm%,20,80,20,1,100,% memory,D,minperm% maxclient%
minperm,500318,,500318,,,,S,
minperm%,5,20,5,1,100,% memory,D,maxperm% maxclient%
strict_maxclient,1,1,1,0,1,boolean,D,strict_maxperm
strict_maxperm,0,0,0,0,1,boolean,D,strict_maxclient

and

root@srv_node1:/export> vmstat 1 20
 
System configuration: lcpu=12 mem=40959MB ent=4.00
 
kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa    pc    ec
 1  0 4542286 3937366   0   0   0   0    0   0 1452 15842 1351 31  3 65  1  1.56  39.0
 1  0 4542974 3936667   0   0   0   0    0   0 1467 7129 1134 27  1 71  0  1.33  33.2
 9  0 4542304 3937337   0   0   0   0    0   0 1522 15256 1239 44  2 54  0  2.01  50.1
 5  0 4542336 3937303   0   0   0   0    0   0 1478 18160 1361 51  3 46  0  2.32  58.1
 5  0 4537671 3941967   0   0   0   0    0   0 1501 21139 1427 35  5 60  0  1.84  46.0
 2  0 4538563 3941074   0   0   0   0    0   0 1448 13117 1173 33  3 64  0  1.62  40.6
 2  0 4547324 3932314   0   0   0   0    0   0 1373 15868 1767 41  3 56  0  1.98  49.4
 2  0 4541708 3937929   0   0   0   0    0   0 1535 7791 1186 32  2 66  0  1.54  38.5
 6  0 4541799 3937838   0   0   0   0    0   0 1506 14251 1443 39  3 58  0  1.86  46.6
 2  0 4541799 3937838   0   0   0   0    0   0 1523 20458 1251 39  3 58  0  1.89  47.1
 4  0 4541799 3937838   0   0   0   0    0   0 1542 16185 1545 41  3 54  2  1.94  48.6
 2  0 4547379 3932257   0   0   0   0    0   0 1507 13570 1274 45  2 53  0  2.07  51.7
 3  0 4541744 3937892   0   0   0   0    0   0 1529 8264 1241 44  2 52  2  2.06  51.6
 5  0 4541782 3937853   0   0   0   0    0   0 1535 16189 1148 33  4 62  0  1.72  43.1
 5  0 4543755 3935880   0   0   0   0    0   0 1527 17059 1285 35  5 59  0  1.89  47.2
 4  0 4542814 3936820   0   0   0   0    0   0 1474 11940 1107 31  3 65  0  1.60  40.1
 3  0 4543812 3935822   0   0   0   0    0   0 1587 16339 1514 44  3 51  1  2.13  53.2
 3  0 4545096 3934536   0   0   0   0    0   0 1432 6213 1047 26  2 71  0  1.29  32.2
 4  0 4537334 3942298   0   0   0   0    0   0 1484 20928 1213 37  4 59  1  1.83  45.8
 6  0 4536917 3942714   0   0   0   0    0   0 1572 17462 1590 49  3 47  2  2.27  56.8

Well, as I said, I'm very new to this technology, but it looks like everything is OK with the server.

Could the problem come from the storage or HACMP?

Thanks.

Looking at the vmo parameters, the system has been tuned the "old" way, setting maxperm% and maxclient% low instead of setting lru_file_repage=0, which is absolutely fine though.
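
To see at a glance which of these tunables were moved away from their defaults, the vmo -x output posted above can be filtered with awk. This is only a sketch: it assumes the comma-separated fields are name, current value, default value (in that order), and vmo_nondefault is a made-up helper name, not an AIX command.

```shell
# Hypothetical helper: reads "vmo -x" style lines on stdin and prints
# tunables whose current value (field 2) differs from the default (field 3).
# Assumption: field order is name,current,default,...; rows with an empty
# default (such as the static maxperm/minperm page counts) are skipped.
vmo_nondefault() {
    awk -F, '$3 != "" && $2 != $3 {
        printf "%s current=%s default=%s\n", $1, $2, $3
    }'
}

# On the AIX box (not run here):
#   vmo -x | grep -iE "file_rep|minperm|maxperm|maxclient" | vmo_nondefault
```

Fed with the output posted above, this would list maxclient%, maxperm% and minperm% (20, 20 and 5 instead of 80, 80 and 20), which matches the "old" style of tuning.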

Your vmstat looks very good: the run queue of kernel threads (r) is not higher than the number of CPUs in the system, there is no paging to or from paging space (pi/po), and there is hardly any I/O wait (wa). It even has time to idle.
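
The same checks can be done mechanically on a saved vmstat run. The following is a rough sketch, not an official tool: vmstat_summary is a hypothetical helper, and it assumes the column layout shown in the header posted above (r in column 1, b in 2, pi in 6, po in 7, wa in 17).

```shell
# Hypothetical helper: summarizes "vmstat 1 20" style output read from stdin.
# Assumed columns (from the header in this thread): r=1 b=2 pi=6 po=7 wa=17.
vmstat_summary() {
    awk '
        # pick the logical CPU count out of the "System configuration" line
        /lcpu=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^lcpu=/) lcpu = substr($i, 6)
        }
        # data rows are the ones whose first field is purely numeric
        $1 ~ /^[0-9]+$/ && NF >= 17 {
            n++
            if ($1 + 0 > max_r) max_r = $1 + 0
            if ($2 + 0 > max_b) max_b = $2 + 0
            paging += $6 + $7
            if ($17 + 0 > max_wa) max_wa = $17 + 0
        }
        END {
            printf "samples=%d lcpu=%d max_r=%d max_b=%d pi+po=%d max_wa=%d\n",
                   n, lcpu, max_r, max_b, paging, max_wa
        }'
}

# On the AIX box (not run here):
#   vmstat 1 20 | vmstat_summary
```

A max_r below lcpu, a pi+po of 0 and a small max_wa correspond to the "looks very good" reading above.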

From the AIX side, the system looks fine to me and not very burdened.
So I think you need an Oracle admin to have a look at your Oracle parameters/performance.

Though the last thing you can check on the AIX side, with "iostat 2" for example, is whether there are disks that are very busy (% tm_act column), have a lot of transactions to do (tps column), or have a lot of throughput in terms of Kbps. But I guess there is not much of a problem.
Also a

ps aux| grep -c aioserver
# and a
lsattr -El aio0

would be interesting.

But I guess that will not be the reason you have had problems since updating to the new ML... You might have a look at http://metalink.oracle.com and post your problem there, or additionally contact your Oracle support or some Oracle forum.
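
For the iostat check mentioned above, a small filter can pick out the busy disks so you don't have to read every line. This is a sketch under assumptions: busy_disks is a hypothetical helper, the 70% default threshold is just an illustrative value, and the disk lines are assumed to look like "hdiskN  %tm_act  Kbps  tps  Kb_read  Kb_wrtn".

```shell
# Hypothetical helper: reads iostat disk lines on stdin and prints the
# disks whose % tm_act (assumed to be column 2) reaches the threshold.
# $1 = threshold in percent (default 70, an arbitrary illustrative value)
busy_disks() {
    awk -v limit="${1:-70}" '
        $1 ~ /^hdisk/ && $2 + 0 >= limit + 0 {
            printf "%s tm_act=%s tps=%s\n", $1, $2, $4
        }'
}

# On the AIX box (not run here):
#   iostat -d 2 5 | busy_disks 70
```

No output at all means no disk crossed the threshold during the sampled intervals.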

Btw., how do you know the shared pool is very low? Has the parameter for the SGA etc. been changed, or can you measure/monitor the usage of the shared pool? With 40GB, and maybe not much else running on this box consuming memory, you have a good amount of RAM to use/distribute.

If the performance is inexplicably bad it might be a monitoring software problem: I remember from the last project I worked on that there was a problem with the CA-Unicenter Oracle agent. There were combinations of Oracle patch levels and revisions of this agent which worked, and most others caused severe performance problems as the agent hogged as many system resources as it could get over a few days. (Taking up several GB of memory for basically nothing was not unusual.)

Still, this would show up in vmstat and this is not the case. In fact your machine looks pretty good there: the "fre" column is quite high (this is the memory which is outright available, in 4K pages), the blocked queue (b) is constantly zero and the "ec" column shows low values (this is the "entitled capacity consumed", the percentage of available processing resources which are really used; if it goes near 100 this indicates that the processor(s) are too small/few for the load).

Here is some literature I found when I was working on a performance optimization project involving Oracle on AIX 5.3:
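
Since "fre" and "avm" are counted in 4K pages, a quick conversion makes the numbers easier to judge. A minimal sketch (pages_to_mb is a made-up helper name; it assumes 4 KB pages as stated above):

```shell
# Hypothetical helper: converts a count of 4 KB pages to whole megabytes.
# 4 KB per page, 1024 KB per MB; integer division rounds down.
pages_to_mb() {
    echo $(( $1 * 4 / 1024 ))
}

pages_to_mb 3937366   # "fre" from the first vmstat sample, prints 15380
pages_to_mb 4542286   # "avm" from the first vmstat sample, prints 17743
```

So roughly 15 GB of the 40959 MB are outright free in that sample. The "ec" value can be checked the same way: pc divided by ent, e.g. 1.56 / 4.00 = 39%.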

http://publib16.boulder.ibm.com/pseries/en_US/aixbman/prftungd/prftungd.pdf

http://www.redbooks.ibm.com/redbooks/pdfs/sg244810.pdf

System Performance Tuning, O'Reilly, 1990, ISBN 0-937175-60-9

Tuning for Oracle9i on AIX

AIX 5L Initial Tuning

I hope this helps.

bakunin

Hi people... couldn't come here earlier...

but thanks a lot for your answers!

Zaxxon, the shared pool is now at 1.5GB, but now I'm not sure that the problem comes from Oracle! Because we now notice that the system is also slow when compiling (COBOL and C/C++)...

bakunin, thanks for sharing your experience and for the reading advice! I will look into it :slight_smile:

Does anyone know a good program or something to benchmark AIX?

My main problem is that I'm not sure of the source of the problem... the only thing I know is that all the problems started when I installed TL6 and an efix...

hmmm

An "efix" is an "emergency fix" in AIX. That is: someone reports a bug in the OS to IBM. If they recognize that it's really their fault (not misconfiguration, misuse, etc.) they create an "APAR" (Authorized Program Analysis Report), which basically says "we officially declare having a bug".

Only then does someone create a fix for this, which results in a downloadable package (the "IY...."-files). Many of these packages bundled together and repackaged make up a "maintenance level" or "technology level", which is basically the same.

Where do the efixes come in? When a package is created it has to be thoroughly tested with all the other software parts of the OS because of possible interdependencies. Therefore it takes some time before such an IY-file is finished. If one doesn't have that much time, IBM creates an "efix": a quick patch, which will hopefully correct the bug but nothing more. It might even create more problems than it solves. Plus: the package it corrects cannot be updated in the normal way any more. The efix has to be removed before updating it. (In the lslpp output it has the status "EFIXLOCKED".) Additionally, efixes are mutually exclusive: it might well be that several efixes for the same package exist, but you can install only one or the other, not two or more.

Bottom line: If you need an efix on top of TL6, why haven't you installed TL7 or even TL8 instead, which probably would correct the problem without having to resort to efixes? Efixes are something one installs as an absolute last resort. It's not just another kind of update.
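
To find out which filesets are currently locked by an efix, the lslpp listing can be filtered for that status. This is a sketch only: efix_locked is a hypothetical helper, and because the exact spelling of the status string is an assumption here, the pattern accepts it with or without the hyphen.

```shell
# Hypothetical helper: keeps only the lines of an lslpp listing (read from
# stdin) that carry the efix lock status. Both "EFIXLOCKED" and
# "EFIX-LOCKED" are accepted, since the exact spelling is an assumption.
efix_locked() {
    grep -E 'EFIX-?LOCKED'
}

# On the AIX box (not run here):
#   lslpp -l | efix_locked
# The efixes themselves can be listed with "emgr -l".
```

An empty result means no fileset is locked, so a normal update should not be blocked by an efix.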

I hope this helps.

bakunin

Well, thanks a lot bakunin. I didn't know all that...

I removed the efix... now I'm running some tests... compiling and Oracle... let's see!!

but do you (or someone else) know of a benchmark tool that I can use?

thanks

Use NMON, it's simply the best overall. If you have very, very specific requirements this might not always be the case, and there are other tools (especially system tools like vmstat, iostat, svmon, netmon, ...) which will do the same in their area, but if you are looking for a single tool which tells you everything usually relevant, go for NMON.

bakunin

Hi again!

Well... nmon is great, but it is only a monitoring tool... I would like a tool to run benchmarks...

What else is benchmarking than monitoring how the various aspects of the system load (memory-wise, CPU-wise, etc.) change under a well-defined job load? You put some task onto a machine and monitor how long it takes, how much memory it needs, how many CPU cycles it takes, ..., yes?

So, in a sense, the best benchmarking tool is "time", which measures how long a command takes. For most of the other aspects mentioned before, NMON is a good tool to record them (use the "daemon mode" for that, the option is "-F").
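
To compare "before" and "after" (for example with and without the efix), it helps to run the same job several times and write down the elapsed times. A rough sketch of that idea follows; time_runs is a made-up helper, and it assumes the local date command supports the +%s format (seconds since the epoch), which is not guaranteed everywhere. It only resolves whole seconds, so for short commands plain "time" is the better choice.

```shell
# Hypothetical helper: runs a command several times and prints the elapsed
# wall-clock seconds of each run.
# $1 = number of runs, remaining arguments = the command to time.
# Assumption: "date +%s" is available on this system.
time_runs() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1      # output of the timed command is discarded
        end=$(date +%s)
        echo "run $i: $(( end - start ))s"
        i=$(( i + 1 ))
    done
}

# Example (the make target is only a placeholder for your own compile job):
#   time_runs 3 make clean all
```

Running nmon in recording mode alongside such a loop would give the memory and CPU picture for the same time window.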

But maybe I have misunderstood you completely and you are looking for benchmark tests, something like SPECfp, SPECint, LINPACK, etc. In this case you might want to consult the website of the TPC (Transaction Processing Performance Council). Most if not all of the relevant benchmarks have already been run for existing machines, and you can probably find the results of what you intend to do there.

The benchmark results there are very database-oriented. Maybe you are more interested in synthetic benchmarks, which better reflect the various aspects of "performance": bandwidth of the memory interface, integer operations, second-/third-level cache hit/miss ratios, etc. Then you might be better off at the website of SPEC (Standard Performance Evaluation Corporation).

I hope this helps.

bakunin

Benchmarks? What types of benchmarks? Check whether nstress (IBM developerWorks: Wikis - AIX - nstress) or rPerf (IBM developerWorks: Wikis - AIX - rperf) can assist you.
For benchmarks on the I/O subsystem I would recommend XDD (which has the small drawback that you need to compile it yourself, but the advantage that you can build versions for other operating systems as well, making results comparable).
I/O Performance, Inc.

Sorry for the delay in answering... was travelling!

thanks a lot for your advice... I will try your suggestions

Try NMON, it can be downloaded for free.