Oracle Database running slow on AIX (nmon / topas)

filosophizer · August 7, 2010, 8:32am

Hello,

How can I tell whether the Oracle database is running slow because of memory or because of processing power?

Only the Oracle database runs on this machine, a P4 with 4 GB of RAM.

Could anyone suggest tools that can help me determine exactly whether it is a memory issue or a processor issue?

zaxxon · August 7, 2010, 9:47am

vmstat -w 1 20

is a good start.

---------- Post updated at 03:47 PM ---------- Previous update was at 03:34 PM ----------

And I forgot to say: there are some threads in our AIX subforum about performance monitoring, tuning, etc. Maybe check those first and see if they help, so nobody has to write the same things again and again.

filosophizer · August 11, 2010, 5:26am

This is an old machine running AIX 5.2 on a P4.

While an Oracle query is running, the command

sar 1 20

shows an idle time between 1 and 20.

Otherwise the idle time is 99.

The machine has 4 GB of memory.

How can I tell whether it is a processor issue, or whether upgrading the memory from 4 GB to 16 GB will speed up the Oracle queries?

As far as I know, all the processing takes place in memory.

zaxxon · August 11, 2010, 9:31am

I am not sure what the query is; it could be a complicated one or a trivial one. Oracle uses several kinds of memory, iirc, and the biggest is or was the SGA (I am not up to date). I am not sure whether all the memory areas configured for that instance can be satisfied with your 4 GB, not to forget that the OS needs some RAM too. It also depends on which disks your DB is placed on: slow internal SCSI disks, or maybe fast, cached SAN disks.

I remember a DB2 tool called db2expln with which you could easily check whether a query uses indexes etc. or does slow table scans. I bet there is something similar for Oracle, i.e. some kind of SQL analyzer. Maybe it is a missing index, etc.

If you don't mind, please let vmstat -w run while the query is running and post the output here using code tags.

The following output might also be interesting:

lsdev -C -l aio0

If there are problems, it is often a solution to turn on Concurrent I/O (mount option "cio", and filesystemio_options=setall in Oracle) for the data, redo and log filesystems only.
But get the info above first; maybe you will find something suspicious.

zxmaus · August 12, 2010, 12:46am

Hi,

To correct zaxxon (sorry!): turn on EITHER cio (JFS2) or dio (JFS), OR set filesystemio_options=setall in Oracle; never do both. If you have an older Oracle version installed, it might not be able to handle the setall option; in that case choose async I/O (filesystemio_options=asynch) and go with cio or dio for the filesystems. You might want to consider smaller block sizes for the redo logs too.
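The either/or rule above can be written down as a small check. This is only a sketch under stated assumptions: `check_io_settings` is a hypothetical helper (not an IBM or Oracle tool), you are assumed to already know the data filesystem's mount option and the value of filesystemio_options, and the Oracle 10 cut-off for setall is taken from the later posts in this thread.

```python
from typing import List


def check_io_settings(mount_option: str, filesystemio_options: str,
                      oracle_major: int) -> List[str]:
    """Return warnings for the cio/dio mount vs. filesystemio_options combination.

    Hypothetical helper encoding the rule of thumb from this thread:
    use EITHER a cio/dio mount OR filesystemio_options=setall, never both.
    """
    warnings = []
    mount = mount_option.lower()
    fsio = filesystemio_options.lower()
    if mount in ("cio", "dio") and fsio == "setall":
        warnings.append("both cio/dio mount and setall are active: pick one")
    # Assumption: "older Oracle" means anything before 10g
    if fsio == "setall" and oracle_major < 10:
        warnings.append("older Oracle may not handle setall: use async I/O plus a cio/dio mount")
    return warnings


print(check_io_settings("cio", "setall", 10))
```

An empty list means the combination follows the advice given here; it says nothing about whether the setting is actually faster on a given box.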

I would like to see the

vmstat -Iwt 2 30 ; vmstat -v ; vmstat -s

outputs from a busy time too, and I would like to know whether you have JFS or JFS2 filesystems and whether you have applied at least some basic tuning.

It would also be interesting to know what your database is doing (trading, reporting, both?), how many connections it serves, and so on. Keep in mind that every connection to Oracle takes memory, in certain cases dozens of MB.
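To get a feel for the numbers, here is a rough budget check. It is a sketch only: the per-connection footprint, the SGA size and the OS reserve are hypothetical inputs you would have to measure on your own system; the thread only says a connection can take dozens of MB in some cases.

```python
def connection_memory_mb(connections: int, mb_per_connection: float) -> float:
    """Rough total memory (MB) used by client connections.

    mb_per_connection is an assumption; it varies a lot per application.
    """
    return connections * mb_per_connection


def fits_in_ram(ram_mb: float, sga_mb: float, os_reserve_mb: float,
                connections: int, mb_per_connection: float) -> bool:
    """True if SGA + OS reserve + connection memory stays within physical RAM."""
    needed = sga_mb + os_reserve_mb + connection_memory_mb(connections, mb_per_connection)
    return needed <= ram_mb


# Hypothetical example: 4 GB box, 2 GB SGA, 512 MB kept for AIX, 100 sessions
print(connection_memory_mb(100, 10.0))            # 1000.0
print(fits_in_ram(4096, 2048, 512, 100, 10.0))    # True  (3560 MB needed)
print(fits_in_ram(4096, 2048, 512, 100, 20.0))    # False (4560 MB needed)
```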

Kind regards
zxmaus

zaxxon · August 12, 2010, 5:33am

No problem at all, I am fallible. I did not notice any problem from having set both, though. Did you encounter or read about any trouble? I have some older guides that do not explicitly say the two exclude each other.

As you said, and as I read in the following link, with Oracle 10 and later it is no longer necessary to mount the filesystems with cio:
http://www.ibmsystemsmag.com/aix/octobernovember08/coverstory/21979p3.aspx

zxmaus · August 12, 2010, 9:55am

zaxxon,

We switch on cio, or let Oracle decide where to use cio, because we want to avoid double buffering (which is what we get when both are on). If you switch off buffering completely (which is what you do with SETALL and cio both active) and you have a high-transaction database, you run the risk of saturating the disks. That used to happen in our company pretty frequently and slowed down the DBs more than having nothing activated at all.

But you are right: this is only true from Oracle 10g upwards, and we still do not know which DB version filosophizer is using.

Kind regards
Nicki

zaxxon · August 12, 2010, 10:16am

No, it's good to know, and thank you for that hint. I will pass it on to our DB guys, since they are currently running performance tests comparing raw devices with filesystem data files. It would be nice if we could see a difference.

ross.mather · August 12, 2010, 10:41am

One place where the I/O of the Oracle database can be optimised is to have the filesystem use the same block size as the Oracle database. From memory, this should be 1 MB rather than the more usual 4K.

filosophizer · August 18, 2010, 11:40am

AIX 5.2 doesn't have vmstat -w.

Output while no query is running:

# sar 1 5
 
AIX oradb 2 5 0059C87D4C00    08/18/10
 
System Configuration: lcpu=2
 
15:12:52    %usr    %sys    %wio   %idle
15:12:53      10       4       0      86
15:12:54       0       1       0      98
15:12:55       1       1       0      98
15:12:56       1       1       0      98
15:12:57      19       8       0      72
Average        6       3       0      90

But as soon as I run any query, idle drops to about 20-30%.

# lsdev -C -l aio0
aio0 Available  Asynchronous I/O (Legacy)
# vmstat -w
vmstat: Not a recognized flag: w
Usage: vmstat [ -fsviItl ] [Drives] [ Interval [Count] ]
#
# vmstat -v
              1048576 memory pages
               977787 lruable pages
                 4815 free pages
                    1 memory pools
                94428 pinned pages
                 80,1 maxpin percentage
                 20,0 minperm percentage
                 80,0 maxperm percentage
                 59,9 numperm percentage
               585795 file pages
                  0,0 compressed percentage
                    0 compressed pages
                  0,0 numclient percentage
                 80,0 maxclient percentage
                    0 client pages
                    0 remote pageouts scheduled
                   85 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                55565 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
                    0 external pager filesystem I/Os blocked with no fsbuf

The client is using JFS, *not* JFS2.

The main issue:
Will increasing the RAM from 4 GB to 16 GB increase the performance of the system, or does the system need more processors?

Thanks

zxmaus · August 18, 2010, 8:09pm

Hi,

please run

vmstat -I 2 20 (uppercase i)

And YES, you definitely need more memory: you have only about 16 MB of free memory for your DB to work with, and I assume your free list goes to 0 when you run queries, which amounts to a halt of the system. Set minperm to 3-5% and maxperm to 90%. And add memory! Whether you need more CPUs depends on whether your DB has more than 2 parallel queries at any given time, and on what your queries are doing (e.g. full table scans).

Kind regards
zxmaus

filosophizer · August 19, 2010, 10:19am

zxmaus, thanks for your reply. Could you please tell me how you arrived at 16 MB of free memory for the database?

zxmaus · August 19, 2010, 9:08pm

Hi,

from your output above:
4815 free pages
Each page is 4 KB ... OK, OK, that is 19 MB. Still by far not enough for Oracle to operate seamlessly. Once your free list goes to 0, which will very likely happen during backups or batch jobs, the DB stops working until a few pages are free again. Depending on the size and load of the database, we try to keep the free list at six digits if possible, and on huge, highly transactional databases even bigger, to have enough headroom to operate.
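The same arithmetic in Python, using the counters from the vmstat -v output posted earlier and assuming the 4 KB page size mentioned above:

```python
PAGE_SIZE = 4096  # bytes; the 4 KB page size the vmstat counters are assumed to use


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a vmstat page count into MiB."""
    return pages * page_size / (1024 * 1024)


# Values taken from the vmstat -v output earlier in this thread
free_pages = 4815
memory_pages = 1048576
file_pages = 585795

print(f"free list : {pages_to_mib(free_pages):8.1f} MiB")    # ~18.8 MiB
print(f"total RAM : {pages_to_mib(memory_pages):8.1f} MiB")  # 4096.0 MiB
print(f"file cache: {pages_to_mib(file_pages):8.1f} MiB")    # ~2288.3 MiB
```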

Kind regards
zxmaus

filosophizer · August 21, 2010, 4:04pm

Thanks for the reply, zxmaus. What does vmstat -I 2 20 tell us?

vmstat -I 2 20

# vmstat -I 2  2
System Configuration: lcpu=2 mem=4096MB
  kthr       memory                page                 faults          cpu
 -------- ------------- -------------------------- ---------------- ------------
  r  b  p     avm   fre   fi  fo  pi  po   fr   sr   in    sy    cs  us sy id wa
  1  2  0  802714  5120  394  34   7   8  409  101  448  3455  1338  10  3 82  5
  1  1  0  802802  4944   43   8   1   0    0    0  527  4472  1339  11  3 80  7
#
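Since the header repeats the name `sy` (once for system calls under faults, once for system CPU time under cpu), it can help to pair each value with an unambiguous name. A minimal sketch, assuming exactly the 18-column layout shown above; other AIX levels or extra flags (such as -t) add columns, so `COLUMNS` would need adjusting:

```python
from typing import Dict

# Column order of "vmstat -I" as printed above. The two "sy" columns are
# renamed: faults_sy = system calls, cpu_sy = system CPU time in percent.
COLUMNS = ["r", "b", "p", "avm", "fre", "fi", "fo", "pi", "po", "fr", "sr",
           "in", "faults_sy", "cs", "us", "cpu_sy", "id", "wa"]


def parse_vmstat_row(line: str) -> Dict[str, int]:
    """Map one data line of `vmstat -I` output to named integer fields."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


sample = """\
 1   2  0 802714  5120 394  34   7   8 409   101 448 3455 1338 10  3 82  5
 1  1  0 802802   4944  43   8   1   0    0    0 527 4472 1339 11  3 80  7"""

for row in map(parse_vmstat_row, sample.splitlines()):
    print(row["fre"], row["pi"], row["po"], row["wa"])
```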

---------- Post updated at 12:04 PM ---------- Previous update was at 02:45 AM ----------

It is very confusing to decode vmstat output. Searching Google for an explanation of vmstat, I came across these links:
http://www.skywayradio.com/tech/linux/vmstat.html
http://www.ibmsystemsmag.com/aix/februarymarch04/features/6670p1.aspx

http://www.aixexpert.com/wiki/index.php/AIX_Expert

From the AIX 5.1 "man" pages, the vmstat command reports statistics about kernel threads, virtual memory, disks, traps and CPU activity. These system-wide (among all processors) statistics are calculated as averages for values expressed as percentages and as sums otherwise. If the vmstat command is invoked without flags, the report contains a summary of activity since system startup. The interval parameter specifies the amount of time between each report in seconds. The first report contains statistics for the time since startup. Subsequent reports contain statistics collected during the interval since the previous report.

But now I am even more confused. Can someone help me with the vmstat -I 2 20 output above? What does it show?

How can I be sure whether I need more memory or more processing power? Someone told me that Oracle takes up all the available memory when it starts, so even if I had 100 GB of memory, it would always show little memory remaining. Is this true?

zxmaus · August 23, 2010, 1:15am

Hi,

Please see below; I have added a few explanations.

kthr: Information about kernel thread states.

r: Average number of runnable kernel threads over the sampling interval. Runnable threads consist of the threads that are ready but still waiting to run, and the threads that are already running. This number should not be higher than the number of lcpus in your system.

b: Average number of kernel threads placed in the Virtual Memory Manager (VMM) wait queue (awaiting resource, awaiting input/output) over the sampling interval. Any value here points to insufficient filesystem buffers, problems with IO subsystems or overall insufficient memory.

p: Number of threads waiting on I/O to raw devices per second. Only valid if you have raw devices. Any number here points to problems with the I/O subsystem.

Memory: Information about the usage of virtual and real memory. Virtual pages are considered active if they have been accessed. A page is 4096 bytes.

avm: Active virtual pages, so-called computational memory. In our environment we get the best performance if avm is around 70% on Oracle systems and 80% on Sybase systems. Ideally it does not exceed "100% - numperm%" after your system is sufficiently tuned.

fre: Size of the free list, i.e. the amount of really free memory in the box: memory that is used neither for computational pages nor for filesystem caching. This number should be high enough to accommodate the memory requested at any given time.

Note:
A large portion of real memory is utilized as a cache for file system data. It is not unusual for the size of the free list to remain small but it is vital that the free list NEVER drops to 0.

Page: Information about page faults and paging activity. These are averaged over the interval and given in units per second.

fi: File page-ins per second. These are e.g. all your disk reads.

fo: File page-outs per second. These are e.g. all your disk writes.

pi: Pages paged in from paging space. BAD as disk space is obviously slower than memory. Usually points to bad system tuning or insufficient memory.

po: Pages paged out to paging space. BAD as disk space is obviously slower than memory. Usually points to bad system tuning or insufficient memory.

fr: Pages freed up by page replacement and made available in the free list.

sr: Pages scanned by page-replacement algorithm to determine if they can be freed up.

Note: it is not important whether these numbers are high or low. What matters is how many pages are scanned (sr) for every page freed (fr): the fewer, the better. Ideally the fr:sr ratio is not worse than 1:2; if it is worse than 1:8, you are usually in big trouble.

cy: Clock cycles by page-replacement algorithm. VMM uses a clock-algorithm to implement a least recently used (lru) page replacement scheme. Pages are aged by being examined by the clock.

Faults: Trap and interrupt rate averages per second over the sampling interval.

in: Device interrupts, usually an I/O counter. This number usually represents the number of pages that will be addressed by the free list in the next cycle. If the free list is too small to address these needs, memory needs to be scanned and freed before the next I/O can occur.

sy: System calls. The 'work' your system is doing.

cs: Kernel thread context switches, i.e. the number of times the kernel switches from one thread to another.

CPU: Breakdown of percentage usage of processor time.

us: User time. Amount of real work done by your CPU for applications, like your DB.

sy: System time, so-called kernel CPU usage. High values point to a lot of overhead of some kind. If you are memory constrained, CPU cycles will be used to free up memory. On Sybase systems, high values may represent CPU spinning; check whether you have the correct number of engines for your DB.

id: Processor idle time. CPU cycles where your system is doing really nothing.

wa: Processor idle time during which the system had outstanding I/O requests. Usually a bad sign. Check whether you can implement async I/O, which allows your CPU to keep processing without waiting for the I/O to finish first. It might also point to problems with your I/O subsystem or insufficient buffer caches.

Even though its AVM value is still in a rather acceptable range, your system seems as if it would benefit from more memory, especially for file caching and the free list. You should make sure that your system stops paging, and you should try to reduce the amount of wait CPU.

If this were my box, I would implement the values below and see if things improve. I would also add at least 1-2 GB of memory.
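The rules of thumb in the descriptions above can be applied mechanically to a sample. This is a hedged sketch: `vmstat_warnings` is a hypothetical helper, the thresholds are the ones quoted in this thread rather than official IBM limits, and the 1:2 / 1:8 ratio is read as pages scanned per page freed.

```python
from typing import Dict, List


def vmstat_warnings(row: Dict[str, int], lcpus: int) -> List[str]:
    """Apply the rules of thumb quoted in this thread to one vmstat -I sample."""
    warnings = []
    if row["r"] > lcpus:
        warnings.append(f"run queue {row['r']} exceeds {lcpus} logical CPUs")
    if row["b"] > 0:
        warnings.append(f"{row['b']} threads blocked in the VMM wait queue")
    if row["p"] > 0:
        warnings.append("threads waiting on raw-device I/O")
    if row["pi"] > 0 or row["po"] > 0:
        warnings.append("paging to/from paging space")
    if row["fre"] == 0:
        warnings.append("free list is empty")
    # Pages scanned per page freed; only meaningful if something was freed
    if row["fr"] > 0 and row["sr"] / row["fr"] > 8:
        warnings.append("more than 8 pages scanned per page freed")
    elif row["fr"] > 0 and row["sr"] / row["fr"] > 2:
        warnings.append("more than 2 pages scanned per page freed")
    if row["wa"] > 0:
        warnings.append(f"{row['wa']}% CPU time waiting on I/O")
    return warnings


# First sample of the vmstat -I 2 2 output in this thread (lcpu=2)
sample = {"r": 1, "b": 2, "p": 0, "fre": 5120, "pi": 7, "po": 8,
          "fr": 409, "sr": 101, "wa": 5}
for w in vmstat_warnings(sample, lcpus=2):
    print("-", w)
```

For this sample it flags the blocked threads, the paging-space activity and the I/O wait, which is in line with the reading of the output given above.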
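The AVM statement can be checked with a little arithmetic. A sketch assuming avm is compared against total real memory from vmstat -v (1048576 pages of 4 KB); the 70% figure is the poster's own experience with Oracle systems, not an AIX limit:

```python
def avm_percent(avm_pages: int, memory_pages: int) -> float:
    """Computational memory (avm) as a percentage of real memory."""
    return 100.0 * avm_pages / memory_pages


avm = 802714            # avm, first sample of the vmstat -I output in this thread
memory_pages = 1048576  # "memory pages" from vmstat -v (4 GB of 4 KB pages)
numperm = 59.9          # "numperm percentage" from vmstat -v

pct = avm_percent(avm, memory_pages)
print(f"avm is {pct:.1f}% of real memory")        # avm is 76.6% of real memory
print(f"100% - numperm% = {100 - numperm:.1f}%")  # 100% - numperm% = 40.1%
print("above the ~70% Oracle target" if pct > 70 else "within the ~70% Oracle target")
```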

vmo -p -o minperm%=5
vmo -p -o maxperm%=90
vmo -p -o maxclient%=90
vmo -p -o minfree=1000
vmo -p -o maxfree=1200
vmo -p -o lru_file_repage=0
vmo -p -o lru_poll_interval=10
ioo -p -o hd_pbuf_cnt=1024
ioo -p -o numfsbufs=1024 ### can go up to 2048 if needed

I hope this helps,
kind regards
zxmaus

zaxxon · August 23, 2010, 5:10am

I am not sure if I am up to date (sadly I am stuck on AIX 5.3), but I thought that hd_pbuf_cnt had been made obsolete by pv_min_pbuf.
Also, I always thought numfsbufs is for JFS, and for JFS2 you use j2_dynamicBufferPreallocation. If that is not sufficient, it is recommended to also tune j2_nBufferPerPagerDevice in a second step.

bakunin · August 23, 2010, 8:08am

You probably shouldn't have asked Google, but unix.com (*snicker*):

http://www.unix.com/aix/99757-too-big-not-enough-memory-errors-shell-script.html#post302285865

...and probably a gazillion even more informative threads.

I hope this helps.

bakunin

PS: On a personal note, I'd like to say for the record that I really appreciate having zxmaus with us again. I hope everything is well and you are having a good time.

zxmaus · August 25, 2010, 2:21pm

bakunin,

thank you - glad to be back

zaxxon

hd_pbuf_cnt=1024 = AIX 5.2 ,
pv_min_pbuf=1024 = AIX 5.3 and later

Regarding the buffer tunables: this box seems to have enough JFS2 buffers but not enough buffers for its JFS filesystems, which is why I would suggest tuning only these.

I agree with you that setting the other values would not harm the box; I just cannot see at the moment that it is needed. It might be later, after the initial tuning.

Just for completeness, if this were my box I would most likely go with
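Because these vmstat -v counters are cumulative since boot, a quick way to see which buffer shortage is still growing is to extract the non-zero "blocked" lines from two snapshots taken some time apart and compare them. The sketch below only does the extraction step, on the output posted earlier; `blocked_io_counters` is a hypothetical helper. The plain "filesystem I/Os blocked with no fsbuf" line is usually associated with JFS (numfsbufs) and the "external pager" line with JFS2.

```python
import re
from typing import Dict


def blocked_io_counters(vmstat_v: str) -> Dict[str, int]:
    """Return every '... blocked ...' counter from `vmstat -v` output that is non-zero."""
    counters = {}
    for line in vmstat_v.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*blocked.*)$", line)
        if m and int(m.group(1)) > 0:
            counters[m.group(2).strip()] = int(m.group(1))
    return counters


sample = """\
                    85 pending disk I/Os blocked with no pbuf
                     0 paging space I/Os blocked with no psbuf
                 55565 filesystem I/Os blocked with no fsbuf
                     0 client filesystem I/Os blocked with no fsbuf
                     0 external pager filesystem I/Os blocked with no fsbuf"""

for name, count in blocked_io_counters(sample).items():
    print(f"{count:>8}  {name}")
```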

ioo -p -o j2_maxPageReadAhead=128
ioo -p -o maxpgahead=16
ioo -p -o j2_maxRandomWrite=32
ioo -p -o maxrandwrt=32

on top of the memory tunables from my previous post, but these are settings you should be rather careful with, as they can potentially do more harm than good.
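IBM's AIX tuning guidance states that maxfree minus minfree should be at least as large as the read-ahead setting (maxpgahead for JFS, j2_maxPageReadAhead for JFS2), so the values suggested in this thread can be cross-checked. A sketch with a hypothetical helper; verify the rule against the documentation for your AIX level:

```python
from typing import List


def check_free_list_gap(minfree: int, maxfree: int, maxpgahead: int,
                        j2_max_page_read_ahead: int) -> List[str]:
    """Check that maxfree - minfree covers the largest read-ahead.

    maxpgahead applies to JFS, j2_maxPageReadAhead to JFS2 (values in pages).
    """
    gap = maxfree - minfree
    problems = []
    if gap < maxpgahead:
        problems.append(f"gap {gap} < maxpgahead {maxpgahead} (JFS)")
    if gap < j2_max_page_read_ahead:
        problems.append(f"gap {gap} < j2_maxPageReadAhead {j2_max_page_read_ahead} (JFS2)")
    return problems


# Values suggested in this thread: minfree=1000, maxfree=1200,
# maxpgahead=16, j2_maxPageReadAhead=128
print(check_free_list_gap(1000, 1200, 16, 128))  # []
```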

Kind regards
zxmaus

zaxxon · August 26, 2010, 1:09am

Ah, he also wrote it.

Never mind.

---------- Post updated at 07:09 AM ---------- Previous update was at 06:39 AM ----------

Over the last few days I went through all the documents I gathered and the notes I wrote up, trying to get a clue.
A few minutes ago I found this table (Table 2), which sums it up quite well and very clearly. Maybe someone else is or was as confused as I was (OK, it's just me :D)

Optimizing AIX 5L performance: Tuning disk performance, Part 3

filosophizer · November 9, 2010, 4:28pm

This is another system: an IBM P550 with 6 processors, 96 GB of memory and 32 GB of paging space, with 1000 users connecting concurrently to the Oracle database.

This is one of the best and most useful topics, as many companies buy AIX machines solely for the Oracle database.

The machine is still running slowly. How can one trace the problem and fine-tune it?

  Disk     Busy%   KBPS    TPS  KB-Read  KB-Writ
  hdisk1    98.5   1.6K  333.5     1.2K    446.0
  hdisk2    88.5  10.8K  757.5     3.1K     7.7K
  hdisk3     5.5  412.0   16.0      4.0    408.0
  hdisk4     0.0    0.0    0.0      0.0      0.0

  Steals     3458
  PgspIn      307
  PgspOut     111
  PageIn     1095
  PageOut    2179
  Sios       3271

  % Comp     90.0
  % Noncomp   9.9
  % Client    9.9

  PAGING SPACE
  Size,MB   33408
  % Used     35.5
  % Free     65.5

  NFS (calls/sec)
  ServerV2      0

  Name         PID  CPU%  PgSp  Owner
  oracle   3141878  14.2  19.4  oraprod
  oracle   3313978   5.9  34.9  oraprod
Paging space is 35.5% used.

hdisk1 (an internal server hard disk, not storage) is 98% busy.

hdisk2 is a storage disk (which holds the DB).
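A quick sanity check of the topas figures above. The 80% busy cut-off is a hypothetical threshold chosen for illustration, not a topas or AIX setting:

```python
def paging_space_used_mb(size_mb: float, used_percent: float) -> float:
    """Approximate paging space in use, from topas 'Size,MB' and '% Used'."""
    return size_mb * used_percent / 100.0


busy_threshold = 80.0  # hypothetical cut-off for a "saturated" disk
disks = {"hdisk1": 98.5, "hdisk2": 88.5, "hdisk3": 5.5, "hdisk4": 0.0}  # Busy% from topas

print(f"paging space in use: {paging_space_used_mb(33408, 35.5):.0f} MB")  # 11860 MB
print("busy disks:", [d for d, busy in disks.items() if busy >= busy_threshold])
```

Roughly 11.6 GB of paged-out memory together with non-zero PgspIn/PgspOut suggests memory pressure even on a 96 GB box, while the internal hdisk1 being nearly saturated is worth checking separately.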