CPU performance

On my Oracle DB server we have 15 cores (POWER8). The output of vmstat is below.

System configuration: lcpu=128 mem=208800MB ent=16.00

   kthr            memory                         page                       faults                 cpu             time
----------- --------------------- ------------------------------------ ------------------ ----------------------- --------
  r   b   p        avm        fre    fi    fo    pi    po    fr     sr    in     sy    cs us sy id wa    pc    ec hr mi se
 31  26   0   21663391      51253 129782  5225     0     0 110978 318023 41193 302797 156113 43 20 33  5 14.76  92.3 11:05:36
 28  39   0   21674139      46016 129213 15721     0     0 134097 188404 42576 319091 172279 42 20 32  5 14.54  90.9 11:05:37
 34  36   0   21680968      46409 130385 13285     0     0 136618 141490 42035 385893 163647 45 20 30  5 14.93  93.3 11:05:38
 34  39   0   21669473      51955 115124 12338     0     0 107550 114801 38514 366075 154055 45 19 31  5 14.94  93.3 11:05:39
  0   0   0   21675046      50088 116082 14413     0     0 119399 359118 40334 429664 171751 43 21 30  6 14.64  91.5 11:05:40
 40  36   0   21660587      51752 137059  9433     0     0 123435 280612 42885 406191 176519 42 21 31  6 14.57  91.1 11:05:41
 40  28   0   21672996      47765 132584  1542     0     0 140214 276680 47654 409385 165033 42 21 31  5 14.79  92.4 11:05:42
 26  24   0   21692747      48527 124613  5004     0     0 144966 404145 45226 399544 163073 41 21 32  5 14.74  92.1 11:05:43
 30  29   0   21686313      45561 130212  3960     0     0 122430 127164 39446 371176 177801 43 21 31  5 14.69  91.8 11:05:44
 32  28   0   21668455      50598 137069  1746     0     0 121488 127432 46515 366503 174261 43 20 32  5 14.71  91.9 11:05:45
 26  33   0   21673035      50625 114717 10553     0     0 118945 380090 43050 303303 147158 42 19 34  5 14.61  91.3 11:05:46
 34  33   0   21695594      48900 115034  8768     0     0 135057 145302 41228 336146 149403 43 19 33  5 14.79  92.4 11:05:47
 25  33   0   21692935      50267 107122  6226     0     0 105233 190084 35381 361517 155287 46 18 31  4 15.07  94.2 11:05:48
 32  33   0   21686530      54135 100484  7210     0     0 98634 415431 35097 388896 162992 45 20 30  5 14.96  93.5 11:05:49
 31  32   0   21691954      47633 92779 13739     0     0 91240 422302 34362 343061 151114 45 20 31  5 14.88  93.0 11:05:50
 32  24   0   21700998      47232 94516 14072     0     0 102629 188748 36481 501056 132911 45 20 30  5 14.92  93.3 11:05:51

As you can see, the run queue and wait queue are both high, and the entitled capacity is consistently around 90%, yet 15 to 25% of the CPU is always idle. So is there a CPU bottleneck on this system, or is it OK? Nobody is complaining, but I want to know for myself.

You have no CPU shortage at all, but the server is seriously misconfigured: it is one small step away from swapping itself to death. The high numbers in "fr" and "sr" are signs that memory is on the brink of exhaustion and the system is already scanning frantically for pages it could swap out if needed. The chock-full blocked queue and the wait% in the CPU section come from the system having to wait for I/O. This would be OK if the presented snapshot is from a backup cycle (where only I/O counts and the system is normally bound by it), but if this is the usual state of affairs the system would greatly profit from more I/O capacity (a better network connection, faster disks, etc. - exactly where the I/O bottleneck lies doesn't show up in this picture).
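A minimal sketch of how to put numbers on the fr/sr relationship, assuming a vmstat 1 capture with the column layout shown above (fr is field 10, sr is field 11); two sample lines from the output above are inlined in place of a real capture:

```shell
# Average fr (pages freed) and sr (pages scanned) over the data lines
# of a `vmstat 1` capture; a high sr:fr ratio means the page stealer
# has to scan many pages to free one. In practice, pipe the whole
# capture through the awk part instead of the inlined sample.
awk '$1 ~ /^[0-9]+$/ {
         fr += $10; sr += $11; n++
     }
     END {
         if (n) printf "avg fr=%.0f  avg sr=%.0f  sr:fr=%.1f\n", fr/n, sr/n, sr/fr
     }' <<'EOF'
 31  26   0   21663391      51253 129782  5225     0     0 110978 318023 41193 302797 156113 43 20 33  5 14.76  92.3 11:05:36
 28  39   0   21674139      46016 129213 15721     0     0 134097 188404 42576 319091 172279 42 20 32  5 14.54  90.9 11:05:37
EOF
```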

I don't know for sure, but if this is an Oracle system you most probably have the SGA configured too big. Reduce its size (or add memory, which has the same effect) and add I/O capacity, and you could probably take away 3-4 processors without any loss of performance - perhaps even with a gain.

You might want to read the Performance Tuning Introduction I wrote for an in-depth explanation of what is going on in your system.

I hope this helps.

bakunin


The SGA is only 40 GB, yet a huge amount of memory is utilised. Also, the run and blocked thread totals are greater than the number of cores, so how can CPUs be freed as you suggested? The disks are flash disks behind 8 Gb FC ports; with flash storage we should not face I/O issues.
The snapshot provided is from regular work, not from backup time.

I see that. I just can't see by which process, hence the assumption. If you want to know which process(es) is/are responsible, use the ps command.

The only thing that matters is the number of logical CPUs, which is 128. I suspect that if you look closely at these you will find that some of them are unused.

But since you mentioned that the SGA is only 40 GB out of 200 GB of memory: what else is running on this server? And wouldn't it be better to run that, whatever it is, in a separate LPAR, so that the DB is not in danger if something goes wrong?

OK, but I said "disk or network". Where the I/O problem lies is not showing up in the vmstat output - only the fact that there is one somewhere. You will have to use iostat and netstat to find out where exactly it is.

Understand that we cannot do your work for you, and we see only what you show us. I don't know what your system is doing all day long, and I can only base my assumptions on what I see. If you want more output from us, give us more input. Furthermore, everything I have written so far you could have gotten from the link I gave you. So you might want to make use of the help you get.

I hope this helps.

bakunin


In iostat, the maximum value in the avg queue column is 0.5, and even that is not continuous.
Also, netstat shows no CRC errors - please tell me what exactly I can check there. When checked with
ps aux | head -1; ps aux | sort -rnk 6 | more all processes are Oracle processes and nothing else is running on the system. That is why I am asking for help.

As I have already said: DESCRIBE YOUR SYSTEM!

Right now I have seen: 200 GB memory, 16 processors, running Oracle.

What I do not know (the list is not complete):

  • which SAN / which multipath software?
  • which OS level?
  • which software? (that includes specifications, versions, etc.)
  • what is the underlying managed system?
  • how are your volume groups organised?
  • network connections?
  • is the system a cluster or not? (If yes, which one? HACMP? ORACLE RAC? else?)

I am at a loss as to what you mean. Try iostat 1 and write it to a file for some time (say, one minute or so). You will have to filter the lines down to the real devices, because multipathing software (if you use such a thing) creates pseudo-hdisk devices that show up in iostat but are meaningless. For instance, with EMC PowerPath you have one hdiskXX device for every path plus one hdiskpowerXX device, which is the real LUN. You have to watch only the latter.
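To condense such a capture, the per-disk busy percentage can be averaged with awk; a sketch only, with two sample lines inlined in place of a real iostat 1 60 capture (with PowerPath, match ^hdiskpower instead to see only the real LUNs):

```shell
# Average the %tm_act (percent of time busy) column per disk from an
# iostat capture; replace the here-document with the real capture.
awk '/^hdisk/ { busy[$1] += $2; cnt[$1]++ }
     END { for (d in busy)
               printf "%s avg_busy=%.1f%%\n", d, busy[d] / cnt[d] }' <<'EOF' | sort
hdisk11                2.0   8.7M  43.0   8.7M   0.0  182.0   1.5
hdisk11                4.0   9.1M  45.0   9.1M   0.0  190.0   1.6
hdisk5                 0.0   0.0    0.0   0.0    0.0    0.0   0.0
EOF
```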

Also in netstat there are no crc errors please tell me what exactly i can check in that.

Well, CRC errors are your least concern, I'd say. More realistic scenarios are: misconfigured DNS servers, so that name resolution takes a long time; software running under externally authenticated user accounts (Kerberos, ...) which take a long time to authenticate; and similar things.

I am trying to help you. I just can't if you don't give me some information. Performance tuning is like tuning a car: you are trying to describe it to me, but so far all I know is roughly how much horsepower it has - not even the model! And telling me something like "when I step on the accelerator pedal it makes wwrrrrrmmmm" doesn't help me either.

Your ps commands won't show you anything, because ps on AIX doesn't work like the BSD version. Instead, do the following:

First, look for shared memory that may be allocated. Use ipcs -m and have a look there. If you want us to analyze it: post it completely.

Next, have a look at where all your memory is spent. Try ps -Alo pid,vsz,args (or variations; have a look at the man page of ps for details) to see all processes with their allocated memory. Note that the memory unit used here is pages (1 page = 4096 bytes), not bytes.
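A sketch of converting those figures to readable units, using the page unit stated above (1 page = 4096 bytes; if vsz on your system is reported in a different unit, adjust the factor). The inlined sample lines stand in for the live ps -Alo pid,vsz,args | sort -rnk 2 output:

```shell
# Convert the vsz column (assumed: 4 KiB pages) to MiB for the
# top memory consumers; pipe the real ps output through the awk part.
awk '{ printf "%10s %9.1f MiB  %s\n", $1, $2 * 4096 / 1048576, $3 }' <<'EOF'
38143450 281392 /oemdata/agent_software/core/1.10.0/oracle_common/jdk/bin/java
 8061550 125000 ora_arc2_fininddb
EOF
```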

Finally, have a look at how the system is tuned: issue vmo -a and look at the "minperm" and "maxperm" lines. Then issue vmstat -vs and look at the lines reading "filesystem I/Os blocked with no fsbuf" (or something similar - I have no system at hand to look it up) and check whether there are high numbers there.
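The blocked-I/O counters can be flagged mechanically; a sketch with inlined sample lines (the threshold of 1000 is purely illustrative, not an official limit) - on the real system, pipe vmstat -v through the awk part:

```shell
# Print any "blocked with no ...buf" counter above an (arbitrary)
# threshold; growing pbuf/fsbuf counts point at undersized buffer pools.
awk '/blocked with no/ && $1 + 0 > 1000 { print "check:", $0 }' <<'EOF'
              185464 pending disk I/Os blocked with no pbuf
                   0 paging space I/Os blocked with no psbuf
                2228 filesystem I/Os blocked with no fsbuf
EOF
```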

So much for a start. There may be more such requests, because performance tuning can be complicated. In any case, I suggest you look the commands up in the man pages and try to understand what they do and why they are being used, because in the end the person most qualified to tune your system is: you! You know the system best and you sit in front of it.

I hope this helps.

bakunin

I really appreciate your help.

The system is a Power 880, a VIO client with NPIV configured.
The network cards are FCoE adapters on the VIOS, with one port shared between only 2 LPARs.
The VIOS level is 2.4.2.20 and the DB oslevel is 7100-03-04.
The software is Oracle 11g.
The storage is assigned via an IBM SVC cluster, and the storage unit is an IBM fs800. I have only this much info; anything else I would need to ask the storage admin.
The IBM SDDPCM drivers are installed and each storage data volume has 4 active paths.
The OS volume has 2 different fcs paths.
There is no cluster for this server.

As this is IBM storage with the IBM device drivers installed, there are no pseudo disks.
A sample of the iostat output is below.

Disks:                           xfers                                read                                write                                  queue                    time
-------------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------
                       %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max  avg   avg  serv
                       act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull
hdisk11                2.0   8.7M  43.0   8.7M   0.0  182.0   1.5    0.2    5.4     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.1   0.0   0.0  0.0  08:00:01
hdisk5                 0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk8                 0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk15                3.0   2.3M  13.0   2.3M   0.0   45.0   2.1    1.0    8.1     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk19                1.0   2.7M  16.0   2.7M   0.0   48.0   1.4    0.6    3.6     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk9                 2.0   2.9M  14.0   2.9M   0.0   72.0   2.0    0.7    4.9     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.1   0.0   0.0  0.0  08:00:01
hdisk26                0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk17                1.0   2.1M  13.0   2.1M   0.0   39.0   1.7    1.0    6.4     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk13                1.0   2.2M  13.0   2.2M   0.0   53.0   1.4    0.3    3.5     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01
hdisk28                0.0   4.1K   1.0   4.1K   0.0    2.0   0.6    0.6    0.7     0    0   0.0    0.0    0.0    0.0     0    0   0.0    0.0    0.0   0.0   0.0  0.0  08:00:01

===============================================================================================================================================

The host entries are managed by the /etc/hosts file only; no DNS server is configured. Authentication is only for local accounts, so there is no external authentication.

The ipcs output is below:

T        ID     KEY        MODE       OWNER    GROUP
Shared Memory:
m   1048576 0x01002357 --rw-------     root   system
m   2097153 0x6100e09b --rw------- pconsole   system
m   1048578 0xffffffff D-rw------- pconsole   system
m         3   00000000 --rw-r-----   oracle      dba
m         4   00000000 --rw-r-----   oracle      dba
m         5 0x5ad596a4 --rw-r-----   oracle      dba
m         6   00000000 --rw-r-----   oracle      dba
m         7   00000000 --rw-r-----   oracle      dba
m         8 0x0bf59740 --rw-r-----   oracle      dba
m 156237834 0x0d033d30 --rw-rw----     root   system
m 206569483 0xffffffff D-rw-------   oracle      dba
m  71303180 0xffffffff D-rw-------   oracle      dba
m  23068685 0xffffffff D-rw-------   oracle      dba
m  17825806 0xffffffff D-rw-------   oracle      dba
m  17825807 0xffffffff D-rw-------   oracle      dba
m  17825808 0xffffffff D-rw-------   oracle      dba
m 351272981 0xffffffff D-rw-------    oemdb      dba 

---------------------------------------------------------------

The top-20 output of ps (pid, vsz, args):

38143450 281392 /oemdata/agent_software/core/1.10.0/oracle_common/jdk/bin/java -Xmx172M -server
32245454 237240 /oemdata/agent12c/core/12.1.0.4.0/jdk/bin/java -Xmx318M -server 
10617204 195412 /usr/sbin/rsct/bin/IBM.HostRMd
 8061550 125000 ora_arc2_fininddb
 9437556 122440 ora_arcc_fininddb
 9568356 105096 ora_arc7_fininddb
11600338 104520 ora_arc6_fininddb
10879398 99016 ora_arc9_fininddb
18546830 93000 ora_arce_fininddb
 4981288 92680 ora_arc0_fininddb
18022482 91464 ora_arc4_fininddb
 8847704 86728 ora_arcg_fininddb
11075588 86664 ora_arc8_fininddb
 3605070 86216 ora_arcb_fininddb
12845242 86088 ora_arca_fininddb
 7537316 85192 ora_arci_fininddb
 7209628 84552 ora_arc5_fininddb
 4129490 84232 ora_arcj_fininddb
 7864702 80840 ora_arch_fininddb 

The minperm and maxperm is

minperm = 1526905
maxperm = 45807403
minperm% = 3
maxpin% = 90
131148847278 pending I/O waits
643098287800 start I/Os
185464 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2228 filesystem I/Os blocked with no fsbuf
73 client filesystem I/Os blocked with no fsbuf
12421 external pager filesystem I/Os blocked with no fsbuf

Also, the tuning parameters were provided by IBM.

A couple of observations to add to this thread:

  • In post #2 a high %iowait was implied, but it is only 5% in the vmstat output.
  • Active memory use (avm, i.e. without filesystem caching / buffers) is 80 GiB, so this is mainly the database and the application combined.
  • fr:sr is only 1:2.2, so the system appears to have no problem freeing the memory that it requires.
  • fi+fo > fre, and fi is roughly equal to fr, so this appears to be mainly filesystem I/O that needs to be placed somewhere in memory.
  • Perhaps we are looking at a filesystem cache that has not been curtailed (left at its unlimited setting). Typically with cooked filesystems, filesystem buffers should be kept in check (only enough for I/O optimization), as the database is usually better at handling the caching / buffering itself. Perhaps it is the application that is causing this...
  • The system is reading at 0.5 GiB/s. So - as Bakunin already remarked - if there was no backup or similar activity at the moment the vmstat was produced, and if this can be ascribed to database activity, then this would probably be due to sequential read I/O. If this is not expected behaviour, then perhaps the database or the application should be tuned. In that case I would be inclined to expect that the SGA needs to be increased rather than decreased. But this is speculation, of course, until it is properly researched. The I/O may also be network I/O caused by the application, for example...
  • The output of iostat does not show high numbers, so maybe we are looking at network I/O?
  • As for the ps output: you left out the labels, so it is not clear what the numbers mean, nor what "top 20" entails.
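For reference, the avm figure above converts as follows (avm is counted in 4 KiB pages):

```shell
# avm of ~21.7M pages * 4096 bytes/page ≈ 82.6 GiB of active
# virtual memory, i.e. the "80 GiB" quoted above.
awk 'BEGIN { printf "%.1f GiB\n", 21663391 * 4096 / (1024 * 1024 * 1024) }'
```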

Hello,

There is no backup process. The network output and ps output are below.

 root@ABRO-DB/#ps -Alo pid,vsz,args | sort -rnk 2 |head -30
64161024 1400968 oraclefinabrodb (LOCAL=NO)
38143450 292976 /oemdata/agent_software/core/1.10.0/oracle_common/jdk/bin/java -Xmx172M -server -Djava.security.egd=file:///dev/./urandom -Dsun.lang.ClassLoader.allowArraySyntax=true -Xgcpolicy:gencon -Dhttps.protocols=TLS -Dwatchdog.pid=34866428 -cp /oemdata/agent_software/core/1.10.0/jdbc/lib/ojdbc7.jar:/oemdata/agent_software/core/1.10.0/ucp/lib/ucp.jar:/oemdata/agent_software/core/1.10.0/oracle_common/modules/jsch-0.1.51.jar:/oemdata/agent_software/core/1.10.0/oracle_common/modules/com.oracle.http_client.http_client_12.1.3.jar:/oemdata/agent_software/core/1.10.0/oracle_common/modules/oracle.xdk_12.1.3/xmlparserv2.jar:/oemdata/agent_software/core/1.10.0/oracle_common/modules/oracle.dms_12.1.3/dms.jar:/oemdata/agent_software/core/1.10.0/oracle_common/modules/oracle.odl_12.1.3/ojdl.jar:/oemdata/agent_software/core/1.10.0oracle_common/modules/oracle.odl_12.1.3/ojdl2.jar:/oemdata/agent_software/core/1.10.0/lib/optic.jar:/oemdata/agent_software/core/1.10.0/sysman/jlib/log4j-core.jar:/oemdata/agent_software/core/1.10.0/jlib/gcagent_core.jar:/oemdata/agent_software/core/1.10.0/sysman/jlib/emagentSDK-intg.jar:/oemdata/agent_software/core/1.10.0/sysman/jlib/emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain
32245454 237240 /oemdata/agent12c/core/12.1.0.4.0/jdk/bin/java -Xmx318M -server -Djava.security.egd=file:///dev/./urandom -Dsun.lang.ClassLoader.allowArraySyntax=true -Xgcpolicy:gencon -Dwatchdog.pid=1246384 -cp /oemdata/agent12c/core/12.1.0.4.0/jdbc/lib/ojdbc5.jar:/oemdata/agent12c/core/12.1.0.4.0/ucp/lib/ucp.jar:/oemdata/agent12c/core/12.1.0.4.0/modules/oracle.http_client_11.1.1.jar:/oemdata/agent12c/core/12.1.0.4.0/lib/xmlparserv2.jar:/oemdata/agent12c/core/12.1.0.4.0/lib/jsch.jar:/oemdata/agent12c/core/12.1.0.4.0/lib/optic.jar:/oemdata/agent12c/core/12.1.0.4.0/modules/oracle.dms_11.1.1/dms.jar:/oemdata/agent12c/core/12.1.0.4.0/modules/oracle.odl_11.1.1/ojdl.jar:/oemdata/agent12c/core/12.1.0.4.0/modules/oracle.odl_11.1.1/ojdl2.jar:/oemdata/agent12c/core/12.1.0.4.0/sysman/jlib/log4j-core.jar:/oemdata/agent12c/core/12.1.0.4.0/jlib/gcagent_core.jar:/oemdata/agent12c/core/12.1.0.4.0/sysman/jlib/emagentSDK-intg.jar:/oemdata/agent12c/core/12.1.0.4.0/sysman/jlib/emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain
10617204 199992 /usr/sbin/rsct/bin/IBM.HostRMd
13960796 128108 /usr/bin/topas_nmon  -f -t -d -A -O -L -N -P -V -T -^ -s 30 -c 720 -m /perfdata/nmon -youtput_dir=/perfdata/nmon/CORE-DB -ystart_time=06:00:01,Jan05,2017
11075996 123400 ora_arcd_finabrodb
 3605070 121672 ora_arcb_finabrodb
12845242 111432 ora_arca_finabrodb
 7864702 105544 ora_arch_finabrodb
11600338 104776 ora_arc6_finabrodb
11075588 103816 ora_arc8_finabrodb
 4981288 99080 ora_arc0_finabrodb
10879398 98952 ora_arc9_finabrodb
 9437556 98760 ora_arcc_finabrodb
 7537316 89096 ora_arci_finabrodb
18022482 88648 ora_arc4_finabrodb
 8847704 86664 ora_arcg_finabrodb
14155902 86280 ora_arcf_finabrodb
 4129490 85832 ora_arcj_finabrodb
 9568356 84040 ora_arc7_finabrodb
 7209628 83464 ora_arc5_finabrodb
 8061550 77768 ora_arc2_finabrodb
 3277356 74176 /usr/java6/bin/java -Xmx512m -Xms20m -Xscmx10m -Xshareclasses -Xbootclasspath/p:/usr/java5/jre/lib/ibmjsseprovider2.jar -Dfile.encoding=UTF-8 -Xbootclasspath/p:/usr/java5/jre/lib/ibmjsseprovider2.jar -Xbootclasspath/a:/pconsole/lwi/runtime/core/eclipse/plugins/com.ibm.rcp.base_6.2.1.20091117-1800/rcpbootcp.jar:/pconsole/lwi/lib/ISCJaasModule.jar:/pconsole/lwi/lib/com.ibm.logging.icl_1.1.1.jar:/pconsole/lwi/lib/jaas2zos.jar:pconsole/lwi/lib/jaasmodule.jar:/pconsole/lwi/lib/lwinative.jar:/pconsole/lwi/lib/lwinl.jar:/pconsole/lwi/lib/lwirolemap.jar:/pconsole/lwi/lib/lwisecurity.jar:/pconsole/lwi/lib/lwitools.jar:/pconsole/lwi/lib/passutils.jar -Xverify:none -cp eclipse/launch.jar:eclipse/startup.jar:/pconsole/lwi/runtime/core/eclipse/plugins/com.ibm.rcp.base_6.2.1.20091117-1800/launcher.jar com.ibm.lwi.LaunchLWI
18546830 67848 ora_arce_finabrodb
37094778 59784 oraclefinabrodb (LOCAL=NO)
 6226442 56136 ora_arc1_finabrodb
52233484 55112 ora_nsa2_finabrodb
12189796 46984 ora_arc3_finabrodb
 7864570 44268 /usr/es/sbin/cluster/clstrmgr
 3080534 43648 j2pg
 

==================================================
The network card stats

root@ABRO-DB/#netstat -v ent0
-------------------------------------------------------------
ETHERNET STATISTICS (ent0) :
Device Type: Virtual I/O Ethernet Adapter (l-lan)
Hardware Address: b2:51:b4:3b:20:0b
Elapsed Time: 88 days 4 hours 58 minutes 42 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 127202332052                         Packets: 98167235979
Bytes: 99027328638719                         Bytes: 46966446811491
Interrupts: 0                                 Interrupts: 51688301615
Transmit Errors: 0                            Receive Errors: 0
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 0
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 18288                      Broadcast Packets: 92360313
Multicast Packets: 4                          Multicast Packets: 0
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 0
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 0                                   Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 20000
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        DataRateSet VIOENT

Virtual I/O Ethernet Adapter (l-lan) Specific Statistics:
---------------------------------------------------------
RQ Length: 9025
Trunk Adapter: False
Filter MCast Mode: False
Filters: 255
  Enabled: 2  Queued: 0  Overflow: 0
LAN State: Operational

Hypervisor Send Failures: 0
  Receiver Failures: 0
  Send Errors: 0
Hypervisor Receive Failures: 0

Invalid VLAN ID Packets: 0

ILLAN Attributes: 000000000000B002 [000000000000A000]

Port VLAN ID:   140
VLAN Tag IDs:  None


Switch ID: ETHERNET0

Hypervisor Information
  Virtual Memory
    Total (KB)                150
  I/O Memory
    VRM Minimum (KB)          172
    VRM Desired (KB)          172
    DMA Max Min (KB)          256

Transmit Information
  Transmit Buffers
    Buffer Size             65536
    Buffers                    32
    History
      No Buffers                7
  Virtual Memory
    Total (KB)               2048
  I/O Memory
    VRM Minimum (KB)         2176
    VRM Desired (KB)        16384
    DMA Max Min (KB)        16384

Receive Information
  Receive Buffers
    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers              2048     2048      256       64       64
    Max Buffers              4096     4096      512      128      128
    Allocated                2048     2063      271       64       64
    Registered               2047     2040      254       64       64
    History
      Max Allocated          2048     2280      362       74       64
      Lowest Registered      2046     2033      251       59       64
  Virtual Memory
    Minimum (KB)             1024     4096     4096     2048 4096
    Maximum (KB)             2048     8192     8192     4096     8192
  I/O Memory
    VRM Minimum (KB)        16384    16384     5120     2304     4352
    VRM Desired (KB)        32768    32768    10240     4608     8704
    DMA Max Min (KB)        32768    32768    16384     8192    16384

I/O Memory Information
  Total VRM Minimum (KB)    46892
  Total VRM Desired (KB)   105644
  Total DMA Max Min (KB)   123136

The system settings seem to be in shape so far.

You might also want to check with a DBA whether there are locks blocking requests, which could cause the high values in the b column. I am not sure about that - just a guess.

Also, could you please post the complete output of the following:

vmstat -v
iostat -A 1 10 | grep -p ^aio
lsfs

A dumb question: do you have

FILESYSTEMIO_OPTIONS=SETALL

in your init.ora?

And for reference: a little bit old, but still a useful paper from IBM - Oracle Architecture and Tuning on AIX.

Did you find out anything? It would be interesting to know.

root@ABRO-DB/#vmstat -v
52428800 memory pages
             50897120 lruable pages
                52390 free pages
                   17 memory pools
              7087860 pinned pages
                 90.0 maxpin percentage
                  3.0 minperm percentage
                 90.0 maxperm percentage
                 60.1 numperm percentage
             30613697 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 60.1 numclient percentage
                 90.0 maxclient percentage
             30613697 client pages
                    0 remote pageouts scheduled
               200985 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2228 filesystem I/Os blocked with no fsbuf
                   73 client filesystem I/Os blocked with no fsbuf
                18038 external pager filesystem I/Os blocked with no fsbuf
                 41.5 percentage of memory used for computational pages

====================================================

root@ABRO-DB/#iostat -A 1 10 | grep -p ^aio
aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     41871.0  0.0   232     0    131072            43.6  16.0   36.4      4.0  14.2   88.7

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     39126.0  0.0   192     0    131072            44.5  14.4   37.9      3.2  14.4   89.7

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     30756.0  0.0   195     0    131072            45.2  14.0   37.9      2.9  14.7   91.7

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     27308.0  0.0   243     0    131072            48.4  13.3   34.8      3.5  15.0   93.8

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     38195.0  0.0   204     0    131072            46.0  14.9   35.8      3.3  14.8   92.7

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     43712.0  0.0   213     0    131072            45.1  15.8   35.3      3.9  14.8   92.3

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     42478.0  0.0   212     0    131072            40.5  18.0   37.2      4.3  14.3   89.2

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     37878.0  0.0   434     0    131072            38.9  17.9   39.1      4.1  14.2   88.8

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     32627.0  0.0   252     0    131072            38.1  15.9   42.6      3.3  13.8   86.0

aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
     36750.0  0.0   229     0    131072            41.1  15.5   39.3      4.1  13.9   86.9

====================================================

Name            Nodename   Mount Pt               VFS   Size    Options    Auto Accounting
/dev/hd4        --         /                      jfs2  6291456 --         yes  no
/dev/hd1        --         /home                  jfs2  8388608 --         yes  no
/dev/hd2        --         /usr                   jfs2  12582912 --         yes  no
/dev/hd9var     --         /var                   jfs2  10485760 --         yes  no
/dev/hd3        --         /tmp                   jfs2  8388608 --         yes  no
/dev/hd11admin  --         /admin                 jfs2  2097152 --         yes  no
/proc           --         /proc                  procfs --      --         yes  no
/dev/hd10opt    --         /opt                   jfs2  4194304 --         yes  no
/dev/livedump   --         /var/adm/ras/livedump  jfs2  2097152 --         yes  no
/dev/roeodbkplv   --     /eodbackup               jfs2  731906048 rbrw,rw    yes  no
/dev/roauditlv    --     /audit                   jfs2  10485760 rw         yes  no
/dev/rodata13lv   --     /abrofindata13             jfs2  624951296 rw         no   no
/dev/rodata11lv   --     /abrofindata11             jfs2  624951296 rw         no   no
/dev/roindex14lv  --     /abrofinindex14            jfs2  415236096 rw         no   no
/dev/rodata17lv   --     /abrofindata17             jfs2  624951296 rw         no   no
/dev/rodata7lv    --     /abrofindata07             jfs2  624951296 rw         no   no
/dev/roindex8lv   --     /abrofinindex08            jfs2  415236096 rw         no   no
/dev/rodata12lv   --     /abrofindata12             jfs2  624951296 rw         no   no
/dev/rodata1lv    --     /abrofindata01             jfs2  1251999744 --         no   no
/dev/rodata5lv    --     /abrofindata05             jfs2  624951296 rw         no   no
/dev/rodata19lv   --     /abrofindata19             jfs2  624951296 rw         no   no
/dev/rodata18lv   --     /abrofindata18             jfs2  624951296 rw         no   no
/dev/rodata15lv   --     /abrofindata15             jfs2  624951296 rw         no   no
/dev/rodata20lv   --     /abrofindata20             jfs2  624951296 rw         no   no
/dev/rodata10lv   --     /abrofindata10             jfs2  624951296 rw         no   no
/dev/rooraclelv   --     /abrodboracle              jfs2  83886080 --         no   no
/dev/rodata6lv    --     /abrofindata06             jfs2  624951296 rw         no   no
/dev/roindex7lv   --     /abrofinindex07            jfs2  415236096 rw         no   no
/dev/roindex3lv   --     /abrofinindex03            jfs2  415236096 --         no   no
/dev/roindex4lv   --     /abrofinindex04            jfs2  415236096 rw         no   no
/dev/roarchlv     --     /abrofindata               jfs2  367001600 --         no   no
/dev/roindex9lv   --     /abrofinindex09            jfs2  415236096 rw         no   no
/dev/rooemdatalv  --     /abrooemdata               jfs2  41943040 rw         no   no
/dev/rodata14lv   --     /abrofindata14             jfs2  628621312 rw         no   no
/dev/roindex11lv  --     /abrofinindex11            jfs2  419168256 rw         no   no
/dev/roindex5lv   --     /abrofinindex05            jfs2  415236096 rw         no   no
/dev/roundolv     --     /abrofinundo               jfs2  230686720 --         no   no
/dev/rosortlv     --     /abrofinsort               jfs2  209715200 --         no   no
/dev/roindex12lv  --     /abrofinindex12            jfs2  419168256 rw         no   no
/dev/rodata9lv    --     /abrofindata09             jfs2  622854144 rw         no   no
/dev/rodata2lv    --     /abrofindata02             jfs2  624951296 --         no   no
/dev/roredo1lv    --     /abrofinredo01             jfs2  39845888 --         no   no
/dev/rointerlv    --     /abrointerfaces            jfs2  102760448 --         no   no
/dev/roindex10lv  --     /abrofinindex10            jfs2  419168256 rw         no   no
/dev/rodata8lv    --     /abrofindata08             jfs2  624951296 rw         no   no
/dev/roredo3lv    --     /abrofinredo03             jfs2  39845888 --         no   no
/dev/rodata3lv    --     /abrofindata03             jfs2  624951296 --         no   no
/dev/roindex1lv   --     /abrofinindex01            jfs2  415236096 --         no   no
/dev/roindex2lv   --     /abrofinindex02            jfs2  415236096 --         no   no
/dev/rodata16lv   --     /abrofindata16             jfs2  624951296 rw         no   no
/dev/roumplv      --     /abrooradump               jfs2  39845888 --         no   no
/dev/roredo2lv    --     /abrofinredo02             jfs2  39845888 --         no   no
/dev/rodata4lv    --     /abrofindata04             jfs2  624951296 rw         no   no
/dev/roindex6lv   --     /abrofinindex06            jfs2  415236096 rw         no   no
/dev/roindex13lv  --     /abrofinindex13            jfs2  415236096 rw         no   no
/dev/roarch1lv    --     /abrofinarch1              jfs2  2147483648 rbrw,rw    no   no

Judging by the numclient value in the vmstat -v output, there seem to be enough cached file pages that could be discarded to free up RAM.
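
For reference, the counters in question can be pulled out of vmstat -v like this (a sketch for AIX; the exact wording of the lines varies slightly between releases):

```shell
# Show the file-cache and buffer counters referred to above (AIX).
# numperm/numclient/maxclient indicate how much RAM holds file pages;
# the *buf lines show I/Os blocked for lack of buffers.
vmstat -v | egrep -i 'numperm|numclient|maxclient|pbuf|psbuf|fsbuf'
```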

The AIO requests do not hit maxreqs, so that is good too.
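
If you want to double-check the AIO limits yourself (a sketch; on AIX 6.1 and later the AIO tunables moved into ioo, on older releases they are attributes of the aio0 device):

```shell
# AIX 6.1+: AIO tunables (aio_maxreqs, aio_maxservers, ...) via ioo
ioo -a | grep aio

# Older AIX releases: limits live on the aio0 pseudo-device
lsattr -El aio0
```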

Judging by the mount options, you are using neither DIO (Direct I/O) nor CIO (Concurrent I/O), which are often used to gain some performance.
We implemented CIO once and it brought a huge performance increase for the Oracle DB, since the filesystem buffers are no longer used and Oracle handles the caching itself.

I cannot say whether implementing it would change anything about the many blocked threads in the run queue.
But if there is a need for better performance, this should be a way to go.
For implementation details, see the appropriate documentation, as CIO may only be used for certain types of Oracle files, i.e. filesystems.
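
Just as a sketch of what the remount could look like (the filesystem name is only one example taken from the listing above; check with your DBA which filesystems actually qualify for CIO before touching anything):

```shell
# One-off remount with concurrent I/O (AIX, as root):
umount /abrofindata01
mount -o cio /abrofindata01

# Make it permanent in the /etc/filesystems stanza:
chfs -a options=cio /abrofindata01

# Do NOT use cio on filesystems holding dumps, exports or the
# Oracle binaries -- only on the DB data/index/redo filesystems.
```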

Another way, which is a lot more work than just adding mount options and the setting agent.kgb mentioned, is Oracle ASM, where you supply raw devices and Oracle handles them itself. I have no experience with it, though, and I am not aware of the current "state of the art".
Here is a document about implementing ASM:
>>> IBM ASM Doc

From this short glimpse, and without knowing whether there are workload peaks, the box looks like it could get by with a bit less CPU and RAM.

Thanks zazzon and bakunin. The performance needs to be increased. From the current outputs we can see that the system is not stretched, but memory is still heavily utilized. Currently the load is a serial load; when we tried to run the load in parallel, the system still took the same time to complete, whereas we expected parallelism to reduce it.
I also need to check with the DBA about CIO or DIO. I will revert soon.

Thanks again...

CIO also includes the benefits of DIO; AFAIK, with JFS2 you usually use CIO anyway, not DIO.

If I recall correctly, it is just a matter of mounting the relevant filesystems with the cio option and having the setting in the init.ora that agent.kgb described. But take care: not every Oracle filesystem may be used with CIO!

Taken from: Managing Oracle disk I/O in AIX

As mentioned by the other guys, you might also check on the application side: maybe some queries cost a lot of resources, an index is not optimal, full table scans are done, etc.

Check this one, page 3:
IBM Systems Mag: In tune with Oracle

Sorry for not replying earlier; starting a new project kept me busy the last few days.

There are a few things that don't quite add up IMHO:

First, the initial vmstat output says ~200GB of memory, but the avm column only shows ~21 million pages, which is ~80GB. Where does the difference go? Please post the output of lsattr -El mem0 to verify how much (real) memory you really have.

Second, you said you have 15 cores configured, but the vmstat output shows 16. I presume that was just a typo on your part, but please confirm.

Third, further posted outputs of ps suggest that you have different DB instances running (fininddb and finabrodb). How many database instances are running simultaneously?

Fourth, I don't understand why there are so many archiver processes shown in the ps outputs. What exactly is/are the DB(s) doing (in terms of how many requests, and of which size, typically), and how many logs (of which size) are typically produced per time unit? Are there any dumps being taken, exports running, or the like?

In light of the further information, I am of the same opinion as Scrutinizer: you are perhaps a victim of double caching. The high number of pending I/Os and of filesystem I/Os blocked with no pbuf further supports this assumption. If (see above, this is why this information is important) you have only one DB instance, you have 80GB of RAM and nothing else is running on the system, increase the SGA to ~60-70GB and see how that works. If you set FILESYSTEMIO_OPTIONS=SETALL as suggested by agent.kgb, Oracle should open its DB files with concurrent I/O even if the filesystem is not mounted with the CIO option. Concurrent I/O bypasses the OS caching of FS operations, but I presume you haven't activated that yet; otherwise the picture of two different caching systems blocking each other should not be seen, even if the SGA is too small (as it probably is right now).

Finally, a suggestion: when you use vmstat on current systems, use the "-w" option. This way you get a neatly aligned table as output and it is easier to assess the picture.
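
For example (the interval and sample count here are arbitrary):

```shell
# Wide, aligned output: 2-second interval, 10 samples (AIX)
vmstat -w 2 10

# -I adds the I/O-oriented view (the fi/fo columns seen earlier in the thread)
vmstat -Iw 2 10
```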

I hope this helps.

bakunin


The DB server has 16 cores; that was a typo. It has only one DB instance running, which is finabrodb, and nothing else is running on this system. The other DB's ps entry was also my mistake, taken from the UAT system; sorry for that.

root@ABRO-DB/#lsattr -El mem0
ent_mem_cap           I/O memory entitlement in Kbytes           False
goodsize       204800 Amount of usable physical memory in Mbytes False
mem_exp_factor        Memory expansion factor                    False
size           204800 Total amount of physical memory in Mbytes  False
var_mem_weight        Variable memory capacity weight            False
root@ABRO-DB/#

As for the many archiver processes, I need to check with the DBA.
FILESYSTEMIO_OPTIONS=SETALL is not enabled. The SGA is 40GB.
I have already raised a request to increase the SGA to 80GB, as the memory is 200GB.

OK, but prepare to either shrink the LPAR's memory or increase the SGA further, to about 90% of the available RAM.

The most immediate thing to do, though, is to set the "FILESYSTEMIO_OPTIONS=SETALL" option or to mount the filesystems accordingly. I suggest doing it in the DB configuration: without this, the OS will always try to (double-) buffer the DB I/O, which will always be detrimental to performance.
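
A minimal sketch of how the DB-side setting could be applied (run as the Oracle software owner; the parameter is static, so it needs SCOPE=SPFILE and an instance restart to take effect):

```shell
sqlplus / as sysdba <<'EOF'
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;
SHUTDOWN IMMEDIATE
STARTUP
SHOW PARAMETER filesystemio_options
EOF
```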

Mounting an FS in CIO mode (concurrent I/O) makes all I/O on that FS use concurrent I/O, which is a good idea for DB files opened by the DB but a bad idea otherwise. You would need to create separate FSes (not mounted with the CIO option) for e.g. dumps, exports, etc. Setting the option in the Oracle config instead makes the database open the individual files with CIO while the FS itself is unchanged. This has the same effect as above, but without the need to create separate FSes.

I hope this helps.

bakunin