Your system has exhausted absolutely all available memory, including swap. Which process is the culprit is anybody's guess; I suspect you have a memory leak somewhere.
The only option is: if you can, shut down Oracle, which will free some memory, and then try to power-cycle the system immediately. If that is not possible, your only option is to power-cycle it the hard way. AIX filesystems can usually cope with this, and if you do it at a time when DB activity is minimal, the risk to the database is small. A bit of redo-log gymnastics is usually all it takes to recover.
I remember having had such a problem once; it turned out to be a version incompatibility between some Oracle component and the AIX version (5.2 ML1, IIRC). After updating to a matching set of versions the problem was gone, never to come back.
/tmp can also fill up if swap is full. If you keep adding processes, or they keep malloc'ing virtual memory, swap gets exhausted. It can be hard to pinpoint or cure. You can see the VM size with 'ps -el'.
You could also try - though it may fail due to lack of paging space - to increase the size of hd6 with
chps -s 4 hd6 (to add 4 logical partitions to paging space).
The tunables to look at - to get warnings ahead of time - are npswarn and npskill.
In particular, the npswarn parameter can be used to take corrective action before you get into the situation of having fewer than npskill pages (4 KB pages, so less than 4 MB by default) of paging space left.
michael@x054:[/etc/tunables]vmo -h npswarn
Help for tunable npswarn:
Purpose:
Specifies the number of free paging-space pages at which the operating system begins sending the SIGDANGER signal to processes.
Values:
Default: 4096
Range: 1 - 131071
Type: Dynamic
Unit: 4KB pages
Tuning:
The default value is the maximum of 512 and (4*npskill). The value of npswarn must be greater than zero and less than the total number of paging space pages on the system. Increase the value if you experience processes being killed because of low paging space.
Note: since your probable memory hog is running as root (nokilluid == 0), it does not get killed automatically - even when the system is below the npskill level.
What I used to have was a process running in the background that would wait to be woken by SIGDANGER and would activate a predefined, but inactive paging space - and start doing analysis while it was still possible.
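A minimal sketch of such a watcher, under assumptions: the paging space name /dev/paging01 is hypothetical, and since SIGDANGER only exists on AIX (signal 33), the sketch traps SIGUSR1 as a stand-in and only echoes the actions so it can be run anywhere.

```shell
#!/bin/sh
# Watcher sketch: wait for a "paging space low" signal, then react.
# SIGDANGER is AIX-only (signal 33); SIGUSR1 stands in here.
# /dev/paging01 is a hypothetical, predefined but inactive paging space.

danger_seen=0

on_danger() {
    danger_seen=1
    # On AIX you would activate the spare paging space:
    #   swapon /dev/paging01
    # ...and start gathering data while the system is still responsive:
    #   svmon -P -t 10 > /tmp/svmon.$$ 2>&1
    echo "low paging space: activating spare paging space, starting analysis"
}

trap on_danger USR1    # on AIX: trap on_danger 33

# Simulate the kernel delivering the signal to this process:
kill -s USR1 $$
[ "$danger_seen" -eq 1 ] && echo "handler fired"
```

On a real AIX box the script would simply sleep in a loop after installing the trap, waiting for the kernel's SIGDANGER instead of signalling itself.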
In situations like this a command such as
svmon -P -t 10
can give some insight into what program/group of programs are having memory leak issues.
No, vm or avm is only addressable virtual memory. To really see paging space used you must use svmon; there is a Pgsp column there.
Further, ps counts everything as if it were unique - but many segments, especially shared-memory code and data, as well as the kernel, are shared between processes.
In the sample below you can see, among others, that segments b000b and 20002 appear everywhere. ps nevertheless reports them each time as if they belonged only to that one process.
michael@x054:[/home/michael]svmon -P -t 3 | grep -v clnt
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
4391154 java 48179 8478 0 39022 N Y N
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
830583 3 work working storage s 19320 0 0 19320
b000b d work shared library text s 10304 0 0 10304
20002 0 work kernel segment s 8816 8400 0 8816
8004e0 - work s 279 75 0 279
870547 f work working storage s 270 0 0 270
840564 2 work process private s 33 3 0 33
880568 c mmap maps 1 source(s) s 0 0 - -
8e056e b mmap maps 18 source(s) s 0 0 - -
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
5963962 java 38511 8436 0 31575 N Y N
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
b000b d work shared library text s 10304 0 0 10304
8906a9 3 work working storage s 10083 0 0 10083
20002 0 work kernel segment s 8816 8400 0 8816
9807b8 e work shared memory segment s 1977 0 0 1977
860686 f work working storage s 252 0 0 252
8c068c - work s 112 33 0 112
8606a6 2 work process private s 31 3 0 31
900750 b mmap maps 3 source(s) s 0 0 - -
940774 c mmap maps 1 source(s) s 0 0 - -
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
4456598 cimserver 25141 8434 0 25127 N Y N
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
b000b d work shared library text s 10304 0 0 10304
20002 0 work kernel segment s 8816 8400 0 8816
960356 3 work working storage s 2896 0 0 2896
930333 2 work process private s 2748 3 0 2748
860366 f work shared library data s 209 0 0 209
850385 - work s 154 31 0 154
9a035a 5 work working storage s 0 0 0 0
840364 a work working storage s 0 0 0 0
980358 4 work working storage s 0 0 0 0
820362 9 work working storage s 0 0 0 0
9e035e 7 work working storage s 0 0 0 0
800360 8 work working storage s 0 0 0 0
9c035c 6 work working storage s 0 0 0 0
michael@x054:[/home/michael]
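To quantify how much that over-counting amounts to, you can total the Inuse column twice - once naively (every segment in every process, the way ps sees it) and once counting each Vsid only one time. A sketch over a trimmed copy of the segment lines from the two java processes above (the per-process header totals differ because the clnt segments were grep'ed away):

```shell
# Trimmed svmon -P segment lines (two java processes from the listing).
sample='830583 3 work working storage s 19320 0 0 19320
b000b d work shared library text s 10304 0 0 10304
20002 0 work kernel segment s 8816 8400 0 8816
8004e0 - work s 279 75 0 279
870547 f work working storage s 270 0 0 270
840564 2 work process private s 33 3 0 33
b000b d work shared library text s 10304 0 0 10304
8906a9 3 work working storage s 10083 0 0 10083
20002 0 work kernel segment s 8816 8400 0 8816
9807b8 e work shared memory segment s 1977 0 0 1977
860686 f work working storage s 252 0 0 252
8c068c - work s 112 33 0 112
8606a6 2 work process private s 31 3 0 31'

# The description field has a variable number of words, but the last
# four fields are always Inuse Pin Pgsp Virtual, so Inuse is $(NF-3).
totals=$(echo "$sample" | awk '
{
    inuse = $(NF-3)
    naive += inuse                 # every occurrence, like ps counts it
    if (!($1 in seen)) {           # each Vsid only once
        seen[$1] = 1
        dedup += inuse
    }
}
END { printf "naive: %d pages, deduplicated: %d pages", naive, dedup }')
echo "$totals"
# -> naive: 70597 pages, deduplicated: 51477 pages
```

The gap between the two numbers - here the doubly counted b000b and 20002 segments - is exactly what ps charges twice.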
True. This is why it is a good idea to use ipcs to complement the output of ps and list all shared-memory segments. It will take a while to rummage through my archive, but I used to have a script for that somewhere... I will post it if I can find it.
Basically you can do a
ps -Alo pid,args,vsz
to list the PID (pid) and memory consumption (vsz) of each process ("args" = command line, for reference) and cross-check this with ipcs -Sp .
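A sketch of that, with vsz put first so the listing can be sorted numerically; the ps options used are POSIX, so this should behave the same on AIX and Linux (the ipcs flags differ per platform and are left as comments):

```shell
# Top 5 processes by virtual size (vsz is in 1 KB units).
top5=$(ps -Ao vsz,pid,args | sed 1d | sort -rn | head -5)
echo "$top5"

# Then cross-check the heavy hitters against the shared-memory
# segments and the PIDs attached to them:
#   ipcs -Sp     (AIX)
#   ipcs -m -p   (Linux)
```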
It'd be nice to have a utility that determines how many PIDs share the same page and divides the page up when deciding how much to 'charge' each PID. That would certainly make libc.so virtually free! Alternatively, you could charge it only to the oldest PID that maps it. Then your sort would show the heavy hitters, even if they share with friends. I guess you'd have to root around in the open-source ipcs and ps for a while to find out how to do that.
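That "divide the page up" idea can at least be sketched at segment granularity over svmon output: count how many processes reference each Vsid, then charge every process Inuse/sharers for the segments it maps. A sketch using a trimmed copy of the earlier listing (the Pid header lines are kept so the script knows which process owns each segment; all names and numbers come from that sample):

```shell
sample='4391154 java 48179 8478 0 39022 N Y N
830583 3 work working storage s 19320 0 0 19320
b000b d work shared library text s 10304 0 0 10304
20002 0 work kernel segment s 8816 8400 0 8816
8004e0 - work s 279 75 0 279
870547 f work working storage s 270 0 0 270
840564 2 work process private s 33 3 0 33
5963962 java 38511 8436 0 31575 N Y N
b000b d work shared library text s 10304 0 0 10304
8906a9 3 work working storage s 10083 0 0 10083
20002 0 work kernel segment s 8816 8400 0 8816
9807b8 e work shared memory segment s 1977 0 0 1977
860686 f work working storage s 252 0 0 252
8c068c - work s 112 33 0 112
8606a6 2 work process private s 31 3 0 31'

out=$(echo "$sample" | awk '
$3 == "work" || $3 == "mmap" {           # a segment line
    n++
    vsid[n] = $1; owner[n] = cur; inuse[n] = $(NF-3)
    sharers[$1]++
    next
}
{ cur = $1; pids[++np] = $1 }            # a Pid/Command header line
END {
    # Charge each process its proportional share of every segment.
    for (i = 1; i <= n; i++)
        charge[owner[i]] += inuse[i] / sharers[vsid[i]]
    for (j = 1; j <= np; j++)
        printf "%s charged %.0f pages\n", pids[j], charge[pids[j]]
}')
echo "$out"
# -> 4391154 charged 29462 pages
#    5963962 charged 22015 pages
```

The shared b000b and 20002 segments get split half-and-half between the two PIDs, so the per-process charges now add up to the deduplicated total instead of double-counting.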