Your system has exhausted absolutely all available memory, including swap. Which process is the culprit is anybody's guess; I suspect you have a memory leak somewhere.
The only option is: if you can, shut down Oracle, which will free some memory, and then try to power-cycle the system immediately. If that is not possible, your only option is to power-cycle it the hard way. AIX filesystems can usually cope with this, and if you do it at a time when DB activity is minimal, the risk to the database is small. A bit of redo-log gymnastics is usually all it takes to recover.
I remember having had such a problem once; it turned out to be a version incompatibility between some Oracle component and the AIX version (5.2 ML1, IIRC). After updating to a matching set of versions the problem was gone, never to come back.
/tmp can also fill up if swap is full. If you keep adding processes, or they keep malloc'ing virtual memory, swap gets exhausted. It can be hard to pinpoint or cure. You can see the VM size with 'ps -el'.
You could also try - though it may fail due to lack of paging space - to increase the size of hd6 with
chps -s 4 hd6 (to add 4 logical partitions to paging space).
The tunables to look at - to get warnings ahead of time - are npswarn and npskill.
In particular, the npswarn parameter can be used to take corrective action before you get into the situation of having fewer than npskill pages (4 KB pages, so less than 4 MB by default) of paging space left.
michael@x054:[/etc/tunables]vmo -h npswarn
Help for tunable npswarn:
Purpose:
Specifies the number of free paging-space pages at which the operating system begins sending the SIGDANGER signal to processes.
Values:
Default: 4096
Range: 1 - 131071
Type: Dynamic
Unit: 4KB pages
Tuning:
The default value is the maximum of 512 and (4*npskill). The value of npswarn must be greater than zero and less than the total number of paging space pages on the system. Increase the value if you experience processes being killed because of low paging space.
Note: since your probable memory hog is running as root (nokilluid == 0), it does not get killed automatically - even when the system is below the npskill level.
What I used to have was a process running in the background that would wait to be woken by SIGDANGER and would activate a predefined, but inactive paging space - and start doing analysis while it was still possible.
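A minimal sketch of such a watcher, under assumptions: the paging space name /dev/paging01 is hypothetical, and since SIGDANGER only exists on AIX (signal 33), the sketch traps SIGUSR1 as a stand-in and only echoes the actions so it can be run anywhere.

```shell
#!/bin/sh
# Watcher sketch: wait for a "paging space low" signal, then react.
# SIGDANGER is AIX-only (signal 33); SIGUSR1 stands in here.
# /dev/paging01 is a hypothetical, predefined but inactive paging space.

danger_seen=0

on_danger() {
    danger_seen=1
    # On AIX you would activate the spare paging space:
    #   swapon /dev/paging01
    # ...and start gathering data while the system is still responsive:
    #   svmon -P -t 10 > /tmp/svmon.$$ 2>&1
    echo "low paging space: activating spare paging space, starting analysis"
}

trap on_danger USR1    # on AIX: trap on_danger 33

# Simulate the kernel delivering the signal to this process:
kill -s USR1 $$
[ "$danger_seen" -eq 1 ] && echo "handler fired"
```

On a real AIX box the script would simply sleep in a loop after installing the trap, waiting for the kernel's SIGDANGER instead of signalling itself.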
In situations like this a command such as
svmon -P -t 10
can give some insight into what program/group of programs are having memory leak issues.
No, vm or avm is only addressable virtual memory. To really see paging space used you must use svmon; there is a Pgsp column there.
Further, ps counts everything as if it were unique - but many segments, especially shared-memory code and data, as well as the kernel, are shared between processes.
In the sample below you can see, among others, that segments b000b and 20002 appear everywhere. ps nevertheless reports them each time as if they belonged only to that one process.
michael@x054:[/home/michael]svmon -P -t 3 | grep -v clnt
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
4391154 java 48179 8478 0 39022 N Y N
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
830583 3 work working storage s 19320 0 0 19320
b000b d work shared library text s 10304 0 0 10304
20002 0 work kernel segment s 8816 8400 0 8816
8004e0 - work s 279 75 0 279
870547 f work working storage s 270 0 0 270
840564 2 work process private s 33 3 0 33
880568 c mmap maps 1 source(s) s 0 0 - -
8e056e b mmap maps 18 source(s) s 0 0 - -
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
5963962 java 38511 8436 0 31575 N Y N
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
b000b d work shared library text s 10304 0 0 10304
8906a9 3 work working storage s 10083 0 0 10083
20002 0 work kernel segment s 8816 8400 0 8816
9807b8 e work shared memory segment s 1977 0 0 1977
860686 f work working storage s 252 0 0 252
8c068c - work s 112 33 0 112
8606a6 2 work process private s 31 3 0 31
900750 b mmap maps 3 source(s) s 0 0 - -
940774 c mmap maps 1 source(s) s 0 0 - -
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
4456598 cimserver 25141 8434 0 25127 N Y N
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
b000b d work shared library text s 10304 0 0 10304
20002 0 work kernel segment s 8816 8400 0 8816
960356 3 work working storage s 2896 0 0 2896
930333 2 work process private s 2748 3 0 2748
860366 f work shared library data s 209 0 0 209
850385 - work s 154 31 0 154
9a035a 5 work working storage s 0 0 0 0
840364 a work working storage s 0 0 0 0
980358 4 work working storage s 0 0 0 0
820362 9 work working storage s 0 0 0 0
9e035e 7 work working storage s 0 0 0 0
800360 8 work working storage s 0 0 0 0
9c035c 6 work working storage s 0 0 0 0
michael@x054:[/home/michael]
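To quantify how much that over-counting amounts to, you can total the Inuse column twice - once naively (every segment in every process, the way ps sees it) and once counting each Vsid only one time. A sketch over a trimmed copy of the segment lines from the two java processes above (the per-process header totals differ because the clnt segments were grep'ed away):

```shell
# Trimmed svmon -P segment lines (two java processes from the listing).
sample='830583 3 work working storage s 19320 0 0 19320
b000b d work shared library text s 10304 0 0 10304
20002 0 work kernel segment s 8816 8400 0 8816
8004e0 - work s 279 75 0 279
870547 f work working storage s 270 0 0 270
840564 2 work process private s 33 3 0 33
b000b d work shared library text s 10304 0 0 10304
8906a9 3 work working storage s 10083 0 0 10083
20002 0 work kernel segment s 8816 8400 0 8816
9807b8 e work shared memory segment s 1977 0 0 1977
860686 f work working storage s 252 0 0 252
8c068c - work s 112 33 0 112
8606a6 2 work process private s 31 3 0 31'

# The description field has a variable number of words, but the last
# four fields are always Inuse Pin Pgsp Virtual, so Inuse is $(NF-3).
totals=$(echo "$sample" | awk '
{
    inuse = $(NF-3)
    naive += inuse                 # every occurrence, like ps counts it
    if (!($1 in seen)) {           # each Vsid only once
        seen[$1] = 1
        dedup += inuse
    }
}
END { printf "naive: %d pages, deduplicated: %d pages", naive, dedup }')
echo "$totals"
# -> naive: 70597 pages, deduplicated: 51477 pages
```

The gap between the two numbers - here the doubly counted b000b and 20002 segments - is exactly what ps charges twice.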
True. This is why it is a good idea to use ipcs to complement the output of ps and list all shared-memory segments. It will take a while to rummage through my archive, but I used to have a script for that somewhere... I will post it if I can find it.
Basically you can do a
ps -Alo pid,args,vsz
to list the PID (pid) and memory consumption (vsz) of each process ("args" = command line, for reference) and cross-check this with ipcs -Sp .
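A sketch of that, with vsz put first so the listing can be sorted numerically; the ps options used are POSIX, so this should behave the same on AIX and Linux (the ipcs flags differ per platform and are left as comments):

```shell
# Top 5 processes by virtual size (vsz is in 1 KB units).
top5=$(ps -Ao vsz,pid,args | sed 1d | sort -rn | head -5)
echo "$top5"

# Then cross-check the heavy hitters against the shared-memory
# segments and the PIDs attached to them:
#   ipcs -Sp     (AIX)
#   ipcs -m -p   (Linux)
```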
It'd be nice to have a utility that determines how many PIDs share the same page and divides the page up when deciding how much to 'charge' each PID. That would certainly make libc.so virtually free! Alternatively, you could charge it only to the oldest PID that maps it. Then your sort would show the heavy hitters, even if they share with friends. I guess you'd have to root around in the open-source ipcs and ps for a while to find out how to do that.
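That "divide the page up" idea can at least be sketched at segment granularity over svmon output: count how many processes reference each Vsid, then charge every process Inuse/sharers for the segments it maps. A sketch using a trimmed copy of the earlier listing (the Pid header lines are kept so the script knows which process owns each segment; all names and numbers come from that sample):

```shell
sample='4391154 java 48179 8478 0 39022 N Y N
830583 3 work working storage s 19320 0 0 19320
b000b d work shared library text s 10304 0 0 10304
20002 0 work kernel segment s 8816 8400 0 8816
8004e0 - work s 279 75 0 279
870547 f work working storage s 270 0 0 270
840564 2 work process private s 33 3 0 33
5963962 java 38511 8436 0 31575 N Y N
b000b d work shared library text s 10304 0 0 10304
8906a9 3 work working storage s 10083 0 0 10083
20002 0 work kernel segment s 8816 8400 0 8816
9807b8 e work shared memory segment s 1977 0 0 1977
860686 f work working storage s 252 0 0 252
8c068c - work s 112 33 0 112
8606a6 2 work process private s 31 3 0 31'

out=$(echo "$sample" | awk '
$3 == "work" || $3 == "mmap" {           # a segment line
    n++
    vsid[n] = $1; owner[n] = cur; inuse[n] = $(NF-3)
    sharers[$1]++
    next
}
{ cur = $1; pids[++np] = $1 }            # a Pid/Command header line
END {
    # Charge each process its proportional share of every segment.
    for (i = 1; i <= n; i++)
        charge[owner[i]] += inuse[i] / sharers[vsid[i]]
    for (j = 1; j <= np; j++)
        printf "%s charged %.0f pages\n", pids[j], charge[pids[j]]
}')
echo "$out"
# -> 4391154 charged 29462 pages
#    5963962 charged 22015 pages
```

The shared b000b and 20002 segments get split half-and-half between the two PIDs, so the per-process charges now add up to the deduplicated total instead of double-counting.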