zombie process

Is there an equivalent to the preap command in AIX that would allow me to get rid of a zombie process? I am new to AIX, moving over from Solaris, where I have been able to preap the PID of a defunct process to clean it up. I have looked around, and as best I can tell it may take a reboot to clean them up, so they may just need to sit there. Thanks in advance.

You should be able to call waitpid() on a process that is not your own child, but I cannot verify this without writing my own test program.

Another lame response (probably not what you are looking for): you should talk to the application vendor / developer and have them call wait() from the process that launches the children. In short, it is an application problem.

---------- Post updated at 10:52 AM ---------- Previous update was at 09:54 AM ----------

I tried... I wrote an AIX preap app that called waitpid() on another process's children*. It returned immediately, with or without WNOHANG, with a -1 return value, because the current process had "No child processes" (ECHILD).
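For anyone who wants to reproduce the failure without the tarball, here is a minimal sketch of the same experiment. It is not the original preap source; try_reap() is a hypothetical helper name, and it simply hands an arbitrary PID to waitpid() and reports the errno that comes back.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not from the original preap source):
 * try to reap `pid`, which need not be a child of the caller.
 * Returns 0 if the process was reaped, otherwise the errno
 * that waitpid() set. */
int try_reap(pid_t pid)
{
    int status = 0;

    if (waitpid(pid, &status, 0) == -1) {
        int err = errno;  /* save before fprintf can change it */
        fprintf(stderr, "waitpid(%ld): %s\n", (long)pid, strerror(err));
        return err;
    }
    return 0;
}
```

Calling try_reap() from main() with a PID that does not belong to one of the caller's children should come back with ECHILD, which is the "No child processes" message above.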

[*Source is available upon request - if you are interested.]

So... (as far as I can tell) AIX will not let you wait() for another process's (zombie) children. The correct response is to have the parent app call wait() - like it should. If that is not possible, then bouncing the app (process group) should work, instead of the whole box: once the parent exits, its zombies are inherited by init, which reaps them.

I am not sure how Solaris does it... My copy of Solaris Internals does not yield a quick answer.

Many thanks for your contribution. If you could provide a (maybe stripped down to the point) source here we would all gain from your insight. The board is guaranteed to be able to cope with it.

bakunin

I wrote two apps to test this... preap and badkids. badkids spawns three zombies and stays alive so they are not reaped by the shell or init. (Technically I think init does the reaping.) badkids should print the PIDs of the zombie children but does not, so you have to run ps to get them. (Adding this would be trivial, as it already prints its own PID on startup.)

preap looks for a PID parameter on start and passes that to waitpid(). waitpid() currently uses no options but was originally written with WNOHANG. (I think I spelled that right - the docs are in the wait() InfoCenter page.) The status value (passed by reference to waitpid()) is filled with the exit info of the child process. The exit info is encoded to hold more than one piece of info, so I call the WEXITSTATUS() macro to find just the exit value.
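The status decoding described above might look roughly like this. decode_status() is a hypothetical helper, not part of the posted source; it only uses the standard macros from <sys/wait.h> and checks WIFEXITED() first, since WEXITSTATUS() is only meaningful for a child that exited normally.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: report what a status word filled in by
 * waitpid() says. Returns the child's exit value, or -1 if the
 * child did not exit normally. */
int decode_status(int status)
{
    if (WIFEXITED(status)) {
        /* normal exit: the low 8 bits of the exit value are available */
        printf("exited with value %d\n", WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        printf("killed by signal %d\n", WTERMSIG(status));
        return -1;
    }
    printf("not terminated (stopped or continued)\n");
    return -1;
}
```

Pass it the int that waitpid() filled in, e.g. decode_status(status) right after a successful waitpid(pid, &status, 0).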

But, in short, a program must call a wait() variant for *every* child it creates with fork(). This does not mean that it must be called immediately, but it must be called or zombie processes will result. Unix provides several versions of wait() - blocking, non-blocking, wait-for-your-process-group, etc... that can be used to reap the children however is best.
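As one example of the non-blocking flavor, a parent can periodically sweep up whatever children have finished. This is only a sketch under that assumption; reap_finished_children() is a hypothetical name, and a real application might call it from its main loop or after receiving SIGCHLD.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reap every child that has already exited,
 * without blocking. Returns the number of children reaped. */
int reap_finished_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);  /* -1 = any child */
        if (pid > 0) {
            reaped++;      /* one zombie cleaned up, look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;      /* interrupted by a signal, try again */
        break;             /* 0: children still running; ECHILD: none left */
    }
    return reaped;
}
```

Because WNOHANG never blocks, this is safe to call as often as you like; it just returns 0 when there is nothing to reap.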

As for the code... I do not intend to keep it up forever so I am presenting a "fragmented" URL that will not be indexed. (You have to put the pieces together to grab it.) It will be up for a month or so. (today's date is 4/7/10)

http://www.tablespace.net
/samples/preap-0_1_0.tar.gz

Note on building: it builds with the AIX-supplied make or GNU make, but is written for gcc. I have not tried, but see no reason it would not compile on Linux, OS-X, Solaris, etc... (Just tried it on OS-X, it compiled fine.)

This code is free to distribute, re-use, re-post, or line the floor of your home directory. That said, I would make it more robust before re-use - this is just a sample, as opposed to production code.