Removing hung processes (AIX)

Hi Guys,

I'm wondering: if I have a child process that is basically hanging, and I can see it is in a sleep or wait state, I assume the signal to remove/terminate it has to come from its parent? I'm asking because I'm facing a DB2 hang, and most processes are showing:

Waiting on latch type: (SQLO_LT_SQLE_DBCB__dblatch) - Address: (78000000029ef20), Line: 351, File: sqlmutil.C

I have raised this with IBM, but no luck so far. And the hang doesn't end there, because we can't remove those processes with any of the following:
- db2stop
- db2stop force
- db2_kill
- kill on the parent PID
- ipcrm on AIX (removing the semaphores, shared memory segments and message queues, i.e. -s, -m, -q)
- kill -9 -1
- kill as root
So the only way to restore the services is to reboot the server, and this is a big shared server. How can I remove a process in this state? Is there a signal I can send to make it exit? Any ideas or suggestions would be highly appreciated.

Harby.

If you can't kill a process even with -9, or you have IPC objects listed by ipcs that you can't remove with ipcrm, then there is nothing left but a reboot.
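For what it's worth, that escalation (ask politely with SIGTERM, then -9, and only then conclude a reboot is needed) can be scripted. Below is a minimal POSIX sh sketch; the function name kill_escalate and its grace-period argument are my own hypothetical choices, not an AIX or DB2 tool. It only uses kill, kill -0 and sleep. For DB2 you would still try db2stop / db2stop force first.

```shell
#!/bin/sh
# kill_escalate PID [GRACE_SECONDS]
# Hypothetical helper (not part of AIX or DB2): send SIGTERM, wait up to
# GRACE_SECONDS (default 10) for the process to exit, then send SIGKILL.
# Return codes:
#   0  process is gone (or kill -0 cannot see it, e.g. no permission: run as root)
#   1  process is still there after SIGKILL: probably blocked in the kernel
#   2  bad usage
kill_escalate() {
    pid=$1
    grace=${2:-10}
    case $pid in
        ''|*[!0-9]*) echo "usage: kill_escalate PID [GRACE_SECONDS]" >&2; return 2 ;;
    esac
    case $grace in
        ''|*[!0-9]*) echo "usage: kill_escalate PID [GRACE_SECONDS]" >&2; return 2 ;;
    esac
    # Refuse PID 0/1: signalling init (or our own process group) is never wanted.
    if [ "$pid" -le 1 ]; then
        echo "refusing to signal PID $pid" >&2
        return 2
    fi

    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    kill -0 "$pid" 2>/dev/null || return 0

    # Step 1: polite request, so the program can clean up after itself.
    kill -TERM "$pid" 2>/dev/null || :   # may already have exited
    i=0
    while [ "$i" -lt "$grace" ]; do
        sleep 1
        kill -0 "$pid" 2>/dev/null || return 0
        i=$((i + 1))
    done

    # Step 2: SIGKILL cannot be caught or ignored by the process.
    echo "PID $pid still running after SIGTERM + ${grace}s; sending SIGKILL" >&2
    kill -KILL "$pid" 2>/dev/null || :
    sleep 1

    # If the process is in an uninterruptible kernel wait, the SIGKILL stays
    # pending until that wait ends. A defunct (zombie) entry also passes
    # kill -0 until its parent reaps it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid survived SIGKILL; check its state/WCHAN with: ps -l -p $pid" >&2
        return 1
    fi
    return 0
}
```

Usage (as root): kill_escalate 12345 30. A return code of 1 is exactly the case described above where only a reboot (or the blocking resource coming back) will help. Before concluding that, check ps -l: a defunct entry is waiting on its parent, not on the kernel.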

Maybe check whether your AIX and DB2 are at reasonably current maintenance levels (ML) and fix packs. What did IBM say? "Sorry, we just sell this software." ;)

When you stop or kill the parent process, does it go away?

It looks as though the process was in a non-signalable state, i.e. blocked in the kernel, where even SIGKILL is not acted on until the wait ends.
This often happens when, e.g., you have a stale NFS mount lingering because the server disappeared, when a disk drive has died and needs replacement, or when a filesystem has become corrupt.
Usually the hung processes terminate as soon as the resource they are waiting for becomes available again.
Of course, as someone mentioned, you can always reboot to flush the process table, but I guess that isn't really an option here?