Hello,
How can we clear a D state (orphaned) process? I have tried to kill it with kill -9, but it did not work.
The server is critical, so is there any way to clear the D process without rebooting the server?
You can check to see what is the parent process, and if possible you can kill or restart the parent process (as long as the parent process is not the root process).
In the case of remote mounts causing the D state, you can check the parent networking process and decide how to proceed.
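To make the "check the parent process" step concrete, here is a minimal sketch that reads the parent PID from /proc. It assumes a Linux /proc filesystem; parent_of is a made-up helper name, not a standard command.

```shell
# parent_of PID: print the parent PID of PID, read from /proc/PID/status.
# (parent_of is a hypothetical helper name, not a standard command.)
parent_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

# Example: report the parent of the current shell.
ppid=$(parent_of "$$")
echo "parent of $$ is $ppid"
```

Once you have the parent PID, something like ps -o pid,ppid,state,args -p <ppid> shows what it is. If the parent turns out to be PID 1, restarting the parent is not an option.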
Some people have tried to be creative as follows:
Start gdb and attach to the parent, in this example: attach 3200
Then call waitpid for the zombie process, for example: call waitpid(3100,0,0)
Update: Fixed typos (I think!)
D state is "device waiting" and is a bit nasty.
Such a process cannot be killed.
It makes sense to guess the blocking device, and fix it. Once fixed, the processes will leave the D state and continue.
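As a starting point for guessing the blocking device, it can help to list every process currently in D state together with its wait channel. The sketch below is only an illustration: d_state_only is a made-up helper name, and the sample rows are for demonstration (the second row is invented for contrast).

```shell
# d_state_only: keep the header line plus every row whose state column
# (the third field in this layout) starts with D. Reads ps-style text on stdin.
# (d_state_only is a hypothetical helper name.)
d_state_only() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# Demonstration on sample rows; the second data row is invented for contrast.
printf '%s\n' \
    'PID PPID S WCHAN COMMAND' \
    '13613 1 D cifs_reconnect_tc dsmc' \
    '17067 12166 S - grep' | d_state_only

# Typical use on a live system:
#   ps -eo pid,ppid,state,wchan:32,comm | d_state_only
```

Several D-state processes sharing the same wait channel usually point at the same stuck device or mount.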
Here are the different process state codes and their descriptions:
D Uninterruptible sleep (usually IO)
R Running or runnable (on run queue)
S Interruptible sleep (waiting for an event to complete)
T Stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z Defunct ("zombie") process, terminated but not reaped by its parent.
As you can see, D means uninterruptible sleep, usually due to IO.
You can check the wchan (the name of the kernel function in which the process is sleeping) to understand what exactly is going on:
ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm,args | ( read -r; printf " %s\n" "$REPLY"; grep <your process name/pid> )
Usually it will be an exit_mm() function, which releases all memory descriptors and related data structures.
As per the Linux kernel documentation, it first checks whether the mm->core_waiters flag is set. If it is, the process is dumping the contents of memory to a core file (IO). In that case, I believe that to avoid corruption it will not respond to a KILL signal until the core dump is completed.
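The wait channel that ps prints is also exposed per process under /proc, which avoids column truncation. A minimal sketch, assuming a Linux /proc; wchan_of and its optional second argument are made up for illustration.

```shell
# wchan_of PID [PROC_ROOT]: print the kernel function PID is sleeping in.
# PROC_ROOT defaults to /proc; the optional argument only exists so the
# helper can be pointed at a saved copy. (Hypothetical helper name.)
wchan_of() {
    proc_root=${2:-/proc}
    # The wchan file has no trailing newline, so add one for readability.
    printf '%s\n' "$(cat "$proc_root/$1/wchan")"
}

# Typical use on a live system (13613 is the PID from this thread):
#   wchan_of 13613
```

On kernels that provide it, root can also read /proc/<pid>/stack to see the full kernel stack of the sleeping process rather than a single function name.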
Hi Neo,
It's an orphan process, not a zombie, and its PPID is 1
[root@xxx:~]# ps -ef | grep dsmc
root 13613 1 0 Apr19 ? 00:00:00 dsmc q systeminfo policy -console
root 17067 12166 0 14:33 pts/2 00:00:00 grep dsmc
root 21870 1 0 Apr22 ? 00:00:00 dsmc
Hi MadeinGermany
You mean guessing the IO devices (disks)? The root cause is that the NFS server was disconnected unexpectedly, which made the NFS-mounted folder unresponsive. I forced an unmount and remounted once the NFS server was back, but I still cannot kill the process.
Hi Yoda,
I have tried your command
ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm,args | ( read -r; printf " %s\n" "$REPLY"; grep <your process name/pid> )
It produced the result below:
[root@xxx:~]# ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm,args | ( read -r; printf " %s\n" "$REPLY"; grep 13613 )
PID PPID S WIDE-WCHAN-COLUMN COMMAND COMMAND
13613 1 D cifs_reconnect_tc dsmc dsmc q systeminfo policy -console
[root@xxx:~]# ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm,args | ( read -r; printf " %s\n" "$REPLY"; grep 21870 )
PID PPID S WIDE-WCHAN-COLUMN COMMAND COMMAND
21870 1 D cifs_reconnect_tc dsmc dsmc
Looks like it matches my finding above (NFS disconnected). The NFS-mounted folders are back now. Since the state is D, we cannot kill it, so can only a reboot clear it?
Yes, I understand D state is for orphans and Z is for zombie.
However, the process of using gdb to attach to the process is the same.
The "creative process" I suggested using gdb can be tried before rebooting if you absolutely do not want to reboot.
Don't you agree?
Actually I expect gdb to also hang when attaching to a process in D state. But it's worth a try.
If processes are permanently hung in cifs_reconnect_tcon then it looks like a kernel bug (or a missing interrupt/timeout feature).
Is your kernel at the latest patch level?
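Since cifs_reconnect_tcon belongs to the CIFS (SMB) client code rather than NFS, it may be worth confirming which network filesystems are actually mounted. A minimal sketch that filters /proc/mounts-style input; network_mounts is a made-up helper name.

```shell
# network_mounts: from /proc/mounts-style input on stdin, print
# "mountpoint type source" for CIFS and NFS entries only.
# (network_mounts is a hypothetical helper name.)
network_mounts() {
    awk '$3 ~ /^(cifs|nfs|nfs4)$/ { print $2, $3, $1 }'
}

# Typical use on a live system:
#   network_mounts < /proc/mounts
```

If a cifs mount shows up here, that share (rather than the NFS export) is the more likely thing the dsmc processes are waiting on.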
I agree.
He has nothing to lose by at least trying to attach with gdb if he really does not want to reboot.
He might get lucky
I tried but no luck :rolleyes:
[root@xxx:~]# ps -ef | grep dsmc
root 10765 1 0 Apr19 ? 00:00:00 dsmc q systeminfo policy -console
root 14196 1 0 Apr23 ? 00:00:03 /usr/bin/dsmc schedule -optfile=/opt/tivoli/tsm/client/ba/bin/dsm.opt
root 27110 2182 0 18:41 pts/0 00:00:00 grep dsmc
[root@xxx:~]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-83.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) attach 10765
Attaching to process 10765
ptrace: Operation not permitted.
(gdb)
Thanks for trying....
It was a long shot, but sometimes we do get lucky