defunct processes?

Hi. I had a tool fail recently. On analysis I found it was cleaning up orphaned directories that had been created by specific processes that had died for some reason and so failed to clean up after themselves. The directories were of the form /dir.pid. The tool would check whether any instance of the process was running under that pid and, if not, clear away the directory. It was failing because it intermittently saw an instance of the pid in the ps output. I put a trap in for this (grep -vi) and all seemed well, but I have now seen it fail once more, unfortunately with no trace on. I cannot replicate it, as it is now so intermittent with the fix I mentioned in place. My question is: "Are there any other ways a dead process can show up in the ps output, and if so, what should I be grepping for?" Cheers. PS Sorry if the format of this post is rough; my work PC is locked down and doesn't seem able to handle the java very well.

dead process == zombie? What is your definition of a dead process?

The OS cycles the pids of new processes from 1 up to a maximum (signed short max, 32767, on many systems) and then wraps around, so any given pid is eventually handed out again.

You should be checking the user/owner of the file and looking for both the pid and the user, not just the pid. Pids are reused constantly. lsof will tell you if a process still has the file open, if that is the source of your problem.
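As a rough sketch of the pid-plus-user check described above: the helper names `ps_line_is_live` and `pid_is_live` are made up for illustration, and the code assumes a POSIX shell (ksh, or /usr/xpg4/bin/sh on Solaris) and a ps that accepts the POSIX `-p` and `-o user= -o args=` options. It treats any ps line containing "defunct" as dead, which would also misclassify a live process that happens to have that word in its arguments.

```shell
#!/bin/sh
# Sketch only: ps_line_is_live and pid_is_live are hypothetical helper names.

# ps_line_is_live OWNER "LINE"
# LINE is one line of "user args" output from ps. Succeeds only if the line
# is non-empty, is not a defunct (zombie) entry, and belongs to OWNER.
ps_line_is_live() {
    owner=$1
    line=$2
    [ -n "$line" ] || return 1                        # ps printed nothing: no such pid
    case $line in
        *[Dd][Ee][Ff][Uu][Nn][Cc][Tt]*) return 1 ;;   # zombie entry, treat as dead
    esac
    # First whitespace-separated field is the user column.
    first=$(printf '%s\n' "$line" | awk '{ print $1; exit }')
    [ "$first" = "$owner" ]                           # pid reused by another user: not ours
}

# pid_is_live PID OWNER
# Asks ps for just the user and args of one pid; "=" suppresses the headers.
pid_is_live() {
    ps_line_is_live "$2" "$(ps -p "$1" -o user= -o args= 2>/dev/null)"
}
```

The cleanup step would then clear a /dir.pid directory only when `pid_is_live "$pid" "$owner"` fails, with the owner taken from the directory itself rather than assumed.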

What OS do you have? Does your system have mandatory locking? Apps like WebLogic lock directories; do you have something like that running?


Hi Jim, thanks for the reply. Once again, apologies for the text all being in one paragraph. No, the script parses a .pid extension off a temporary directory, and if it doesn't find a running process with that pid it clears down the directory. The problem I found when tracing the failure was that a process with that id was still listed, but with the arg DEFUNCT. You have got me thinking though: perhaps the transient error I found was due to the process id being reassigned to something else. That would explain why I am struggling to replicate it. I need to look at the code again and see if I can tighten it up so it checks the process name as well as the pid. One issue I am finding with these scripts is that the ps args are often truncated to fit on screen, so the process name is not recoverable from the ps output. I think that is why they may have been written that way to start with. Do you know of any way of forcing ps to output the whole line, wrapping if necessary? It would have to work on SUN and AIX. Once again, thanks Jim, as always you're very helpful.
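One possible way around the truncation, rather than forcing ps to wrap, is to ask ps for only the command-name column of the pid in question. The sketch below assumes a POSIX shell and the POSIX `ps -p PID -o comm=` form; `comm_matches` and `pid_runs_command` are hypothetical names. Some systems print a full path in that column and others only a (possibly shortened) base name, so the comparison is on the base name only.

```shell
#!/bin/sh
# Sketch only: comm_matches and pid_runs_command are hypothetical helper names.

# comm_matches NAME "COMM"
# COMM is the command-name column printed by ps for one process.
# Succeeds if its base name (any leading path removed) equals NAME.
comm_matches() {
    want=$1
    comm=$2
    comm=${comm%"${comm##*[! ]}"}    # drop any trailing padding spaces
    [ -n "$comm" ] || return 1       # empty: ps found no such pid
    base=${comm##*/}                 # strip a leading directory path, if any
    [ "$base" = "$want" ]
}

# pid_runs_command PID NAME
# Requests only the command name for one pid, so screen-width truncation
# of the argument list does not come into it.
pid_runs_command() {
    comm_matches "$2" "$(ps -p "$1" -o comm= 2>/dev/null)"
}
```

A defunct entry would show up here as something like `<defunct>` in the name column, so it fails the name comparison as well.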