Finding Out When A Process Has Finished?

Problem

I have an application which basically runs lots of UNIX programs remotely, using the Telnet protocol. For each program it remotely executes, it stores the process ID (PID) for that process.

At regular intervals, I would like my application to take the PID for every process still running and find out if it is still running, or if it has terminated.

Solution

What is the most elegant, or the cleanest way of finding out whether a process is still running? Bear in mind that it is a program doing the checking, not a human!

My idea was simply to execute a "ps -p <PID>" command for every PID and if there are no entries, the process has terminated, otherwise it must still be running.

What do you think? Thank you very much.

Yes, with ps -ef you can check the process fine. Also consider 'top'. It is an application that will tell you a lot about the applications that are running.

ps and top require a nontrivial amount of resources to run. And then you still must parse the output. The cheapest way to see if a process is running is via the kill() system call. If you call kill with signal set to zero, error checking is performed but no signal is actually sent.

This works from the shell as well. You can do "kill -0 $pid" and then check the return code. If it worked, the process is still alive.
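As a minimal sketch of the shell form, the check can be wrapped in a small function. The name is_alive and the sleep command used as a stand-in process are only illustrative assumptions, not part of the original application:

```shell
# is_alive PID: exit status 0 if signal 0 could be delivered to PID,
# non-zero otherwise (no such process, or no permission to signal it).
# The function name is hypothetical.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Stand-in for a remotely started program.
sleep 2 &
pid=$!

if is_alive "$pid"; then
    echo "process $pid is still running"
else
    echo "process $pid has terminated"
fi
```

Note that a non-zero status covers both "no such process" and "not permitted", so for an ordinary user it really means "not one of my running processes".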

Thanks for your tips! I have one more query concerning PIDs:

On a typical UNIX machine (whatever that is!), how often is a specific PID likely to be allocated to a process? Of course, this will depend on the number of processes running, but I was curious whether there is a chance that, between two of my checks, one of my processes terminates and its PID is assigned to a new process. That would give my application the impression that the terminated process is still running, when in fact it has finished and the PID being searched for belongs to a completely different process!

My application is currently checking all PIDs at 5-minute intervals, so in the worst case scenario, the same PID would have to be allocated within a time-frame of 4:59 minutes. This seems unlikely.

PIDs on many UNIX systems go up to about 32,000 and then they recycle. In a multi-processor environment it is common for each CPU to grab 5 or 10 PIDs at a time to reduce spinlocking, so you can't expect them to be allocated in order.

I hope that you are not running this as root! Assuming that you are running as an ordinary user, there is no problem. If the same PID is later allocated to another user's process, your kill system call against it will fail with a permission error, so the check will not report it as alive. You can only signal your own processes unless you're root.

No problems - I am connecting to and running all commands as a standard UNIX user.

I have one final query concerning PIDs and use of the ksh $! variable:

90% of the time, when I parse the output of a "echo $!" command, I correctly obtain the PID number. However, sometimes the value of the ! variable is echoed back as:

"[1] 12766"

where 12766 is the PID.

Does anyone know why I get this strange prefix? I will have to change the way I parse the echo of the ! variable now, because I only want the PID, not the bracketed number 1.
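The "[1] 12766" form looks like the notice an interactive shell prints when it starts a background job (job number in brackets, then the PID), which may be getting mixed into the captured output. As a hedged sketch of a parse that tolerates both forms, the last space-separated field can be taken. The function name extract_pid is hypothetical, and the sketch assumes the captured line has no trailing whitespace:

```shell
# extract_pid TEXT: print everything after the last space in TEXT,
# or TEXT itself if it contains no space.
# Hypothetical helper; assumes no trailing whitespace in TEXT.
extract_pid() {
    printf '%s\n' "${1##* }"
}

extract_pid "[1] 12766"   # prints 12766
extract_pid "12766"       # prints 12766
```

This uses only standard parameter expansion, so it should behave the same in ksh and other POSIX-style shells.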

Many thanks.