Stateless process

Hi Folks

I'm trying to monitor that a process is running, using ps.

Astonishingly, the process, which is checked every 15 minutes, is running but is reported without a state about 2-3 times a day.

Extract from the script :

#!/bin/ksh

# edii_pid is PID of process to monitor.

# Checking if pid is listed in hosts processes.
# tr used for getting rid of potential leading blanks.
edii_pid_listed=`ps -p ${edii_pid} -o pid | grep -v "^  PID" | tr -d ' '`

if [[ ${edii_pid_listed} == ${edii_pid} ]]; then
  # So PID is listed, i.e. it exists. In fact the process/application has been
  # running continuously for several days.

  # Checking if pid is in running (=O) state.
  # tr used for getting rid of potential leading blanks.
  edii_pid_runnning=`ps -p ${edii_pid} -o s | grep -v "^S" | tr -d ' '`
  if [[ ${edii_pid_runnning} != 'O' ]]; then
    print "PID ${edii_pid} isn't in running (O) state but ${edii_pid_runnning}!"
  fi
fi

And about 2-3 times a day I get this:

PID 2939 isn't in running (O) state but !

Any hints why a process that has been running for days, and was "seen" in the list of processes only microseconds before its state was queried, shows up without a state?

System is SunOS 5.10 Generic_148888-03 sun4v sparc SUNW,SPARC-Enterprise-T5220

Cheers

Michael

I am confused by what you say you see. Processes either exist in the kernel process table or they do not. A terminated process that has not been waited for is a zombie; the other possible states are things like sleeping, swapped out, some kind of wait state, running, etc. There is no "stateless" value presented by ps.

I also do not get what your code is supposed to be doing.

If you want to check process existence try something like this:

# the -0 is signal zero: it does not harm the process, it only checks that the
# process can be signalled. You have to be root or the same user to signal a process.
   kill -0 $pid && echo "$pid is alive" || echo "$pid is not alive"

I suggest you modify your script to better capture the raw process state. Something like the following (which I have not tested):

edii_pid_runnning=`ps -p ${edii_pid} -o s`
edii_pid_runnning_state=`echo "${edii_pid_runnning}" | grep -v "^S" | tr -d ' '`
if [[ ${edii_pid_runnning_state} != 'O' ]]; then
    print "${edii_pid} isn't in running (O) state but is ${edii_pid_runnning}!" 
fi

My guess is that the state becomes S (sleeping), which your grep -v "^S" then discards together with the header line.
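The effect can be reproduced without ps by feeding the filter some simulated output. This is only a sketch: filter_state is a hypothetical wrapper name, and the two-line input (header "S", then the state letter) is an assumption about what ps -o s prints.

```shell
# filter_state is a hypothetical wrapper around the filter used in the script.
filter_state() {
  grep -v "^S" | tr -d ' '
}

# Simulated "ps -p PID -o s" output: header line "S", then the state letter.
printf 'S\nO\n' | filter_state    # header removed, the state O survives
printf 'S\nS\n' | filter_state    # both lines start with S, nothing is left
```

The second call is the "stateless" case: the sleeping state is filtered out along with the header, so the variable ends up empty.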
Improvement:

edii_pid_runnning=`ps -p ${edii_pid} -o s=`

And

edii_pid_listed=`ps -p ${edii_pid} -o pid=`

The = omits the ps header.
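Leading spaces are removed by the shell only where an unquoted variable is subject to word splitting, as in an ordinary command or in [ ]. Inside [[ ]] no word splitting takes place, so keep the tr -d ' ' if you stay with a test like

if [[ ${edii_pid_listed} == ${edii_pid} ]]; then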
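Putting the pieces together, a check along these lines might be a starting point. It is a minimal sketch, not a tested Solaris script: classify_state and check_pid are hypothetical names, the state letters are the ones listed in the Solaris ps man page, and treating S and R as healthy is an assumption about what the monitoring is meant to catch.

```shell
# Sketch only: classify_state and check_pid are hypothetical helper names.
# State letters as documented for Solaris ps: O = on a processor,
# S = sleeping, R = runnable, T = stopped, Z = zombie.
classify_state() {
  case "$1" in
    O|S|R) echo "alive" ;;     # running, sleeping or runnable: treated as healthy
    T)     echo "stopped" ;;
    Z)     echo "zombie" ;;
    "")    echo "gone" ;;      # empty string: ps printed nothing for this PID
    *)     echo "other" ;;     # any state letter not listed above
  esac
}

check_pid() {
  # "-o s=" suppresses the header; tr strips any padding blanks.
  state=`ps -p "$1" -o s= 2>/dev/null | tr -d ' '`
  classify_state "$state"
}
```

Calling check_pid ${edii_pid} then prints one word, and the script only needs to alert when that word is not "alive".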

Yep, that was it.

I added some more debug lines to the script and noticed that the process (S)leeps from time to time, which is perfectly normal for Unix.

So, phew, the world still _is_ a sphere after all. Just the usual PEBKAC. :sunglasses:

Thanks all for replying!

A further improvement:
On Solaris and Linux you can simply test for a known PID like this:

if [[ -d /proc/${edii_pid} ]]; then
 echo "still alive"
else
 echo "dead"
fi

And on systems that don't have /proc, I believe you can just do

ps -p $pid >/dev/null 2>/dev/null || echo "$pid is dead"
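The two existence tests could be combined into one helper. This is a hedged sketch: pid_alive is a hypothetical name, and it assumes that an existing /proc directory means per-process entries are available there.

```shell
# pid_alive is a hypothetical helper; it returns 0 if the PID exists.
pid_alive() {
  if [ -d /proc ]; then
    # Assumption: /proc holds one directory per running process.
    [ -d "/proc/$1" ]
  else
    # Fallback: signal 0 only tests whether the process can be signalled.
    # It also fails for processes owned by another user unless you are root.
    kill -0 "$1" 2>/dev/null
  fi
}

pid_alive $$ && echo "$$ is alive" || echo "$$ is dead"
```

Note that this only answers "does the PID exist"; it says nothing about the state letter discussed earlier in the thread.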