Ps ax with grep in loop

Hello,

I have built the following script to check if processes supplied by the argument are running or not.

#!/bin/bash

PROCLIST=$1

PROCESS="0"
ERROR_PROCS=""
IFS='+'
read -ra ADDR <<< "$PROCLIST"
for PROC in "${ADDR[@]}"; do
        if [ `ps ax | grep $PROC | grep -v grep | wc -l` -lt 1 ]; then
                PROCESS=1
                ERROR_PROCS="$ERROR_PROCS""$PROC ";
        fi
done

if [ $PROCESS -eq 1 ]; then
        echo "CRITICAL - One or more processes ($ERROR_PROCS) not running"
        exit 2
fi

echo "OK - All monitored processes are running. Process: $PROCLIST"
exit 0

It seems to work fine, apart from the fact that "ps ax | grep "process" | wc -l" gives a higher count than expected.
For example, if we take a process named "test" (which doesn't exist), it returns a count of 2.

[root@matt Linux]# ps ax | grep test | grep -v grep
[root@matt Linux]#

If we run the script with tracing enabled (bash -x):

[root@matt Linux]# bash -x ./test2.sh test
+ PROCLIST=test
+ PROCESS=0
+ ERROR_PROCS=
+ IFS=+
+ read -ra ADDR
+ for PROC in '"${ADDR[@]}"'
++ ps ax
++ grep test
++ wc -l
++ grep -v grep
+ '[' 2 -lt 1 ']'
+ '[' 0 -eq 1 ']'
+ echo 'OK - All monitored processes are running. Process: test'
OK - All monitored processes are running. Process: test
+ exit 0

What could be the reason that it captures a value of 2? This also happens if the process does exist; in that case, it returns 3.

Rgds,

Matthew

Did you consider "false positives"? Processes with the search string as part of the command (fittest, hottest, testcase)? E.g. grep man would show mman and manager on my system.

Any "test" user on the system?

Hi RudiC,

In this case, "test" was just used as an example. However, even if we take it literally, there are no false positives:

[root@matt Linux]# ps ax | grep test | grep -v grep
[root@matt Linux]# 

If there were any, that command would have returned something, so there is nothing related to "test" in our process list.

Also, if you take an existing process, it is counted, but 2 is still added on top.

Here's another example:

process: master

[root@matt Linux]# ps ax | grep master | grep -v grep
 1705 ?        Ss     5:42 /usr/libexec/postfix/master -w

So one process exists.

If I run the script, it returns a value of 3: 1 for the process, and 2 more that I can't account for. I don't see why the bash script produces this number.
I could simply subtract 2 from the value, but that doesn't make sense.

[root@am1-stp-oam01 Linux]# bash -x ./test2.sh master
+ PROCLIST=master
+ PROCESS=0
+ ERROR_PROCS=
+ IFS=+
+ read -ra ADDR
+ for PROC in '"${ADDR[@]}"'
++ ps ax
++ grep -v grep
++ wc -l
++ grep master
+ '[' 3 -lt 1 ']'
+ '[' 0 -eq 1 ']'
+ echo 'OK - All monitored processes are running. Process: master'
OK - All monitored processes are running. Process: master
+ exit 0

Strange. I vaguely remember we had a similar problem quite some time ago, but can't find the solution.
For debugging, in the script, echo the variables, and run the ps ax | ... pipe on its own to see its result.

Why, BTW, is the above that complicated?

ps ax -ocomm= | grep -E "${1//+/|}" | sort | comm -13 - <(echo "${1//+/$'\n'}" | sort)

will serve you the non-running processes (of the plus-sign separated list in $1) on a silver plate...
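
To see what each stage of that pipeline contributes, here is a minimal sketch of the same idea wrapped in a function that reads the command names from stdin, so it can be tried on canned input. The function name not_running and the sample names are made up for illustration; sort -u is used instead of plain sort to drop duplicate names.

```bash
#!/bin/bash
# not_running LIST: read command names (one per line) on stdin and print
# the names from the plus-separated LIST that are absent from that input.
# Real use would be: ps ax -ocomm= | not_running "master+crond"
not_running() {
    local wanted=${1//+/$'\n'}          # "a+b" becomes "a" and "b" on separate lines
    # grep -E only pre-filters; comm compares whole lines, so "hottest"
    # does not count as "test". comm -13 keeps lines unique to the second
    # input, and both inputs have to be sorted.
    grep -E "${1//+/|}" | sort -u | comm -13 - <(sort -u <<< "$wanted")
}

# Canned input standing in for the ps output
printf 'master\nsshd\nbash\n' | not_running "master+crond"
```

With the canned input above, only crond is printed, since master is present in the list of running names.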


Excellent, that really simplifies the job. I have used the one-liner for my benefit as follows:

#!/bin/bash

PROCLIST=$1
tmp=`ps ax -ocomm= | grep -E "${PROCLIST//+/|}" | sort | comm -13 - <(echo "${PROCLIST//+/$'\n'}" | sort) | wc -l`
tmp2=`ps ax -ocomm= | grep -E "${PROCLIST//+/|}" | sort | comm -13 - <(echo "${PROCLIST//+/$'\n'}" | sort) | tr "\n" " "`

if [ $tmp -gt 0 ]; then
        echo "CRITICAL - One or more processes (`echo $tmp2`) not running"
else
        echo "OK - All monitored processes are running. Process: `echo $PROCLIST | sed 's/+/,/g'`"
fi

tmp2 is just there to join the missing process names onto a single line.

Thanks again for your feedback

Yes,

ps ax -ocomm=

does not list users or command arguments, so it cannot produce false positives from them.
Even simpler than

ps ax -ocomm= | grep "$PROC"

is

pgrep "$PROC"

Both provide an exit status, so you don't need to count the matches (with the -c option) and compare the count.

if ps ax -ocomm= | grep -q "$PROC"; then echo running; fi
if pgrep "$PROC" >/dev/null; then echo running; fi
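
Putting the exit status to work in the original loop could look like the sketch below. It assumes pgrep is installed; the -x option asks for an exact match on the process name, so "test" would not match "hottest". The function name missing_procs and the process names in the example call are hypothetical.

```bash
#!/bin/bash
# missing_procs LIST: print the names from the plus-separated LIST
# for which pgrep finds no process, separated by spaces.
missing_procs() {
    local proc missing=""
    local -a names
    IFS='+' read -ra names <<< "$1"     # IFS is changed for this read only
    for proc in "${names[@]}"; do
        # pgrep exits non-zero when nothing matches; no counting needed
        if ! pgrep -x "$proc" >/dev/null 2>&1; then
            missing+="$proc "
        fi
    done
    printf '%s' "${missing% }"          # drop the trailing space
}

# Hypothetical names that should not exist on any system
ERROR_PROCS=$(missing_procs "no_such_proc_a+no_such_proc_b")
if [ -n "$ERROR_PROCS" ]; then
    echo "CRITICAL - One or more processes ($ERROR_PROCS) not running"
fi
```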

--
Last but not least, you can set IFS='+' temporarily for the read command:

IFS='+' read -ra ADDR <<< "$PROCLIST"
# continue with the original IFS
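
A quick way to convince yourself that the assignment only applies to the read command (the process names in PROCLIST are just sample values):

```bash
#!/bin/bash
PROCLIST="master+sshd+crond"            # sample list, names are illustrative

# The IFS='+' prefix applies to this one read command only
IFS='+' read -ra ADDR <<< "$PROCLIST"

printf 'element: %s\n' "${ADDR[@]}"     # one line per list element

# The shell's own IFS is untouched (space, tab, newline by default)
if [ "$IFS" = $' \t\n' ]; then
    echo "IFS is still the default"
fi
```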

Why that complicated?

TMP=$(ps ax -ocomm= | grep -E "${1//+/|}" | sort | comm -13 - <(echo "${1//+/$'\n'}" | sort))
[ $TMP ] && echo "CRITICAL - One or more processes ($TMP) not running" || echo "OK - All monitored processes are running. Process: ${1//+/,}"

The original command that you've shown us that you're using to see if master was running is:

[root@am1-stp-oam01 Linux]# bash -x ./test2.sh master

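So, if master is running, you get a count of 1 for the process you're looking for, plus 2 more because the string master also appears in the command line of the script itself: once for the bash -x ./test2.sh master process, and once for the subshell that bash forks to run the backquoted pipeline, which ps shows with the same command line.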
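
The effect can be reproduced on canned text. The sketch below assumes that ps ax lists the script's command line twice while the backquoted pipeline runs (the script itself plus a forked subshell); only the postfix line is taken from the output shown earlier in this thread, the other PIDs, TTYs and states are made up.

```bash
#!/bin/bash
# Hypothetical snapshot of "ps ax" taken while the pipeline is running
sample_ps='
 1705 ?        Ss     5:42 /usr/libexec/postfix/master -w
 4242 pts/0    S+     0:00 bash -x ./test2.sh master
 4243 pts/0    S+     0:00 bash -x ./test2.sh master
 4244 pts/0    R+     0:00 ps ax
 4245 pts/0    S+     0:00 grep master'

# Same filter as in the script, applied to the snapshot instead of live ps
count=$(printf '%s\n' "$sample_ps" | grep master | grep -v grep | wc -l)
echo "matches: $count"                  # 3 for this sample: postfix plus two bash lines
```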

Sometimes it is easier to debug things like this by changing:

[ `ps ax | grep $PROC | grep -v grep | wc -l` -lt 1 ]

to:

[ `ps ax | tee step1 | grep $PROC | tee step2 | grep -v grep | tee step3 | wc -l` -lt 1 ]

and examine the contents of the files step1, step2, and step3 to see what processes were matched that you hadn't expected.

As RudiC suggested, using ps ax -ocomm= gets rid of the problem here. But adding tees to a pipeline frequently helps when shortcuts like -o comm= don't apply.


The same idea, but implemented with two greps:

#!/bin/bash
plist=${1//+/$'\n'}
TMP=$(fgrep -vxf <(ps -eo comm= | fgrep -x "$plist") <<< "$plist")
if [ -n "$TMP" ]
then
  echo "processes not running:"
  echo "$TMP"
else
  echo "Ok"
fi

Note that [ $TMP ] is not robust in case $TMP contains shell-special characters or test operators like -n or =.
So it should be quoted and prefixed with the -n operator: [ -n "$TMP" ].
[[ $TMP ]] should be safe as well.
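
A small sketch of the difference, using a made-up value for TMP that holds two names, as it would when two processes are missing:

```bash
#!/bin/bash
TMP=$'crond\nsshd'                      # hypothetical: two names reported as not running

# Unquoted: $TMP splits into two words, so [ sees "crond sshd" and
# complains instead of answering "non-empty"
if [ $TMP ] 2>/dev/null; then unquoted=ok; else unquoted=failed; fi

# Quoted with -n: a plain "is the string non-empty?" check
if [ -n "$TMP" ]; then quoted=ok; else quoted=failed; fi

# [[ ]] does not word-split its operands
if [[ $TMP ]]; then double=ok; else double=failed; fi

echo "unquoted=$unquoted quoted=$quoted double=$double"
```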

If pgrep isn't what you want, could you make an expression from the item you are searching for? I avoid using a construct like ps -ef | grep this | grep -v grep by writing it as ps -ef | grep -E "thi[s]" so that the expression does not match its own grep process. If you are passing it a loop of items to check, it could get a bit fiddly, but with variable substitution you could achieve it, perhaps like this:

set -x

for PROC in "${ADDR[@]}"
do
   PROC_a="${PROC%?}"                           # Chop off last character
   PROC_b="${PROC#"$PROC_a"}"                   # Work out the last character
   PROC_E="${PROC_a}[${PROC_b}]"                # Assemble expression, e.g. maste[r]
   if ! ps ax | grep -Eq "$PROC_E"              # Non-zero return code means no matching process
   then
      PROCESS=1
      ERROR_PROCS="${ERROR_PROCS} ${PROC}"
   fi
done

echo "Failed to find ${ERROR_PROCS}"
set +x

You still might have to be careful because there is a risk of false positives. For example, someone stops a service called MAINPROC (so no such process is running) but then edits the file /var/log/MAINPROC; the editor command shows up as a process matching your search, and you therefore think the service is still running.

Can you tell us more about the processes you are looking for? There might be a better way to check for them. Perhaps, if they write their process ID to a file in /var/run/name, you can read that file and make sure the process is what it should be.
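
As a rough sketch of the PID file idea: the function name pid_alive and the file path in the usage note are hypothetical, and the check only confirms that some process with that PID exists, not that it is the right program.

```bash
#!/bin/bash
# pid_alive FILE: succeed if FILE holds the PID of a process that exists.
pid_alive() {
    local pid
    [ -r "$1" ] || return 1             # no readable PID file: treat as not running
    pid=$(<"$1")
    [[ $pid =~ ^[0-9]+$ ]] || return 1  # ignore anything that is not a plain number
    # Signal 0 sends nothing and only tests whether the PID can be signalled;
    # it also fails if you lack permission to signal that process
    kill -0 "$pid" 2>/dev/null
}
```

It could be called as pid_alive /var/run/name/name.pid || echo "CRITICAL - name not running", with the path adjusted to wherever the service actually writes its PID.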

It depends how far you want to push this. Your processes might respond to a signal to say that they are running okay, for instance, so you could actually give them a nudge to make sure that they are happy and not stuck in a loop. Or they could frequently re-write a file with the current date (best in date +%s format); if it is out of date by too long (you decide what is too long, compared to the current date +%s value), then raise an alert.
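
The timestamp comparison could be sketched like this. The function name heartbeat_fresh, the 300-second limit and the file path in the usage note are all assumptions for illustration.

```bash
#!/bin/bash
# heartbeat_fresh LAST NOW MAX_AGE: succeed if LAST is at most MAX_AGE
# seconds before NOW. All three values are seconds since the epoch or
# plain second counts, as printed by date +%s.
heartbeat_fresh() {
    local last=$1 now=$2 max_age=$3
    (( now - last <= max_age ))
}

# Fixed numbers so the outcome does not depend on the clock
if heartbeat_fresh 1000 1200 300; then echo "200s old: fresh"; fi
if ! heartbeat_fresh 1000 1400 300; then echo "400s old: stale, raise an alert"; fi
```

In a real check the call would be something like heartbeat_fresh "$(cat /var/run/name/heartbeat)" "$(date +%s)" 300, assuming the monitored process writes date +%s into that file.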

There are many ways to do it.

Robin