How to restart shell script when cpu is 0.0%?

Hello,
My operating system is Ubuntu 16.04. I need to kill or restart a shell script when the CPU usage of the related process stays at 0.0% for X seconds. The process name shown in top is vlc.
While browsing forums, I found the script below, but I am not sure how to edit it.
I also don't know what the -gt command does.

#!/bin/bash
CPU_LOAD=$(uptime | cut -d"," -f4 | cut -d":" -f2 | cut -d" " -f2 | sed -e "s/\.//g")
CPU_THRESHOLD=700
if [ $CPU_LOAD -gt $CPU_THRESHOLD ] ; then
  kill -9 $(ps -eo pid | sort -k 1 -r | grep -v PID | head -n 1)
fi
exit 0

When I run top :

 PID  USER    PR  NI    VIRT    RES    SHR  S  %CPU  %MEM     TIME  COMMAND
4018  zoltan  20   0  631692  33384  12564  S   0.0   1.6  0:05.30  vlc
4019  zoltan  20   0  631692  32500  12304  S   0.0   1.7  0:04.14  vlc

I'd appreciate your help

Thank you
Boris

What makes you think that killing off (especially with kill -9, which is a very bad idea) a random process is going to kill a shell script whose CPU usage is 0.0% for X seconds (whatever X happens to be)? Nothing in your script makes any checks to verify any of those assumptions. If you try doing this as root, this would be a great way to make the system unstable and leave lots of artifacts of killed off processes lying around for some poor administrator to try to clean up later while wondering why his or her usually stable system started randomly crashing. PLEASE DO NOT EVER CONSIDER RUNNING THIS SCRIPT ON ANY UNIX, Linux, or BSD SYSTEM!

To answer your question, your script does not call a -gt command; it calls a [ command whose 2nd operand is a -gt (AKA greater than) operator. The man [ command should tell you all about it; if not, try man test.
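As a minimal sketch of the point above: [ is the command, and -gt is the integer comparison operator it receives as an argument. The values of load and threshold are made up for illustration.

```shell
#!/bin/bash
# "[" is a command; "-gt" is an operator passed to it, meaning "greater than"
# for integers. Both numbers below are hypothetical sample values.
load=850
threshold=700

if [ "$load" -gt "$threshold" ]; then
    result="above threshold"
else
    result="at or below threshold"
fi
echo "$result"
```

The related operators -ge, -eq, -lt and -le work the same way, and all of them expect integers on both sides.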

Don Cragun is ABSOLUTELY right - your script is inconsistent, pointless, illogical, and DANGEROUS.

Please use a different approach: Describe / explain the underlying problem you encounter, present the basic facts of your system, and your thoughts. Then a solution tailored to your request might be found.

A guess: You seem to want to terminate a process that no longer does any work but doesn't quit. None of the commands in your code addresses that. How about identifying the target processes' PIDs, running ps for those, and checking e.g. the "time" format specifier (see man ps) to see whether it is not increasing any more? Another approach might be to look at the process' state or wchan fields, if e.g. a link is broken.
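The "time not increasing" idea could be sketched roughly as follows. The function names cpu_time_of and is_stalled are hypothetical helpers, not part of any tool, and the sketch assumes a ps that accepts -o time= -p PID (as procps on Linux does).

```shell
#!/bin/bash
# Print the accumulated CPU time ps reports for one PID, or nothing if the
# PID does not exist.
cpu_time_of() {
    ps -o time= -p "$1" 2>/dev/null | tr -d ' '
}

# is_stalled PID SECONDS
# Returns 0 if the accumulated CPU time did not change during SECONDS,
# 1 if it changed, and 2 if the process could not be found.
is_stalled() {
    local pid=$1 interval=$2 before after
    before=$(cpu_time_of "$pid")
    [ -n "$before" ] || return 2
    sleep "$interval"
    after=$(cpu_time_of "$pid")
    [ -n "$after" ] || return 2
    [ "$before" = "$after" ]
}

# Example call (PID taken from the top output in this thread):
#   is_stalled 4018 10 && echo "PID 4018 used no CPU time in the last 10 seconds"
```

The time column has a resolution of one second, so a process that uses very little CPU can look stalled over a short interval; a longer interval reduces that risk.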

Thank You Don and Rudic,
So far, here is what I did:
Program name is cvlc as I am running under command line.

watch -n5 "ps aux -y | grep 'zoltan' | grep 'cvlc' | awk '{print \$4}'"

At the moment, there are two processes and the code gives:

0.2
0.2
0.0

The 0.0 on the bottom line is normally the command itself. I am working on how to read all output lines except the bottom one, or how to look up a specific PID in the result.

Yes, that's what I wish to do but you both say it's no good.

PS: At first I could not find any info about the abbreviations, then I learnt the meaning of gt and lt by chance :)
I am working on it.

Update:

ps aux | grep 'zoltan' | grep 'cvlc' | awk '{print $4}'
if [ $? -eq 0.0 ]; then
    echo "vlc is not running"
    ./restart.sh
else
    echo "vlc is okay"
fi
exit 0

Output:

0.2
0.2
./check.sh: line 2: [: 0.0: integer expression expected
vlc is okay

Latest:

ps aux | grep 'zoltan' | grep 'cvlc' | awk '{print $4}'
if [ $? -eq 0 ]; then
    id=$(ps aux | grep 'zoltan' | grep 'cvlc' | awk '{print $2}')
    echo "vlc is not running"
    kill $id
    ./restart.sh
else
    echo "vlc is okay"
fi
exit 0

The non-working process has been killed and restarted.

Sorted now:

ps aux | grep 'zoltan' | grep 'cvlc' | awk '{print $4}'
if [ $? -eq 0 ]; then
    id=$(ps aux | grep 'zoltan' | grep 'cvlc' | awk '{print $2}')
    echo "vlc is not running"
    ps -f -u zoltan | awk '{ for(i=8; i<=NF; ++i) printf $i""FS; print "" }' > rerun
    kill $id
    sleep 2
    sed -i '1d' rerun
    chmod 755 rerun
    sed -i "s|/usr/bin/vlc|sudo -u zoltan /usr/bin/vlc|g" rerun
    sed -i "s|$| >> output.log 2>&1 < /dev/null \&|g" rerun
    ./rerun
else
    echo "vlc is okay"
fi
exit 0

When the CPU usage is 0.0%, it restarts the related process. I am sure it can be done with shorter commands, but it works.

Boris

I don't think you identified a "non-working" process. With the above, you'd kill even the busiest vlc process ever on your system.
With the trailing awk, any result will be turned into "perfect", because $? then holds awk's exit status rather than grep's. Like

$ ps aux | grep '[z]oltan' | grep 'cvlc'; echo $?
1
$  ps aux | grep '[z]oltan' | grep 'cvlc'| awk '{print $4}'; echo $?
0

Why not just use pkill vlc, then?
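The exit status behaviour shown above can be reproduced without any running vlc process. This is only an illustrative sketch; the sample line and the variable names are made up.

```shell
#!/bin/bash
# $? after a pipeline is the exit status of the LAST command in it.
sample='root 1 0.0 0.1 /sbin/init'    # hypothetical ps-like line, no match for zoltan

# grep is last: status 1 means "nothing matched".
grep_only=0
printf '%s\n' "$sample" | grep 'zoltan' >/dev/null || grep_only=$?

# awk is last: it exits 0 even though grep found nothing.
with_awk=0
printf '%s\n' "$sample" | grep 'zoltan' | awk '{print $4}' || with_awk=$?

# One way to test for a match instead: check whether any output was produced.
matches=$(printf '%s\n' "$sample" | grep 'zoltan' | awk '{print $4}')
if [ -n "$matches" ]; then found=yes; else found=no; fi

echo "grep_only=$grep_only with_awk=$with_awk found=$found"
```

So a test on $? after "... | awk" says nothing about whether grep matched, and nothing about the CPU value that awk printed.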

Hello Rudic,
You are right, it works for just one process.
I think the problem happens because 0.0 is not equal to 0.
I have multiple vlc processes, so I should tell the script to kill the PID whose CPU usage is 0.0%.

Update: For your valuable comments:

set -x
ps -uax | grep 'zoltan' | grep '/usr/bin/vlc' | awk '{print $2,$3,$11,$12,$13,......}' > report2 #grep_all_required_columns_save_into_file
awk '$2==0.0' report2 > report
rm rerun
cat report | awk '{ for(i=3; i<=NF; ++i) printf $i""FS; print "" }' > rerun
while read COL1 COL2
do
kill -9 $COL1
done< report
sleep 2
#sed -i '1d' rerun
chmod 755 rerun
sed -i "s|/usr/bin/vlc|sudo -u zoltan /usr/bin/vlc|g" rerun
sed -i "s|$| >> output.log 2>&1 < /dev/null \&|g" rerun
./rerun
sleep 2
exit 0

Thanks
Boris

You are right. Most shells (with the notable exception of ksh93) only have integers as a data type but not floats. Therefore "0.0" is interpreted as a string, but -gt (and similar operators like -ge, -eq, -lt, -le) expects integers and nothing else. For example, test 0 -eq abc will lead to the same error.
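Since [ cannot compare "0.0" numerically, one common workaround is to let awk do the comparison. A minimal sketch, where is_zero is a hypothetical helper name:

```shell
#!/bin/bash
# is_zero VALUE
# Returns 0 if VALUE is a plain decimal number equal to zero (0, 0.0, 0.00),
# and 1 for anything else, including non-numeric input.
is_zero() {
    awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v + 0 == 0) }'
}

# Example:
#   if is_zero "0.0"; then echo "idle"; fi
```

The regular expression is there because awk would otherwise treat any non-numeric string as 0.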

The first thing I see is the missing shebang: do yourself a favour and explicitly state the shell you want to run this script. Otherwise you will eventually get disappointed.

Second, the ps command seems not to do what you probably intend it to do. ps -uax displays all processes run by a user named "ax". Notice that there is a difference between BSD-style ps and UNIX System V (or POSIX) style ps. Most implementations understand both sorts of syntax: because BSD ps options were not introduced by dashes, options given with a dash are interpreted in POSIX syntax and options given without one are interpreted in BSD style. For example:

ps aux     # BSD-style with options "a" (all processes), "u" (user-oriented format) and "x" (also processes without a tty)
ps -aux    # POSIX-style with options "-a" (all processes except session leaders and processes without a tty) and
           # -u <username>, since "-u" expects a user name "x" is interpreted as this

Note that some versions of ps, when no user named "x" exists, second-guess the intention and behave as if they had been called as ps aux instead. Needless to say, one had better not rely on such "if I say a, you will act as if I had said b, because in reality I might mean c" tactics.

With the correct syntax, though, you could also get rid of the

| grep zoltan

i.e. by using -U <user>.
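Put together, the selection could look something like the sketch below. idle_pids is a hypothetical helper name; the user zoltan and command name vlc come from this thread, and the -o pid=,pcpu=,comm= column list is an assumption about which columns are wanted.

```shell
#!/bin/bash
# idle_pids NAME
# Reads lines of the form "PID %CPU COMMAND" on stdin and prints the PIDs
# whose command is NAME and whose %CPU is numerically zero.
idle_pids() {
    awk -v name="$1" '$3 == name && $2 + 0 == 0 { print $1 }'
}

# Possible use, letting ps select by user so that no "grep zoltan" is needed:
#   ps -U zoltan -o pid=,pcpu=,comm= | idle_pids vlc
```

One caveat: with procps, the %CPU column of ps is the CPU time used divided by the time the process has been running, not a current reading like in top, so a long-running process can show 0.0 there while it is still doing a little work.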

I hope this helps.

bakunin

Thank You Bakunin for your detailed explanation,

  • I have added a shebang to the script for the crontab tasks.
  • While browsing forums, I realized that there are many different ps variations giving different outputs.

Kind regards
Boris