Help with Shell script that monitors CPU Usage

I'm a newbie to shell scripting and was given this script to modify. It monitors when CPU usage is too high, based on the output of the top command. The comparison is not working as it should: it's comparing a decimal to a regular integer. The script ignores the if statement and sends me an email whether the value is below or above 90. I only need an email if it's greater than 90. I use

 while :; do :; done 

to run the CPU up to 100% and then test the script below. Can you all please tell me what I'm doing incorrectly here:

#!/bin/ksh
top -b -n 1 |head -8 >/tmp/cpu.text
sed -e '1,5d' /tmp/cpu.text >/tmp/cpu2.text
High_CPU=`cat /tmp/cpu2.text|tail -1|awk '{print $9}'`
host=`hostname`
if [ "$High_CPU -gt  90" ];
then
mail -s "High CPU USAGE on $host" test@test.comv < /tmp/cpu2.lst
fi
exit;


Remove the quotation marks in the if statement

When I remove the quotations, this is what I receive:

cputest.sh: line 6: [: 30.0: integer expression expected

without rewriting the whole thing to be more readable/efficient......

#!/bin/ksh
top -b -n 1 |head -8 >/tmp/cpu.text
sed -e '1,5d' /tmp/cpu.text >/tmp/cpu2.text
High_CPU=$(cat /tmp/cpu2.text|tail -1|awk '{print $9}' )
host=$(hostname)
if [ "$High_CPU" -gt  90 ]
then
mail -s "High CPU USAGE on $host" test@test.comv < /tmp/cpu2.lst
fi
exit

Hey @vgersh99, I copied and pasted your script to make sure I wasn't missing anything, and I receive the same error:

cputest.sh: line 6: [: 70.0: integer expression expected

Most likely your High_CPU variable doesn't contain an integer value.
Run the script with set -x and debug it from there.

High_CPU is always a decimal from the top command. Does that help?

This is the output from set -x

[dbtlst@asdv-snsc-tlst ~]$ sh cputest.sh
+ top -b -n 1
+ head -8
+ sed -e 1,5d /tmp/cpu.text
++ cat /tmp/cpu2.text
++ tail -1
++ awk '{print $9}'
+ High_CPU=57.9
++ hostname
+ host=asdv-snsc-tlst.dsns.cdc.gov
+ '[' 57.9 -gt 90 ']'
cputest.sh: line 7: [: 57.9: integer expression expected
+ exit

ksh doesn't have floating-point arithmetic, but you can work around this in several ways:
convert the value to an integer, do the math in, say, awk, or use bc for the comparison,
among many other options.
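
As an illustration of the awk route, here is a minimal sketch. The helper name is_greater is made up for this example and is not part of the original script; awk does the floating-point comparison and hands back only an exit status, so the shell never has to treat 57.9 as an integer.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example only):
# succeeds (exit status 0) when $1 is numerically greater than $2.
is_greater() {
    # "+ 0" forces a numeric comparison; awk exits 0 when the test is true.
    awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value + 0 > limit + 0) }'
}

High_CPU=57.9    # sample value; the real script would take it from top
if is_greater "$High_CPU" 90; then
    echo "CPU above threshold"
else
    echo "CPU below threshold"
fi
```

This assumes the value uses a dot as the decimal separator; running top under LC_ALL=C is one way to make that more likely.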

if [ "${High_CPU%,*}" -gt  90 ]

or by declaring High_CPU as an int :

typeset -i High_CPU

And doing the comparison the old way


This is what I received when I did

if [ "${High_CPU%,*}" -gt  90 ]
[dbtlst@asdv-snsc-tlst ~]$ sh cputest.sh
+ top -b -n 1
+ head -8
+ sed -e 1,5d /tmp/cpu.lst
++ tail -1
++ cat /tmp/cpu2.lst
++ awk '{print $9}'
+ High_CPU=21.1
++ hostname
+ host=fakehostname.com
+ '[' 21.1 -gt 90 ']'
cputest.sh: line 8: [: 21.1: integer expression expected

and this is what I received when I did typeset -i High_CPU

+ top -b -n 1
+ head -8
+ sed -e 1,5d /tmp/cpu.lst
+ typeset -i High_CPU
++ cat /tmp/cpu2.lst
++ tail -1
++ awk '{print $9}'
+ High_CPU=16.7
cputest.sh: line 6: 16.7: syntax error: invalid arithmetic operator (error token is ".7")
++ hostname
+ host=fakehostname.com
+ '[' '' -gt 90 ']'
cputest.sh: line 8: [: : integer expression expected
+ exit

What OS are you on?
Instead of running the script as sh cputest.sh , make the script executable ( chmod +x cputest.sh ) and rerun it with ./cputest.sh .
Share the outcome.

Also: "${High_CPU%,*}" strips everything from a comma onward, but High_CPU has a value of 21.1, where the separator is a dot, not a comma.
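
A minimal sketch of the truncation that accepts either separator, assuming you only need the integer part (so 90.9 counts as 90 and does not trigger the alert). The variable name cpu_int is made up for this example.

```shell
#!/bin/sh
# "%[.,]*" removes the shortest suffix that starts with "." or ",",
# so both 21.1 and 21,1 are cut down to 21.
High_CPU=21.1                  # sample value
cpu_int=${High_CPU%[.,]*}      # cpu_int is a name used only in this sketch

if [ "$cpu_int" -gt 90 ]; then
    echo "above 90"
else
    echo "not above 90"
fi
```

This uses only POSIX parameter expansion, so it should behave the same whether the script ends up running under sh or ksh.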

Red Hat Linux 7, and IT WORKS. That was the issue. Why did that cause an error?

Thanks so much, I've been trying to figure this out for a week! I'm so grateful to you all.

Both approaches worked?
If you run the script with sh cputest.sh , the content of the script is read by whatever shell interpreter you name on the command line. In your case that's sh (which might or might not be the same as the /bin/ksh you have on the first line of your script); the #!/bin/ksh line is then just a comment.
If you make the script executable and run it with ./cputest.sh , the interpreter to use is taken from the first line of the script itself, and that's /bin/ksh .
The bottom line is: your sh might or might not be the same as /bin/ksh .

Ok great. I will make sure to do ./cputest.sh in the crontab. Also do you mind providing resources that could help me get better with shell scripting. I want to get better because I feel like I suck right now.

cron has no notion of a relative directory spec like the one you used in ./cputest.sh .
Use either the fully qualified path to the script OR modify your PATH variable to include the directory where your script resides.

As for books to learn from, everyone has his/her preferences. I'd start by browsing through the FAQ section at these forums.

let "High_CPU = High_CPU * 10"
if [ "$High_CPU" -gt 900 ]

This works as long as the let statement does decimal arithmetic.

This is easier:

if [ "${High_CPU//[^0-9]}" -gt 900 ]

For one thing, it solves the problem of the separator: I have a comma as the separator for float numbers in top (x86_64-redhat-linux-gnu).
Yes, in pdksh this substitution does not work.

I am sorry, but the premise itself is already wrong, which is probably not your fault. "Monitoring CPU" has nothing at all to do with looking at what percentage of the CPU resource is busy at a single moment. It is worthwhile to monitor CPU usage, mind you, but not in this way. So, here is a (very short and very incomplete) introduction to how UNIX (and Linux alike) works:

UNIX is a time-sharing system: when several processes run (seemingly) simultaneously, they in fact run one after the other. The first one is given the CPU and runs for a while (microseconds), then it is frozen, the CPU is given to the next process, and so on. After a while the first process gets the CPU again, and because that happens so fast it seems that all processes run continuously at the same time. You may have noticed that I spoke about "the CPU" (singular) above. Modern systems typically have many CPUs (basically a "core" is a CPU, so a 6-core processor is basically 6 processors bonded into one die). In fact UNIX was designed for any number of processors (with a few restrictions which would lead too far afield). If a system has, say, 4 processors and 100 processes to run, then 4 processes will really run simultaneously and the "switch" will only happen 25 times on average (the truth is that processes do not always get equal time, but for the purpose of this introduction just suppose they do); everything else works the same.

Also notice that I consistently say "process(es)" and not "programs". A "program" is a file on your disk. When it is loaded into memory, given some necessary resources and run, then this instance is a "process". You could start the same program several times at the same time and it would be one program but several different processes (running instances of this program).

Alas, this is not the end of it. There are tasks which can only be done step by step: if I give you a number and tell you to "multiply it by 7, then take the square root of the result, subtract 5 from it, ..." then you can solve that only in the order I presented it to you. Every step needs the result of the previous one. But if I give you, say, 50 numbers and ask you to multiply every one by 5, you could hire 50 people, give one number to each of them, and they can calculate their numbers at the same time. In programming there are likewise tasks that can be "parallelised" and others which can only be done "sequentially". Most of the time a program is a mixture of both types of tasks. For this, programming has the "thread" model: a thread is a "sort-of" process but without its own environment. Think of it like a shared apartment: each participant has his/her own room but all use the same kitchen and bathroom. So, for things that can be done in parallel, a process can create several threads and each one may (given enough resources) use its own CPU, thus speeding things up. This is how a process can use more than one CPU, and it is called "multi-threaded" as opposed to "single-threaded".

After this rather long-winded introduction you probably want to know what that has to do with CPU usage. Well, everything! First, what is "X% CPU usage"? It could be all the processors working at X% capacity on average. It could also be X% of the processors working at 100% and the others doing nothing. Which one it is depends on the nature of the workload: if you have only single-threaded processes, then each one will take exactly one processor at a time, and chances are some processor does nothing while another works at 100%. If you have only multi-threaded processes, chances are that all processors are doing something at any point, and how much of their capacity is used depends on how many processes and threads are running and/or how much each of them demands.

Systems are built so that they meet the demands of the running software but not more: less would hurt the objective of running the software, more would hurt the finances of the company. Determining how much "enough" is is in fact the art of the systems administrator and his/her tuning and monitoring skills. It is natural for a system which is not wildly oversized for its purpose to sometimes hit the 100% mark, especially with the CPU resource. In itself that only means that every CPU you have assigned to the system serves a purpose. It doesn't mean there is a shortage of CPU power. At the least, hitting 90% doesn't by itself mean a shortage. Whether this is indeed a symptom of CPU shortage depends on several other factors, and simply monitoring average CPU consumption will not tell you at all if this is the case, as I explained above.

My suggestion would be to monitor CPU usage like this: whenever you get a value over some threshold (say, 90%; IMHO 95% or even 99% would be better suited), you check again at intervals of 3 minutes whether it is still the case. Only if it is for 5 consecutive measurements do you issue an alarm by e-mail. Everything else will lead to many false alarms, as I can tell you, because I once had to suffer as the admin of a system "monitored" this way. Every time I was on standby I got 4-5 calls per night, for absolutely no reason at all. Furthermore, a better way to monitor CPU usage is the vmstat command. You may want to read my Little introduction to Performance Tuning about how to interpret its output. There you only have integers from the start, so you wouldn't need to convert anything. On the other hand you will not get a "single number" as an answer, which, in fact, is justified. Einstein once said about explanations: "make it as simple as possible but not simpler". The same is true for performance monitoring: instead of a single misleading number you get several values, but they will also tell you more about what is going on.
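
To make the "N consecutive measurements" idea concrete, here is a rough sketch, not a finished monitor. The function names cpu_busy and sustained_high are made up for this example. The sampler assumes the usual Linux (procps) vmstat layout, in which the idle percentage "id" is the 15th column, so check the header line on your own system first.

```shell
#!/bin/sh
# Hypothetical sampler: busy CPU percentage taken from the second report of
# "vmstat 1 2" (the first report is the average since boot).
# ASSUMPTION: "id" (idle) is field 15, as in the procps vmstat layout.
cpu_busy() {
    vmstat 1 2 | awk 'END { print 100 - $15 }'
}

# sustained_high SAMPLER LIMIT COUNT INTERVAL
# Succeeds only if SAMPLER prints an integer above LIMIT on COUNT consecutive
# samples taken INTERVAL seconds apart; stops at the first sample that is not.
sustained_high() {
    sampler=$1 limit=$2 count=$3 interval=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        value=$("$sampler")
        [ "$value" -gt "$limit" ] || return 1
        i=$((i + 1))
        # Wait between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# Example: five samples, three minutes apart, threshold 95%.
# if sustained_high cpu_busy 95 5 180; then
#     ... send the alert mail here ...
# fi
```

Passing the sampler in by name keeps the loop independent of where the number comes from, so the same check could be driven by a different measurement later.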

The reason is that the if statement (in fact, the test statement it triggers) is not executed at all. You might want to use the f_Round function from my ksh library to create an integer from a float:

# ------------------------------------------------------------------------------
# f_Round                                                      rounding numbers
# ------------------------------------------------------------------------------
# Author.....: Wolf Machowitsch
# last update: 2019 01 02    by: Wolf Machowitsch
# ------------------------------------------------------------------------------
# Revision Log:
# - 0.99   1999 03 08   Original Creation
#                       -
#
# - 1.00   1999 03 24   Production Release
#                       minor Code refinements, debugging
#
# - 1.10   2019 01 02   code review
#                       Code refinements
#
# ------------------------------------------------------------------------------
# Usage:
#     f_Round num parm1 [ int digits ]
#
#     Example:  f_Round 3.1415926 3      # rounds to 3 digits, giving 3.142
#               f_Round $var             # default for digits is zero, $var
#                                        # is rounded to an int
#
#
# Prerequisites:
# -   to use this function, the FPATH variable must be set
#
# -   functional dependencies: f_CheckNumeric()
#                              f_CheckInteger()
#
# ------------------------------------------------------------------------------
# Documentation:
#     f_Round() takes the first (numeric) parameter and rounds it to the
#     number of digits AFTER the decimal point given in $2. If no $2 is
#     supplied the default value of 0 is assumed and $1 is rounded to an
#     integer.
#     The rounding is performed using the common algorithm of adding 5 to
#     the digit to the right of the digit to round and then truncating the
#     digits right to this.
#
#     Parameters: num parm1      a char representing a number
#                 int digits     an integer representing the numbers of
#                                decimals to remain after rounding
#
#     returns:    0: no error
#                 1: type error, parm1 not a number or $2 not an int
#                 2: internal error, no parameter supplied
#
# ------------------------------------------------------------------------------
# known bugs:
#
# -   it is not possible to round to digits before the decimal point
#
# ------------------------------------------------------------------------------
# ......................(C) 99 Wolf Machowitsch ................................
# ------------------------------------------------------------------------------

f_Round ()
{

$chFullDebug
                                                 # internal variables
typeset    nValue="$1"                           # value to round
typeset -i iDigits="$2"                          # number of digits
typeset -i iDigit2=0                             # for the rounding
typeset    nAdd="0."                             # for the rounding
typeset -i iCnt=0                                # general counter

if [ -z $iDigits ] ; then                        # set default for iDigits
     iDigits=0
fi

if [ $# -lt 1 ] ; then                           # parameter check
     return 2
else
     if [ $(f_CheckNumeric $nValue; print $?) -gt 0 ] ; then
           return 1
     fi
     if [ $(f_CheckInteger $iDigits; print $?) -gt 0 ] ; then
          return 1
     fi
fi
(( iDigit2 = iDigits + 1 ))

(( iCnt = 0 ))
while [ $iCnt -lt $iDigits ] ; do
     nAdd="${nAdd}0"
     (( iCnt += 1 ))
done
nAdd="${nAdd}5"

                                                 # calculate rounded value
nValue=$( print "scale=$iDigits; $( print "scale=$iDigit2; $nValue + $nAdd" |\
                                    bc \
                                  ) / 1" | \
          bc \
        )

print "$nValue"

return 0
}
# --- EOF f_Round

Sorry to correct you, but ksh93 does have floats as a data type and can do floating-point calculations. It even sports trigonometric functions like sin(x), cos(x), etc. Floats have to be declared (via typeset ), though, and ksh88 has no floating-point feature. On RHEL, however, the ksh is always a ksh93 (I am rather sure of that, as ksh was put under a GPL-like license in 2005).

I hope this helps.

bakunin


On top of what bakunin presented eloquently and exhaustively, some comments on your script:

  • it doesn't make sense to evaluate one single process from top 's output, as several processes can use up considerable CPU power, esp. if you are on a multiuser system where each user runs CPU-intensive software. Also, you seem to rely on the output being sorted by CPU%, which doesn't have to be the case; better to control it ( -o option).

  • why run a handful of commands ( head , sed , tail , cat , ...) if you are using the powerful awk tool anyhow? Use it to do the entire thing!

  • top offers the numbers that bakunin alludes to (and that you might want to use) off the shelf, in the first line of its output ( man top ):

So, one quite simple approach to your task might look like

if (LC_ALL=C top -bn1 | awk '{exit $10 > 0.9 }'); then echo "no problem"; else echo "send mail"; fi

To make sure that the locale of the system doesn't interfere, we use the C locale to run top in batch mode for one iteration. awk then checks the one-minute load average (field 10). As we saw, the value to check against needs to be chosen carefully. Depending on the comparison's result, awk exits with 0 or 1, which in turn can be evaluated by the shell to trigger the respective desired action. Be aware that shell and awk have reversed logical meanings of 0 and 1.
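
One caveat with counting fields: as far as I can tell, the position of the load figures in top 's first line shifts with the uptime format (for example "up 10 days, 3:02," takes more fields than "up 3:02,"). A hedged sketch that splits on the "load average:" label instead is below; the helper names load1 and load_ok are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the one-minute load average found in a line like
#   top - 15:11:02 up 10 days,  3:02,  2 users,  load average: 0.57, 0.60, 0.62
# by splitting on the "load average:" label rather than counting fields.
load1() {
    awk -F'load average: *' 'NF > 1 { split($2, avg, /, */); print avg[1]; exit }'
}

# load_ok LIMIT: reads top or uptime output on stdin and succeeds
# (exit status 0) as long as the one-minute load is not above LIMIT.
load_ok() {
    limit=$1
    load1 | awk -v limit="$limit" '{ exit ($1 + 0 > limit + 0) }'
}

# Example use, mirroring the one-liner above:
# if LC_ALL=C top -bn1 | load_ok 0.9; then echo "no problem"; else echo "send mail"; fi
```

The same functions should also accept the output of uptime , since it carries the same "load average:" label. Note that if no such label is found, load_ok reports success, so treat it as a sketch rather than a hardened check.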
