Script to identify high CPU usage processes

Hi Guys,

I need to write a script that can identify processes with high CPU utilization. It sounds simple, but we are on an AIX 5.3 environment with virtual CPUs (VPs) and logical CPUs. Any ideas or tips would be highly appreciated. Thanks.

Harby.

Check the C column of the ps output over several iterations, and if the number is consistently high (by whatever you consider high), take action. You might also be able to run nmon in batch mode.
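The sampling idea above can be sketched as follows. This is a minimal sketch, assuming the C value is column 4 of `ps -ef` output (UID PID PPID C ...), as it is on both AIX and Linux. The function names, sample count, interval and threshold are hypothetical. On AIX the C value is a short-term figure that the scheduler decays over time, while on Linux (procps) it is a lifetime average expressed as an integer percent, so check `man ps` on your system before picking a threshold.

```shell
#!/bin/sh
# Sketch: report PIDs whose C value (column 4 of "ps -ef") is above a
# threshold in every one of several samples. All names are hypothetical.

# Print "pid C" for every process, skipping the ps header line.
snapshot() {
    ps -ef | awk 'NR > 1 { print $2, $4 }'
}

# Read concatenated "pid value" lines on stdin and print, one per line,
# the PIDs whose value was above $1 in at least $2 of the samples.
consistently_high() {
    awk -v limit="$1" -v need="$2" '
        $2 + 0 > limit + 0 { hits[$1]++ }
        END { for (pid in hits) if (hits[pid] >= need) print pid }
    ' | sort -n
}

# Take $1 samples, $2 seconds apart, and report PIDs above $3 in all of them.
watch_c_column() {
    samples=$1 interval=$2 limit=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        snapshot
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
    done | consistently_high "$limit" "$samples"
}
```

For example, `watch_c_column 5 10 60` takes five samples ten seconds apart and prints the PIDs whose C value stayed above 60 in all five. The "take action" step would then act on those PIDs.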

But what really confuses me is that even when I take the %CPU values and add them all together, the total is nowhere near what I see in topas or nmon.
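One likely reason the totals differ is that ps %CPU is averaged over each process's whole lifetime, while topas and nmon report usage over their refresh interval. A minimal sketch of measuring recent usage instead samples each process's cumulative CPU time (the `time` keyword of `ps -o`) twice and divides the difference by the wall-clock gap. The function names and temporary file paths are hypothetical. Because TIME has one-second resolution, short intervals give coarse results.

```shell
#!/bin/sh
# Sketch: estimate recent (not lifetime) %CPU per process from two samples
# of cumulative CPU time. Function names and /tmp paths are hypothetical.

# Read "pid time" lines on stdin, where time looks like [[dd-]hh:]mm:ss,
# and print "pid seconds".
cpu_seconds() {
    awk '{
        t = $2; days = 0
        if (index(t, "-")) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, f, ":"); s = 0
        for (i = 1; i <= n; i++) s = s * 60 + f[i]
        print $1, s + days * 86400
    }'
}

# $1, $2: files of "pid seconds" from two samples; $3: seconds between them.
# Print "pid percent" for PIDs present in both samples, busiest first.
cpu_delta() {
    awk -v secs="$3" '
        NR == FNR { before[$1] = $2; next }
        ($1 in before) { printf "%s %.1f\n", $1, ($2 - before[$1]) * 100 / secs }
    ' "$1" "$2" | sort -rn -k2,2
}

# Take two samples $1 seconds apart and print recent %CPU per process.
recent_cpu() {
    a=/tmp/recent_cpu.$$.1 b=/tmp/recent_cpu.$$.2
    ps -e -o pid= -o time= | cpu_seconds > "$a"
    sleep "$1"
    ps -e -o pid= -o time= | cpu_seconds > "$b"
    cpu_delta "$a" "$b" "$1"
    rm -f "$a" "$b"
}
```

For example, `recent_cpu 30 | head` lists the busiest processes over the last 30 seconds. A value of 100 means roughly one CPU's worth of time during the interval. Processes that started or exited between the two samples are left out.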

You could build something around the following command (read the man page of "ps" for an explanation of the options used):

ps -Alo pcpu,vsz,pid,args | sort -rn

Basically, this lists CPU consumption (pcpu) and virtual memory consumption (vsz) for each process and sorts by CPU consumption, highest first.

Again, read the man page of ps and it will probably become clearer. You might also want to read "Demystifying IOWAIT" and "AIX Process Priority and Control".

I hope this helps.

bakunin

I use a command such as:

ps -eF "%C %u %n %p %a" | grep -vE "^ *0\.0|defunct|CPU|high_cpu|init"

If I remember right, it won't match nmon because ps reports CPU usage averaged over the time since the process started, not the instantaneous CPU usage.

Example output:

>high_cpu

=======================================================================================================================================
/home/unxsa/bin/high_cpu started at Wed Nov  3 07:24:34 CDT 2010 on ms.
=======================================================================================================================================
CPU    User       Nice   CPU  PID        Command
---------------------------------------------------------------------------------------------------------------------------------------
10.0   root       24     -    1147122    /usr/java5/jre/bin/java -Dhcdaemon -Dhmc -Djava.class.path=/opt/csm/codebase:/opt/freeware
0.2    root       20     -    307414     /usr/bin/dsmc sched
---------------------------------------------------------------------------------------------------------------------------------------

=======================================================================================================================================
/home/unxsa/bin/high_cpu ended at Wed Nov  3 07:24:34 CDT 2010 on ms.
=======================================================================================================================================
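The grep -v filter above drops every line whose %CPU prints as 0.0. It also drops any line that contains "defunct", "CPU", "high_cpu" or "init" anywhere, which can hide a real process whose command line happens to include one of those words (for example, one with "init" in its path). A minimal sketch of an alternative compares the numeric %CPU field against a cutoff instead. It uses POSIX `-o` keywords rather than the `-F` format string, and the function names are hypothetical. The pcpu value is still the lifetime average described above.

```shell
#!/bin/sh
# Sketch: list processes above a %CPU cutoff, busiest first, filtering on
# the numeric pcpu field rather than on grep patterns. Names are hypothetical.

# Read "pcpu user nice pid args..." lines on stdin and print those whose
# pcpu is above $1, skipping zombie (<defunct>) entries.
busy_only() {
    awk -v limit="$1" '$1 + 0 > limit + 0 && $NF != "<defunct>"' | sort -rn
}

# Print a header and the busy processes; $1 is the %CPU cutoff (default 0).
high_cpu_list() {
    printf '%-6s %-10s %-6s %-10s %s\n' CPU USER NICE PID COMMAND
    ps -e -o pcpu= -o user= -o nice= -o pid= -o args= | busy_only "${1:-0}"
}
```

For example, `high_cpu_list 5` shows only processes above 5%. The awk and sort processes in the pipeline normally sit near 0.0 and fall below any positive cutoff, so the list does not need a pattern to exclude the script itself.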

OK, OK, I might be a bit stupid here, but what would you possibly need such a script for? The nmon percentage shows the CPU% of a CPU thread, not absolute CPU. And how big such a thread is depends heavily on how you set up your virtual CPUs: if one virtual CPU equals one physical CPU and you have only one virtual CPU in your box, then 87% might be high; if you have 30 virtual CPUs together worth one physical CPU, then 87% is rather low.

What I would look for instead is what is consuming large amounts of CPU for longer periods of time (ps aux is helpful), and whether I am exhausting my box's CPUs (i.e. constantly going over my entitlement or virtual CPU limits). Again, with 1 virtual CPU and 1 CPU of entitlement, 100% CPU is the maximum I can reach; with 30 virtual CPUs the limit is 30 CPUs (i.e. 3000%). If that happens once a day for 30 seconds, it is no reason for concern as long as I stay below 100% (i.e. my entitlement) for the rest of the day :)

What I would check more often is how my CPU is being used: high usr CPU is good, high sys CPU is bad. Some applications, such as Sybase, spin on CPUs, so the system appears incredibly busy when in fact it is very busy doing nothing, and the root cause is underutilization.

Other root causes of high CPU utilization might be an I/O or memory bottleneck, not necessarily a particular process at all.

To put it in IBM's words: in a virtualized environment on big frames, CPU is not your problem these days. At times there may be runaway processes hogging CPU, but you are more likely to catch them by running nmon interactively a few times a day and looking for anything among the top CPU consumers that you would not expect to be there.

Kind regards
zxmaus
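The usr-versus-sys check described above can be done without looking at individual processes. Here is a minimal sketch that averages the `us` and `sy` CPU columns of `vmstat` output. It finds the columns by name in the header line because the layout differs between AIX and Linux. On AIX, `sy` appears twice (system calls under "faults", then system CPU), so the sketch takes the last one. It also skips the first data line, which reports averages since boot. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: average the user (us) and system (sy) CPU columns of vmstat output.
# Column positions come from the header line; the last "sy" is the CPU one.
# The function name is hypothetical.

# Read vmstat output on stdin; print "us=<avg> sy=<avg>" over the data lines,
# skipping the first data line (averages since boot).
us_sy_avg() {
    awk '
        !ucol && / us / {
            for (i = 1; i <= NF; i++) { if ($i == "us") ucol = i; if ($i == "sy") scol = i }
            next
        }
        ucol && $1 ~ /^[0-9]+$/ { if (seen++) { us += $ucol; sy += $scol; n++ } }
        END { if (n) printf "us=%.1f sy=%.1f\n", us / n, sy / n }
    '
}
```

For example, `vmstat 5 4 | us_sy_avg` prints the average over the last three 5-second intervals. A sy value that stays well above us is the pattern worth investigating.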

Hi Everyone,

I really appreciate your valuable comments. The only reason I'm trying to come up with a script to monitor CPU utilization is that we are in the process of deploying a DB monitor. Because of some bugs, every once in a while we get CPU-hogging processes spinning around and chewing up all the CPU. The vendor is taking a long time to fix these bugs, and I don't want to take any chances, so I would rather set up something to alert me or run the workaround, which is basically restarting the agent running on the LPAR. Thanks again.
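The alert-or-restart idea above could look roughly like the following watchdog. This is a minimal sketch under several assumptions. AGENT_PATTERN, RESTART_CMD and the numeric defaults are hypothetical placeholders, not part of any real product. The agent's load is measured from its cumulative CPU time between samples rather than from the lifetime %CPU. The script acts only after several busy intervals in a row.

```shell
#!/bin/sh
# Sketch of a CPU watchdog for a misbehaving agent. All names and defaults
# are hypothetical placeholders:
#   AGENT_PATTERN  text that appears only in the agent's command line
#   RESTART_CMD    the workaround (restart the agent)
AGENT_PATTERN=${AGENT_PATTERN:-my_db_agent}
RESTART_CMD=${RESTART_CMD:-"echo 'restart command goes here'"}
LIMIT=${LIMIT:-80}        # percent of one CPU, measured over each interval
INTERVAL=${INTERVAL:-60}  # seconds between samples
STRIKES=${STRIKES:-5}     # consecutive busy intervals before acting

# Print the total CPU seconds used so far by processes whose command line
# contains $1. TIME looks like [[dd-]hh:]mm:ss and is converted to seconds;
# the awk process itself is skipped because its arguments contain $1 too.
agent_cpu_seconds() {
    ps -e -o time= -o args= | awk -v pat="$1" '
        $2 !~ /awk$/ && index($0, pat) {
            t = $1; days = 0
            if (index(t, "-")) { split(t, d, "-"); days = d[1]; t = d[2] }
            n = split(t, f, ":"); s = 0
            for (i = 1; i <= n; i++) s = s * 60 + f[i]
            total += s + days * 86400
        }
        END { print total + 0 }'
}

# Succeed if CPU seconds went from $1 to $2 in $3 seconds, i.e. at a rate
# above $4 percent of one CPU.
is_busy() {
    awk -v a="$1" -v b="$2" -v s="$3" -v lim="$4" \
        'BEGIN { exit !((b - a) * 100 / s > lim) }'
}

# Loop forever; after STRIKES busy intervals in a row, log and run RESTART_CMD.
watchdog() {
    busy=0
    prev=$(agent_cpu_seconds "$AGENT_PATTERN")
    while :; do
        sleep "$INTERVAL"
        now=$(agent_cpu_seconds "$AGENT_PATTERN")
        if is_busy "$prev" "$now" "$INTERVAL" "$LIMIT"; then
            busy=$((busy + 1))
        else
            busy=0
        fi
        if [ "$busy" -ge "$STRIKES" ]; then
            logger -t cpu_watchdog "$AGENT_PATTERN above $LIMIT% for $busy intervals; restarting"
            sh -c "$RESTART_CMD"
            busy=0
            now=$(agent_cpu_seconds "$AGENT_PATTERN")
        fi
        prev=$now
    done
}
```

To use it, source the file and start `watchdog` in the background (for example under `nohup`), or add a final `watchdog` line to run it as a script. Because the script counts consecutive busy intervals, a short burst of the kind zxmaus describes as no reason for concern does not trigger a restart.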