I need to write a script that can identify processes with high CPU utilization. It sounds simple, but we are in an AIX 5.3 environment with virtual CPUs (VPs) and logical CPUs. Any ideas or tips would be highly appreciated. Thanks.
Check the C column of the ps output over several iterations; if the number is consistently high (by whatever threshold you consider high), take action. You might also be able to run nmon in batch mode.
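A rough sketch of the "C column over several iterations" idea, in case it helps. It assumes a Python interpreter is available on the box (not a given on AIX 5.3) and that `ps -ef` prints the usual `UID PID PPID C STIME TTY TIME CMD` header. The function names and the threshold are made up for illustration; pick the threshold after watching your own system for a while.

```python
import subprocess

def parse_ps_c(ps_text):
    """Map PID -> C value from `ps -ef` style text.

    Columns are located by header name. Only PID and C are read, because
    they sit before STIME, which may contain a space (e.g. "Jan 05").
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    pid_i, c_i = header.index("PID"), header.index("C")
    result = {}
    for ln in lines[1:]:
        fields = ln.split()
        try:
            result[int(fields[pid_i])] = int(fields[c_i])
        except (IndexError, ValueError):
            continue  # skip lines that do not parse cleanly
    return result

def consistently_high(samples, threshold):
    """Return PIDs that appear in every sample with C >= threshold."""
    if not samples:
        return set()
    pids = set(samples[0])
    for sample in samples:
        pids = {p for p in pids if sample.get(p, -1) >= threshold}
    return pids

def take_sample():
    """Run `ps -ef` once and parse it (not called automatically)."""
    out = subprocess.check_output(["ps", "-ef"], universal_newlines=True)
    return parse_ps_c(out)
```

You would call `take_sample()` a few times with a sleep in between, collect the results in a list, and pass that list to `consistently_high()`. A process that shows up only once with a high value is ignored, which is the point of sampling repeatedly.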
But what really confuses me is that even when I take the CPU% values and add them all together, the total is nowhere near what I see in topas or nmon.
OK, I might be a bit slow here, but what would I need such a script for? The nmon percentage shows the CPU% of a CPU thread, not absolute CPU. How big such a thread is depends heavily on how you set up your virtual CPUs: if one virtual CPU equals one physical CPU and it is the only virtual CPU in your box, then 87% might be high; if you have 30 virtual CPUs together worth one physical CPU, then 87% is rather low.
What I would look for instead is what is consuming large amounts of CPU for longer periods of time (ps aux is helpful), and whether I am exhausting my box in terms of CPUs (i.e. constantly going over my entitlement or virtual CPU limits). Again, when I have 1 virtual CPU and 1 CPU entitled, 100% CPU is the most I can reach; if you have 30 virtual CPUs, then 30 CPUs (i.e. 3000%) is the limit. If that happens once a day for 30 seconds, it is no reason for concern as long as I am below 100% (i.e. my entitlement) for the rest of the day.
What I would probably check regularly is how my CPU is used: high usr CPU = good, high sys CPU = bad. There are applications like Sybase that spin on CPUs, so the system appears to be incredibly busy, but in fact it is very busy doing nothing, and the root cause is underutilization.
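To put numbers on the 87% and 3000% examples, here is a small arithmetic sketch. It follows the simplified model used in this post, where the entitlement is spread evenly across the virtual CPUs, and it ignores SMT and uncapped-partition details. The function names are made up for illustration.

```python
def max_percent(virtual_cpus):
    """Upper limit of summed CPU% when each virtual CPU counts as 100%."""
    return virtual_cpus * 100

def physical_share(thread_percent, entitlement, virtual_cpus):
    """Physical CPUs' worth of work behind a per-thread percentage.

    Assumes the entitlement is divided evenly over the virtual CPUs.
    """
    return (thread_percent / 100.0) * (entitlement / float(virtual_cpus))

def over_entitlement(total_percent, entitlement):
    """True when summed CPU% exceeds the entitled capacity."""
    return total_percent > entitlement * 100

# One virtual CPU backed by one physical CPU: 87% is most of a real CPU.
one_vp = physical_share(87, entitlement=1, virtual_cpus=1)

# Thirty virtual CPUs sharing one physical CPU: 87% of one of them is small.
thirty_vp = physical_share(87, entitlement=1, virtual_cpus=30)
```

Under these assumptions `one_vp` is 0.87 of a physical CPU while `thirty_vp` is about 0.029, which is why the same percentage can be alarming on one LPAR and harmless on another.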
Other root causes for high CPU utilization might be a bottleneck in I/O or memory, not necessarily a particular process at all.
To put it in IBM's words: in a virtualized environment on big frames, CPU is not your problem these days. There may be runaway processes hogging CPU at times, but you are more likely to catch them by running nmon interactively a few times a day and looking for top CPU consumers that you would not expect to be there.
I really appreciate your valuable comments. The only reason I'm trying to come up with a script to monitor CPU utilization is that we are in the process of deploying a DB monitor, and due to some bugs we get these CPU hog processes once in a while, spinning and chewing up all the CPU. The vendor has taken a long time to fix these software bugs, and I don't want to take any chances, so I would rather set up something to alert me or execute the workaround, which is basically restarting the agent running on the LPAR. Thanks again.
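For the alert-or-restart part, something along these lines might be enough. It is only a sketch: the reading fed in can be whatever number you settle on for the agent process (the C value from ps, for instance), and the threshold, the number of consecutive readings, and the action are all assumptions to tune. The restart command itself is not shown because it depends on the vendor's agent.

```python
def make_watchdog(threshold, needed, action):
    """Build a function that is fed one CPU reading per poll.

    `action` is called once `needed` consecutive readings are at or
    above `threshold`; the streak then resets so it does not fire on
    every following poll.
    """
    state = {"streak": 0}

    def feed(reading):
        if reading >= threshold:
            state["streak"] += 1
        else:
            state["streak"] = 0  # one normal reading breaks the streak
        if state["streak"] >= needed:
            state["streak"] = 0
            action()
            return True
        return False

    return feed

# Example wiring with a stand-in action that only records the event.
events = []
watch = make_watchdog(threshold=80, needed=3,
                      action=lambda: events.append("restart agent"))
for reading in [95, 97, 10, 90, 92, 99]:
    watch(reading)
```

In the example the first two high readings are cancelled by the 10, so the action fires only after the final three. In real use the action would send a mail or call whatever restarts the agent, and the loop would sleep between polls.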