CPU Usage

Hi Experts,

We encountered a problem on one of our production systems: some processes were using more CPU than usual, and after some time usage came back to normal.

From a system perspective, is there any way to check why the processes used more CPU during that particular period?

What processes are you talking about?
Finding out why means monitoring constantly. Without more information from you it will be difficult to answer.

From system perspective, no. It's not psychic. Only the process itself "knows" what it's actually doing.

Maybe you can try tusc.
tusc-8.1(1)

There is nothing wrong with processes using CPU :smiley:

The problem is if your machine is constantly at 100% CPU (you should leave at least 5% for the system).

In general, you need to find out what was being run at the time you saw the CPU spike (look at the database layer if a db is used).

The tools for developers are: tusc, gdb and execution plans (db)
The tools for system folks are: sar, lsof, iostat, vmstat, glance

I would like to emphasize sar here as the most useful tool (at least for me) for constant monitoring on HP-UX systems. Have it run from cron, collecting data all the time.

Something like this, which takes 12 samples at 300-second intervals starting at the top of every hour:

0       *       *       *       *       /usr/lbin/sa/sa1 300 12

Be sure to inspect the /usr/lbin/sa/sa1 script and select where you want the actual log (/var/adm/sa being the default).

Then you will be able to inspect the generated files using sar -A -f /var/adm/sa/saDD (DD being the day of the month).
Check man sar for additional switches; there are plenty for everything.

There is also free software that will let you draw graphs from those files, or you can use awk to draw some conclusions.
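To illustrate the kind of conclusion you can pull out of the sar text reports, here is a minimal sketch in Python (awk would do just as well). It assumes the usual "sar -u" column layout (%usr %sys %wio %idle); the sample numbers, the date in the banner, the function name busy_intervals and the 10% idle threshold are all made up for illustration, so adjust them to what your own report looks like.

```python
import re

# Hypothetical excerpt in the layout "sar -u" prints; all numbers are invented.
SAMPLE = """\
HP-UX prod B.11.11 U 9000/800    01/15/13

00:00:00    %usr    %sys    %wio   %idle
00:05:00      12       4       1      83
00:10:00      71      22       2       5
00:15:00      68      25       3       4
00:20:00      15       5       1      79

Average       42      14       2      43
"""

# A data row is a timestamp followed by four integer percentages.
ROW = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def busy_intervals(text, max_idle=10):
    """Return (time, usr, sys, wio, idle) for samples with %idle <= max_idle."""
    hits = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # banner, column header, blank and Average lines
        usr, sys_, wio, idle = (int(x) for x in m.groups()[1:])
        if idle <= max_idle:
            hits.append((m.group(1), usr, sys_, wio, idle))
    return hits


if __name__ == "__main__":
    for row in busy_intervals(SAMPLE):
        print("%s usr=%d sys=%d wio=%d idle=%d" % row)
```

The timestamps it prints are the windows to hand to the application or database people, since sar only tells you when the CPU was busy, not why.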

Regards
Peasant.

And first of all, is there actually a problem, and if so why/how?

Thanks all for the replies. As per your feedback I have sent the output of the sar command to them.

However, my higher management is asking for the reason for the sudden spike in CPU usage, so I wanted to check whether that can be determined from the system side (i.e. why a process used more CPU during that particular time) or whether it should be checked by the application team.

If you can't tell us what processes, we can't give you answers...
When the system is the cause, nobody can give a pertinent answer without correct information on the model and system configuration (CPUs, memory, kernel parameters, swap config, whether LVM/JFS is used, which JFS parameters are set, etc.). If it is the application, only the application team can help you. If it is the system, then knowing the configuration we can help you only if you can diagnose the cause correctly. In other words, the least you can do is describe correctly what is happening, when, for how long, and how systematically.

Hi Vbe..

Thanks for the reply...

It was one of the processes that the application calls (OrderEntry, which completes the registration of a subscriber in our ERP system). Basically, my management now wants me to diagnose why that registration process used so much CPU.

So my question is: is this part of the sysadmin's job, or can only the application admin provide details about the cause, since as a system admin I do not know the application's code?

My Operating System Version is
HP-UX prod B.11.11 U 9000/800


To add more info, the system has 32 CPUs and 40G of physical RAM.

Kernel parameters are attached herewith and LVM is used...

Do you mean it took all 32 CPUs for a time (how long)?

What is the output of the model command?
And what does swapinfo -tam give you?
How are your disks configured? e.g. "external SAN using 4 FC HBAs"

Repeating the question will not get you a better answer.

The system cannot tell you that. You'll have to investigate the application.

How did the management know about the spike? Are you running a performance monitoring package, and if so, which one?

One process using 100% of one CPU for a short period is usually totally harmless. If you get processes waiting for CPU, then there may be a sizing, tuning or programming issue.

I have seen systems with all CPUs running close to 100% and no significant wait states. That just meant that they bought the right size of computer.
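To make the "busy CPU versus processes waiting for CPU" distinction concrete, here is a rough Python sketch that scans "sar -q" style output for samples where the run queue is longer than the number of CPUs. The column layout (runq-sz %runocc swpq-sz %swpocc) is what I would expect from sar -q; the sample figures and the function name cpu_starved are invented, and "more runnable processes than CPUs" is only a rule of thumb, not a limit defined by HP-UX.

```python
# Hypothetical "sar -q" excerpt; all figures are invented for illustration.
SAMPLE_Q = """\
00:00:00 runq-sz %runocc swpq-sz %swpocc
00:05:00     1.5      20     0.0       0
00:10:00    41.0     100     0.0       0
00:15:00    36.2      98     0.0       0
00:20:00     2.0      25     0.0       0
"""


def cpu_starved(text, ncpu):
    """Return (time, runq_sz) for samples whose run queue exceeds ncpu."""
    out = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # not a data row
        try:
            runq = float(parts[1])
        except ValueError:
            continue  # column header line
        if runq > ncpu:
            out.append((parts[0], runq))
    return out


if __name__ == "__main__":
    # 32 is the CPU count mentioned in this thread.
    for when, runq in cpu_starved(SAMPLE_Q, ncpu=32):
        print(when, "run queue", runq)
```

If this comes back empty while the CPUs show close to 100% busy, the box is simply doing its work; if it does not, that is where sizing or tuning questions start.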


Yes Methyl, we are using NimSoft for performance monitoring, so an email is received as soon as system resource usage goes beyond the threshold.

Multiple instances of the OrderEntry process were running at that time, each one using almost one full CPU.
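For what it is worth, a count like that can be captured during a spike with something along these lines. This is only a Python sketch working on saved ps output; it assumes the listing was taken with the XPG4 form of ps (UNIX95= ps -e -o pcpu,pid,comm), and the PIDs, percentages and the helper name cpu_by_command are hypothetical.

```python
from collections import defaultdict

# Hypothetical capture of: UNIX95= ps -e -o pcpu,pid,comm
# PIDs and percentages are invented for illustration.
SAMPLE_PS = """\
%CPU   PID COMMAND
 98.1  4211 OrderEntry
 97.4  4215 OrderEntry
 96.9  4230 OrderEntry
  0.3  1102 syslogd
  1.2  2050 sshd
"""


def cpu_by_command(text):
    """Return [(command, instances, total %CPU)], busiest command first."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pcpu, _pid, comm = parts
        totals[comm][0] += 1
        totals[comm][1] += float(pcpu)
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(comm, n, round(cpu, 1)) for comm, (n, cpu) in ranked]


if __name__ == "__main__":
    for comm, n, cpu in cpu_by_command(SAMPLE_PS):
        print("%-12s instances=%d total_cpu=%.1f" % (comm, n, cpu))
```

Run from cron every minute or so and kept with a timestamp, this gives the application team a record of how many instances were active when the alert fired.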

 
prod:root:/# model
9000/800/SD32B
prod:root:/#
 
 
prod:root:/# swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev       16384       0   16384    0%       0       -    1  /dev/vg00/lvol2
dev       30016       0   30016    0%       0       -    1  /dev/vg00/secswap
dev       51200       0   51200    0%       0       -    1  /dev/vg01/lvswap
reserve       -   17835  -17835
memory    31082    3344   27738   11%
total    128682   21179  107503   16%       -       0    -
prod:root:/#
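Reading the numbers above: a small Python sketch (the function name summarize is just for illustration) that totals the "dev" lines and picks out the reserve line of this same output. It shows 97600 MB of device swap configured with 0 MB actually used and 17835 MB reserved, i.e. nothing was paged out to the swap devices at the time this snapshot was taken.

```python
# The swapinfo -tam output posted above, pasted verbatim.
SWAPINFO = """\
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev       16384       0   16384    0%       0       -    1  /dev/vg00/lvol2
dev       30016       0   30016    0%       0       -    1  /dev/vg00/secswap
dev       51200       0   51200    0%       0       -    1  /dev/vg01/lvswap
reserve       -   17835  -17835
memory    31082    3344   27738   11%
total    128682   21179  107503   16%       -       0    -
"""


def summarize(text):
    """Total the device swap lines and pick out the reserved amount (MB)."""
    dev_avail = dev_used = reserved = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "dev":
            dev_avail += int(fields[1])
            dev_used += int(fields[2])
        elif fields[0] == "reserve":
            reserved = int(fields[2])
    return {"dev_avail_mb": dev_avail,
            "dev_used_mb": dev_used,
            "reserved_mb": reserved}


if __name__ == "__main__":
    print(summarize(SWAPINFO))
```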
 

About your swap configuration:

TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev       16384       0   16384    0%       0       -    1  /dev/vg00/lvol2
dev       30016       0   30016    0%       0       -    1  /dev/vg00/secswap

Since we don't know what external storage you use (if that's the case...), I would guess that your swap devices were not created at the same time and were added to correct memory trouble early on.
What will happen if you were to use the swap now? It will take the biggest first...
The impact? Your biggest device is huge with only one access path (unless it's a multi-disk striped device...), and two devices on the same root disks is not a good idea...
About CPU, Methyl and I see things the same way: without knowing what is running at what moment and for how long, you CANNOT diagnose what the issue is. I'm tempted to say your thresholds are badly set...
I never had any serious trouble with the HP servers I managed at the time, because I knew what was running on them and could see for myself (and NOT through software), just by leaving glance running and looking from time to time, whether things were normal or not. Some heavy statistical calculations can take 100% CPU for days (yes, days!), whereas an RDBMS transaction lasting hours is very suspicious, but only the DBA can tell whether it is normal or not.
One classic case is the box that "freezes" periodically with 100% CPU and I/O...

Your kernel has some huge values for Semaphores and Shared Memory, but only 5% of memory allocated to disc buffers (suggesting that your database engine is doing most of the disc buffering). What database engine are you running?

Assuming that your Nimsoft monitoring package can't tell you what you want to know, visually monitoring with HP-UX Glance at the time the suspect application is running should help spot an abnormal process, with a view to more detailed monitoring with database statistics (assuming that you have a mainstream database engine).

Looking at the Nimsoft website (I don't know the product myself) I see that it has a dashboard for "Processor Queue". This is the most important figure in the context of CPU usage.
