It is only a problem if your machine is constantly using 100% of CPU (you should leave at least 5% for the system).
In general, you need to find out what was running at the time you saw the spike in CPU usage (check the database layer if a db is used).
The tools for developers are: tusc, gdb and execution plans (db)
The tools for system folks are: sar, lsof, iostat, vmstat, glance
I would like to emphasize sar here, as the most useful tool (at least for me) for constant monitoring on HP-UX systems. Have it run from cron, collecting data all the time.
Something like this (every hour, take 12 samples 300 seconds apart):
0 * * * * /usr/lbin/sa/sa1 300 12
Be sure to inspect the /usr/lbin/sa/sa1 script and select where you want the actual log (/var/adm/sa being the default).
Then you will be able to inspect the generated files using sar -A -f /var/adm/sa/saDD (DD being the day of the month).
Check man sar for additional switches, there are plenty of them for everything.
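To narrow the report down to the window of the spike, something along these lines may help. This is only a sketch: sa_file and cpu_window are made-up helper names, SA_DIR is a hypothetical override variable, and it assumes the default /var/adm/sa location and that your sar accepts -s and -e for start and end times (check man sar before relying on it).

```shell
#!/bin/sh
# sa_file DAY
# Print the path of the sa data file for a given day of the month.
# SA_DIR is a hypothetical override; /var/adm/sa is the default log directory.
sa_file() {
    day=${1#0}   # strip one leading zero so printf does not read 08/09 as octal
    printf '%s/sa%02d\n' "${SA_DIR:-/var/adm/sa}" "$day"
}

# cpu_window DAY START END
# Show CPU utilisation (sar -u) for one day, limited to START..END (hh:mm).
# Assumes sar supports -s (start time) and -e (end time).
cpu_window() {
    sar -u -f "$(sa_file "$1")" -s "$2" -e "$3"
}
```

For example, cpu_window 15 03:00 04:00 would show the CPU figures recorded between 03:00 and 04:00 on the 15th.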
There is also free software which will enable you to draw graphs from those files, or use awk to draw some conclusions.
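As a rough idea of the awk approach, here is a small filter that lists the intervals where the machine was nearly out of idle CPU. It is a sketch, not a finished tool: busy_intervals is a made-up name, and it assumes the plain sar -u text layout with five columns (time, %usr, %sys, %wio, %idle). If your output has extra columns (per-CPU reports, for instance), the field numbers need adjusting.

```shell
#!/bin/sh
# busy_intervals [MIN_IDLE]
# Read `sar -u` text on stdin and print the intervals whose %idle is
# below MIN_IDLE (default 5).
# Assumed column layout: time %usr %sys %wio %idle
busy_intervals() {
    awk -v min_idle="${1:-5}" '
        # Data lines start with HH:MM:SS and end in a numeric %idle value.
        # This skips the banner, the column header and the Average line.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && NF == 5 && $5 ~ /^[0-9.]+$/ {
            if ($5 + 0 < min_idle + 0)
                printf "%s usr=%s sys=%s wio=%s idle=%s\n", $1, $2, $3, $4, $5
        }'
}

# Typical use (sa15 is only an example, the file for day 15 of the month):
#   sar -u -f /var/adm/sa/sa15 | busy_intervals 5
```

The usr/sys/wio split on the reported lines already gives a first hint of whether the time went to the application, the kernel or waiting on I/O.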
Thanks all for the replies. As per your feedback I have sent the output of the sar command along to them.
However, my higher management is asking for the reason for the sudden spike in CPU usage, so I wanted to check whether that can be found from the system side (i.e. why a process took more CPU during that particular time) or whether it should be checked by the application team...
If you can't tell us which processes, we can't give you answers...
When the system is the cause, without correct information about the model and system configuration (CPUs, memory, kernel parameters, swap config, using LVM/JFS? what JFS parameters are set etc...) nobody can give a pertinent answer, and if it is the application, only the application team can help you... If it is the system, knowing its configuration, we can help you only if you can diagnose the cause correctly; in other words, the least would be to describe correctly what is happening, when, for how long, and how systematically...
It was one of the processes that the application calls (OrderEntry --- for completing the registration of the subscriber in our ERP system). Now basically my management wants me to diagnose why that registration process took so much CPU.
So my question is: will this be part of the sys admin's activity, or can only the application admin provide details about the cause, since as a system admin I do not know the application's code?
My Operating System Version is
HP-UX prod B.11.11 U 9000/800
To add more info, the system has 32 CPUs and 40 GB of physical RAM.
How did the management know about the spike? Are you running a performance monitoring package, and if so, which one?
One process using 100% of one CPU for a short period is usually totally harmless. If you get processes waiting for CPU, then there may be a sizing, tuning or programming issue.
I have seen systems with all CPUs running close to 100% and no significant wait states. That just meant that they bought the right size of computer.
TYPE   AVAIL  USED   FREE  USED  LIMIT  RESERVE  PRI  NAME
dev    16384     0  16384    0%      0        -    1  /dev/vg00/lvol2
dev    30016     0  30016    0%      0        -    1  /dev/vg00/secswap
Since we don't know what external storage you use (if any...), I will guess that your swap devices were not created at the same time, and that they are there to correct memory trouble from the early days.
What would happen if you were to use the swap now? It would take the biggest first...
The impact? Your biggest is huge with only one access path (unless it's a multidisk striped device...), and having two devices on the same root disks is not a good idea...
About CPU, Methyl and I see things the same way: not knowing what is running at what moment and for how long, you CANNOT diagnose what you have as an issue. I'm tempted to say your thresholds are badly set...
I never had any serious trouble with the HP servers I managed at the time, for I knew what was running on them and could see for myself (and NOT through software), just by leaving glance running and looking from time to time, whether things were normal or not... Some heavy statistics calculations can take 100% CPU for days (Yes! days...). An RDBMS transaction lasting hours is very suspicious, but only the DBA can tell whether it is normal or not.
One classic case is the box that "freezes" periodically with 100% CPU and I/O...
Your kernel has some huge values for Semaphores and Shared Memory, but only 5% of memory allocated to disc buffers (suggesting that your database engine is doing most of the disc buffering). What Database Engine are you running?
Assuming that your Nimsoft monitoring package can't tell you what you want to know, visually monitoring with HP-UX Glance at the time that the suspect application is running should help spot an abnormal process, with a view to more detailed monitoring with database statistics (assuming that you have a mainstream database engine).
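If nobody can sit in front of Glance at the right moment, a crude ps snapshot taken from cron around the suspect time can at least name the heaviest processes. A minimal sketch follows; top_cpu is a made-up helper, and it assumes ps is run in XPG4 mode (UNIX95 set) so that -o with pcpu as the first column is accepted. Check man ps on your release.

```shell
#!/bin/sh
# top_cpu [N]
# Read `ps -o pcpu,...` output on stdin, drop the header line and print
# the N lines with the highest %CPU (default 10).
# Assumes %CPU is the first column.
top_cpu() {
    awk 'NR > 1' | sort -k1,1nr | head -n "${1:-10}"
}

# Typical use on HP-UX (UNIX95 set for this one command only):
#   UNIX95= ps -e -o pcpu,pid,ruser,args | top_cpu 10
```

Appending the output, together with date, to a log file every minute or so during the suspect window gives something concrete to hand to the application team.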
Looking at the Nimsoft website (I don't know the product myself) I see that it has a dashboard for "Processor Queue". This is the most important figure in the context of CPU usage.
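If sar data is being collected anyway, the run queue can also be read back from it with sar -q. The filter below is a sketch under assumptions: long_runq is a made-up name, it assumes runq-sz is the second column of the sar -q text output, and the threshold is a judgement call for you to make (the number of CPUs is one possible starting point, not a rule).

```shell
#!/bin/sh
# long_runq LIMIT
# Read `sar -q` text on stdin and print the intervals whose average run
# queue length (assumed to be column 2, runq-sz) exceeds LIMIT.
long_runq() {
    awk -v limit="$1" '
        # Only lines that start with HH:MM:SS and carry a numeric runq-sz;
        # this skips the banner, the column header and the Average line.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 ~ /^[0-9.]+$/ {
            if ($2 + 0 > limit + 0)
                printf "%s runq-sz=%s\n", $1, $2
        }'
}

# Typical use (sa15 is only an example file name):
#   sar -q -f /var/adm/sa/sa15 | long_runq 32
```

Intervals that show up here at the same time as the CPU spike suggest processes were really waiting for CPU, rather than one process simply keeping a processor busy.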