The old golden question: CPU load vs. utilization

javanoob · January 7, 2017, 3:20am

Hi all,

Load = run queue: processes using the CPU or waiting for the CPU
CPU utilization = % of time that the CPU is busy

Naturally, I assume that if I have 1 CPU and my load is 1 all the time, my CPU is 100% busy.

Now I have 2 CPU threads, and when I run prstat -Z, this is what I see:

 
    ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
         3      164   11G   11G   8.7%   0:46:43 8.8% xzone1
    Total: 164 processes, 617 lwps, load averages: 1.43, 1.29, 1.30
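A minimal sketch for pulling the three load averages out of the prstat summary line and dividing by the CPU count. The helper names `parse_load_averages` and `per_cpu_load` are hypothetical. Note that the result is load per CPU, which is still a run-queue measure and not a busy percentage; that gap is exactly what this question is about.

```python
import re
from typing import Tuple

# Matches the summary line printed by prstat, e.g.
# "Total: 164 processes, 617 lwps, load averages: 1.43, 1.29, 1.30"
_LOAD_RE = re.compile(r"load averages:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load_averages(line: str) -> Tuple[float, float, float]:
    """Return the (1-min, 5-min, 15-min) load averages from a prstat Total line."""
    m = _LOAD_RE.search(line)
    if m is None:
        raise ValueError("no load averages found in: " + repr(line))
    one, five, fifteen = (float(g) for g in m.groups())
    return one, five, fifteen


def per_cpu_load(load: float, ncpus: int) -> float:
    """Normalize a load average by CPU count (a queue measure, NOT utilization)."""
    return load / ncpus


line = "Total: 164 processes, 617 lwps, load averages: 1.43, 1.29, 1.30"
one, five, fifteen = parse_load_averages(line)
print(per_cpu_load(fifteen, ncpus=2))  # 0.65 runnable threads per CPU, not "65% busy"
```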

I have had a load of around 1.3 (above 1) for the past 15 minutes. Doesn't that mean 1 of my 2 available CPUs is busy all the time?

But my CPU% is only 8.8%, not 50%.

What does this imply? That my run queue averages 1, but the work executes so quickly that it uses very little CPU time?

But the load shows an average above 1 for the past 15 minutes. Doesn't that mean I am using 50% of the CPU time (1 of the 2 available CPUs) all the time?

This is draining my brain juice.

Regards,
Noob

jim_mcnamara · January 7, 2017, 10:10am

As shown, you do not have a CPU resource problem. Period.
CPU queues may (note the "may") be caused by a lot of process context switching. LWPs (Solaris threads) can do this. I think that is your issue. But I do not think you have a real problem. Yet.

So, if you want to learn about your system's context switches, try DTrace using cswstat.d. I am definitely not saying this is your perceived issue, but DTrace is your best window into the kernel. And it looks like you want to learn.

A book like this one is very much in order for learning DTrace:
DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD, by Brendan Gregg and Jim Mauro (ISBN 9780132091510)

OK. If your system has problems, always consider starting your troubleshooting with I/O, for example the I/O request queue length. You may have one disk device that is being hammered, or a device that has problems.

iostat is your friend here. So is DTrace:
Tutorial: DTrace by Example
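To act on the advice to start with I/O queue length, here is a minimal sketch that pulls the queue and busy columns out of captured iostat output. It assumes the default Solaris `iostat -xn` column layout (r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device). The `hammered` helper and its 80% busy / queue-length-of-1 thresholds are hypothetical choices for illustration, not standard cutoffs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskStat:
    device: str
    wait: float      # average number of requests waiting in the queue
    actv: float      # average number of requests actively being serviced
    pct_busy: float  # %b: percent of time the device was busy


def parse_iostat_xn(text: str) -> List[DiskStat]:
    """Parse the data rows of Solaris-style `iostat -xn` output.

    Assumes the 11-column layout:
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    """
    stats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # blank line or the "extended device statistics" banner
        try:
            numbers = [float(f) for f in fields[:10]]
        except ValueError:
            continue  # the column-header line ("r/s w/s ...")
        stats.append(DiskStat(device=fields[10], wait=numbers[4],
                              actv=numbers[5], pct_busy=numbers[9]))
    return stats


def hammered(stats: List[DiskStat], busy_threshold: float = 80.0,
             queue_threshold: float = 1.0) -> List[str]:
    """Return devices that look saturated (hypothetical thresholds)."""
    return [s.device for s in stats
            if s.pct_busy >= busy_threshold or s.wait >= queue_threshold]
```

For example, capture the text of `iostat -xn 5 2` and pass the second report to `parse_iostat_xn`; the first report covers the time since boot, so it says little about what is happening now.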

Check out the DTrace Toolkit.

jlliagre · January 8, 2017, 3:17am

You seem to be running these commands from a non-global zone, so you are missing the CPU usage generated in other zones, including the global one. Monitoring needs to be done from the global zone, as does DTrace analysis.

The assumption that a load of 1 on a single CPU means that CPU is 100% busy is an oversimplification. CPU load is indeed based on measurements of the run queue size and the number of running processes, but because it is a floating (moving) average, a load of 1 doesn't necessarily mean one CPU is 100% busy.
CPU utilization can be lower if, for example, two processes compete for the same CPU 50% of the time and are idle for the remaining 50%. You'll have a load of 1 and a CPU that is 50% busy.
If you have 10 processes competing for a single CPU 10% of the time, then idling for the remaining 90%, you will still observe a load of 1, but a CPU that is only 10% busy.
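The two examples above can be checked with a small model. This is a simplified sketch, not how the kernel measures anything: time is split into equal slices, each slice records how many threads are running or waiting to run, and in each slice min(n, ncpus) CPUs are busy. The function name `load_and_utilization` is hypothetical.

```python
from typing import List, Tuple


def load_and_utilization(timeline: List[int], ncpus: int) -> Tuple[float, float]:
    """Return (average load, CPU utilization as a fraction) for a timeline
    of per-slice counts of running + waiting threads (simplified model)."""
    load = sum(timeline) / len(timeline)
    busy = sum(min(n, ncpus) for n in timeline) / (len(timeline) * ncpus)
    return load, busy


# Two processes competing for one CPU half the time, idle the other half.
two = [2] * 50 + [0] * 50
# Ten processes competing for one CPU 10% of the time, idle 90% of the time.
ten = [10] * 10 + [0] * 90
# One thread that is always runnable on a 2-CPU system.
always_one = [1] * 100

print(load_and_utilization(two, ncpus=1))         # (1.0, 0.5)
print(load_and_utilization(ten, ncpus=1))         # (1.0, 0.1)
print(load_and_utilization(always_one, ncpus=2))  # (1.0, 0.5)
```

The same average load of 1 comes out of all three timelines; only the last matches the "1 of 2 CPUs busy all the time" reading from the question, and it holds only if a thread is runnable in every slice.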
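The point that the load is a floating (moving) average can be illustrated with the classic exponentially damped formula used by traditional Unix kernels: at each sampling interval, load = load * e + n * (1 - e), where e = exp(-interval / period) and n is the number of running plus waiting threads. The 5-second interval and the formula itself are assumptions for illustration; Solaris's exact implementation may differ, and `damped_load_averages` is a hypothetical name.

```python
import math
from typing import Dict, Iterable


def damped_load_averages(samples: Iterable[float],
                         interval_s: float = 5.0) -> Dict[int, float]:
    """Feed per-interval counts of running + waiting threads through
    1-, 5- and 15-minute exponentially damped moving averages.

    Classic Unix-style formula (assumed, not taken from Solaris source):
        load = load * e + n * (1 - e),  e = exp(-interval / period)
    """
    periods = {1: 60.0, 5: 300.0, 15: 900.0}  # minutes -> seconds
    decay = {m: math.exp(-interval_s / p) for m, p in periods.items()}
    load = {m: 0.0 for m in periods}
    for n in samples:
        for m in periods:
            load[m] = load[m] * decay[m] + n * (1.0 - decay[m])
    return load


# One thread that stays runnable for 5 minutes (60 samples of 5 s), from idle.
averages = damped_load_averages([1] * 60)
for minutes, value in averages.items():
    print(f"{minutes:>2}-minute average: {value:.2f}")
# prints 0.99, 0.63 and 0.28 for the 1-, 5- and 15-minute averages
```

Because each average decays at a different rate, the three numbers prstat reports can disagree for a while after the load changes, and none of them says how busy a CPU was at any particular moment.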

jim_mcnamara · January 8, 2017, 11:37pm

jlliagre may be correct. As he stated, the interpretation differs depending on whether you ran the command in the global zone or in the zone itself. Where did you run prstat?

dtrace should be run in the root (global) zone.