Understanding & Monitoring CPU performance (Load vs SAR)

javanoob · June 4, 2016, 4:53am

Hi all,

I have been reading a lot about CPU load and the analogy that compares it to car traffic on an expressway.

From wiki
Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity)

q1) What about Solaris (I am on Solaris 10)? Does the load include processes waiting for disk activity?

q2) On a 6-core server, over the past 1 minute, what does it mean to have

a) an average load of 6 -- using uptime;
b) a CPU idle % of more than 70% -- using sar -u 10 6?

I issued sar -u 10 6 && uptime (looking at the 1-minute load).

Does it mean that, for the past 1 minute, my CPUs/cores were fully utilized, but the tasks were so simple that the CPU spent only 30% of its time executing them?

Regards,
Noob

MadeInGermany · June 4, 2016, 5:58am

The load is actually the number of "overdue" threads.
Most obvious is the run queue, threads that are waiting for a free CPU.
But there are threads waiting for something else, e.g. for an I/O driver or another blocked thread. If in user land they show up with state D, but there are also kernel threads that can "pile up" and add to the load.

javanoob · June 4, 2016, 8:20am

Hi MadeInGermany,

Thanks for your reply and explanation.

So this brings us back to the question: does Solaris count processes waiting for I/O (the D state) as load?

Regards,
Noob

MadeInGermany · June 4, 2016, 9:23am

Yes.
And it's an explanation for having high load while CPU is not fully used.

jlliagre · June 4, 2016, 5:24pm

I beg to differ.

There is no D (uninterruptible) state under Solaris.

My understanding is that the S process state (waiting for an event to complete) is not counted as load under Solaris. Processes waiting for I/O to complete do not use any CPU anyway.

Processes in the running and the waiting states (i.e. either using a CPU or runnable but waiting in the run queue) are the only ones taken into account.

javanoob · June 5, 2016, 12:38am

Hi jlliagre, MadeInGermany

Thanks for all your replies. I believe that on Linux, threads in I/O wait do contribute to the load (though they are not reflected in the run queue), but Solaris might differ.

Notwithstanding the above, I am trying to understand the correlation between CPU load and CPU utilization.

q1) In a 1-CPU environment (single core, no hyperthreading), if the load average is 1 for the last 1 minute and that 1 thread isn't waiting for I/O or anything else, can I expect to see near 100% CPU utilization for that minute as well?

q2) In what scenario would I have a high load average but minimal CPU utilization (assuming there's no I/O wait)?
Could it be that a large number of threads require CPU time, but each is processed in a very short time?

Regards,
Noob

jlliagre · June 5, 2016, 4:41am

Yes, Linux is well known to include uninterruptible I/O in its load average calculation.

q1) The so-called 1-minute load average will tend toward 1, but if the initial load was negligible, you'll need to wait several minutes for it to get close to 1. It will be about 0.6 instead of 1 after one minute. Conversely, if the initial load was higher than one, you'll need to wait long enough (likely more than 1 minute) for it to get close to 1.

q2) The load average is derived from the run queue size, which is sampled at 10 ms intervals. The CPU utilization is computed from micro-state accounting with "exact" precision (i.e. several orders of magnitude better, in the nanosecond range). A DTrace script should let you figure out the cause of the discrepancy, but in any case the CPU utilization values are accurate, while the load average is a rough approximation.
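
To illustrate the "about 0.6 after one minute" figure in q1, here is a minimal sketch of an exponentially damped average. It is not the actual Solaris kernel code: the function name damped_load, the 60-second time constant and the 1-second update step are assumptions chosen for illustration.

```python
import math

def damped_load(samples, window_s=60.0, step_s=1.0, start=0.0):
    """Exponentially damped average of run-queue samples.

    samples  : runnable-thread counts, one per update step
    window_s : time constant of the average (60 s for the "1 minute" value)
    step_s   : assumed interval between updates (hypothetical: 1 s)
    """
    decay = math.exp(-step_s / window_s)
    load = start
    history = []
    for n in samples:
        # Old value fades by `decay`; the new sample fills in the rest.
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history

# One always-runnable thread on a previously idle system.
history = damped_load([1] * 300)
for seconds in (60, 120, 300):
    print(f"after {seconds:3d} s: {history[seconds - 1]:.2f}")
```

Under these assumptions the value after 60 seconds is 1 - 1/e, roughly 0.63, and it only gets close to 1 after several minutes.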

javanoob · June 5, 2016, 10:32am

Hi jlliagre,

Thanks for your reply.

For q1) Yep, when I run sar -q 1 60 (for a 1-minute average), the run queue's average is about 1+, but my 1-minute load average (using uptime) shows only about 0.13.
Brendan Gregg's load average video seems to talk about exponential decay in the load calculation (but I am no maths expert).
Thus, I will leave it at that: a load of 1 sustained for a minute takes more than 1 minute to be reflected in the "1 minute load average".

For q2) I am still confused about the difference between CPU load and CPU utilization.

(On a 1-CPU computer, no multicore or hyperthreading)
If I have a continuous load of 1 for 1 minute, does that mean my CPU utilization is near 100% (0% idle) for that 1 minute?

q3) You mentioned that the load is sampled at 10 ms intervals.
What about the sampling interval for CPU utilization/time?

In a nutshell, if I have a 6-core CPU (12 threads total) and an average load of 3 most of the time,
can I expect my CPU utilization to be around 3/12 * 100 = 25% (when the load is 3)?

p.s. One last question -> does sar -q include threads currently running on a CPU, or only those runnable/ready in the run queue?

Regards,
Noob

jlliagre · June 5, 2016, 3:55pm

It might mean that, but not necessarily. It could also be ten processes all repeating this pattern: fighting to get the CPU for 1 second, then idling for 9 seconds.

There is no sampling. The CPU utilization is accurately measured, not estimated.

That's only one eventuality.

The latter. A running thread is not waiting in a queue.

javanoob · June 5, 2016, 9:25pm

Hi jlliagre,

Once again, thanks for your reply and truly appreciate your time.

q1) Do you mean the 10 processes are fighting to get the CPU at the same time?
I would then expect to see a load of 9 (in queue) + 1 (running) and 100% CPU utilization for that 1 minute.

Am I right?

q2) Can you elaborate on this further? Why is it 1?

q3) Should hyperthreading be taken into consideration when measuring load?

6 cores; point of saturation -> 6 (can take a load of up to 6), or
6 cores but 12 threads; point of saturation -> 12 (can take a load of up to 12)?

Regards,
Noob

jlliagre · June 6, 2016, 12:55am

No. The load would be 9+1 during the 1 second when all processes compete, then 0 during the 9 seconds when all are idling, so the average load would be 1. As you have only one core, the CPU utilization would be 10%.

See q1.

It should, but the issue is that the saturation level varies with the kind of workload. See, for example, "CPU utilization of multi-threaded architectures explained" (Solaris and Systems Information for ISVs).
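
The arithmetic for the ten-process pattern can be sketched as follows. This is a simplified model, not a measurement: it assumes one runnable-thread count per second and that at most ncpus of those threads are on a CPU in any given second. The function name average_load_and_utilization is made up for this example.

```python
def average_load_and_utilization(runnable_per_second, ncpus=1):
    """Return (average load, CPU utilization) for a per-second timeline.

    runnable_per_second : number of running + runnable threads in each second
    ncpus               : number of CPUs available (assumed fully usable)
    """
    seconds = len(runnable_per_second)
    avg_load = sum(runnable_per_second) / seconds
    # In each second, at most `ncpus` threads can actually be on a CPU.
    busy_cpu_seconds = sum(min(n, ncpus) for n in runnable_per_second)
    utilization = busy_cpu_seconds / (ncpus * seconds)
    return avg_load, utilization

# Ten processes competing for 1 second, then idling for 9, over one minute.
bursty = ([10] + [0] * 9) * 6
# One thread that keeps the single CPU busy for the whole minute.
steady = [1] * 60

print(average_load_and_utilization(bursty))  # (1.0, 0.1)
print(average_load_and_utilization(steady))  # (1.0, 1.0)
```

Both timelines have an average load of 1, yet the first uses 10% of the CPU and the second 100%.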

javanoob · June 6, 2016, 11:26am

Hi Jlliagre,

Thank you so much for your reply. Really appreciate your guidance.

Please pardon me for my ignorance, but I still did not quite get the full picture or the maths behind this.

q1) Does that mean that the 10 threads/load are actually completed within the 1st second (right before the 2nd second)?

q2) Do you mean that the load average is calculated as a per-second average over the past 10 seconds?
Hence (9+1) = 10 load in the past 10 seconds (10 load / 10 sec),
so it is essentially 1 load per second for the rest of the 60 seconds (1 minute)?
-- But I thought you mentioned earlier that the load is sampled every 10 ms, not every 10 seconds?

q3) My understanding is that I have 1 CPU/core.
It was utilized 100% during the 1st second of every 10 seconds.
In 1 minute, it would be utilized for 5 of the 60 seconds.
So the utilization for 1 minute is 5/60 * 100 = 8%.

How does it become 10%?
Because the CPU is fully utilized for 1 second in every 10 seconds = 1/10 * 100 = 10%?

Based on the above ->
Are both
a) the CPU utilization (1 sec / 10 sec * 100)
and
b) the CPU load ((9+1) load / 10 sec) calculated every 10 seconds, then?

======

Please do bear with me if I seem totally off.
Hope to hear your advice soon.

Regards,
Noob

jlliagre · June 6, 2016, 4:10pm

Yes, that's what I wrote: ten processes all repeating this pattern: fighting to get the CPU for 1 second, then idling for 9 seconds.

The load average is not really an average. There are three load average values maintained by the kernel: the 1-minute, the 5-minute and the 15-minute ones. They are updated at worst every second.

The load is 10 during one second and zero for the remaining 9 seconds. The kernel function that computes the load average smooths this to a load of 1.

One unit of load on average for every second of the minute, including the first one.

The load average is updated every second from the run queue statistics, which are sampled every 10 ms. (Note that I might be wrong here: it is quite possible that, starting with Solaris 10, the run queue is also computed from micro-state accounting instead of being sampled. That doesn't affect what we are talking about here.)

There are six periods of 10 seconds in one minute, not five, hence 6/60*100 = 10%.
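
As a rough illustration of the smoothing described above, the sketch below feeds the "10 for one second, 0 for nine seconds" pattern into an exponentially damped average. The damping formula, the 60-second time constant, the 1-second update step and the name smooth are all assumptions made for this example, not the real kernel routine.

```python
import math

def smooth(samples, window_s=60.0, step_s=1.0):
    """Exponentially damped average, updated once per sample (assumed 1 s)."""
    decay = math.exp(-step_s / window_s)
    load = 0.0
    out = []
    for n in samples:
        load = load * decay + n * (1.0 - decay)
        out.append(load)
    return out

# Ten minutes of the pattern: load 10 for 1 second, then 0 for 9 seconds.
pattern = ([10] + [0] * 9) * 60
values = smooth(pattern)

# Look at the last full 10-second cycle, once the start-up transient is gone.
last_cycle = values[-10:]
print(f"min {min(last_cycle):.2f}, max {max(last_cycle):.2f}")
```

In this model the smoothed value never settles on exactly 1; it oscillates slightly around 1, rising during each burst second and decaying during the idle ones.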

javanoob · June 7, 2016, 2:07pm

Hi Jlliagre,

So sorry for the late reply. Was having a long day today.
All aside, truly appreciate your time and explanation; having your explanation beats me googling and reading all around.

Back to the topic

You are right. Seriously, I do not know how I ever came up with 5 periods of 10 seconds in 1 minute -_-!

So in summary, can I make the following assumptions ->
A load of 1 over 1 minute might mean

a) a simple scenario of an actual load of 1 every second, with the CPU 100% utilized for the whole minute

b) a load >1 in certain seconds that averages out to 1 load/second over the minute; the ratio between load and CPU utilization will not be 1:1, as the CPU might be able to complete multiple threads/loads per second.

Thus
c) High load != High CPU
As shown in your example above, an average load of 1 over a minute (for 1 core/CPU) might come with only 10% CPU utilization.

High CPU = High Load
If CPU utilization is high, this means that the CPU time is being used / held by load in the system.

Regards,
Noob