LPAR CPU capacity planning

gsaray101 · May 7, 2013, 9:31am

Hi Everybody:

I am trying to come up with a formula to calculate the threshold for LPAR CPU utilization in shared mode. It sounds like I need to calculate the relative virtual CPU and then calculate the CPU utilization from that. Given the following lparstat output:

System configuration: type=Shared mode=Uncapped smt=4 lcpu=96 mem=393216MB psize=64 ent=16.00 
%user  %sys  %wait  %idle physc %entc  lbusy   app  vcsw phint
----- ----- ------ ------ ----- ----- ------   --- ----- -----
 35.4  15.0   11.1   38.5  9.18  57.4   30.2 36.58 25199  3727 
 36.4  15.6   11.3   36.7  9.41  58.8   31.1 35.56 24301  4129 
 43.2  18.6   13.1   25.1 11.28  70.5   35.0 32.56 30371  5777 
 41.4  17.0   13.2   28.3 10.75  67.2   32.8 33.34 30306  5116 
 35.4  15.0   11.3   38.3  9.21  57.6   29.5 35.85 26479  3758

How would I calculate the CPU threshold for an LPAR in shared mode?

MichaelFelt · May 8, 2013, 12:54pm

lbusy gives you the relative number of "logical" CPUs busy. Since you are running SMT4, you have 96/4 = 24 virtual processors (i.e., the maximum number of physical processors you can use at any moment).

Since your entitlement is 16 - the PHYP has reserved (declared as Home) two sockets of 8 cores each. The 8 extra virtual processors will run, ideally, on these home processors.

16 entitlement == 160 msec processing power every 10 msec guaranteed.

physc * 10 = # msec actually used (not processors!)
lbusy * 96 = average number of threads busy (taking lbusy as a fraction of the 96 logical CPUs). If this number is nearly equal to physc, then you are running mainly single-threaded, and you could easily reduce the number of VPs assigned to 'force' more utilization from a single processor (i.e., lbusy goes up faster than physc).
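
The arithmetic in this reply can be sketched in Python using the first lparstat sample from the original post. The function and key names are illustrative only and are not part of any AIX tool. Treating lbusy as a percentage of the 96 logical CPUs is an assumption about how the reply intends it.

```python
# Illustrative sketch: names are hypothetical; values are taken from the
# first lparstat sample posted in this thread.
DISPATCH_WINDOW_MS = 10  # hypervisor dispatch window quoted in the reply


def lpar_summary(lcpu, smt, ent, physc, lbusy_pct):
    """Derive the quantities discussed in the reply from lparstat fields."""
    return {
        # Logical CPUs divided by SMT threads per core gives virtual processors.
        "virtual_processors": lcpu // smt,
        # Guaranteed processing time per dispatch window.
        "entitled_ms_per_window": ent * DISPATCH_WINDOW_MS,
        # Processing time actually consumed per dispatch window.
        "used_ms_per_window": physc * DISPATCH_WINDOW_MS,
        # Assumption: lbusy is a percentage of all logical CPUs.
        "busy_threads": lbusy_pct / 100 * lcpu,
    }


summary = lpar_summary(lcpu=96, smt=4, ent=16.00, physc=9.18, lbusy_pct=30.2)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

For this sample the busy-thread count (about 29) is roughly three times physc (about 9.2), so under the reading above the workload does not look mainly single-threaded.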

Hope this helps (I have to run to dinner).

gsaray101 · May 8, 2013, 2:56pm

I have to admit, this couldn't be more confusing. Do you know of any documentation that explains this stuff?

MichaelFelt · May 9, 2013, 5:08am

Well, I will try to write it again. Trying to answer something complex just before dinner is poor timing, and this is complex stuff. I travel all around Europe to look at systems and explain how to modify configurations to increase overall system utilization. In other words, this is not an easy subject for a forum; a "whitepaper" and/or a presentation is better.

However, read the man page and/or google lbusy. The IBM Infocenter should at least say something.

Also look at physc, which is actually a count of the msec used (divided by 10), not the actual number of (virtual) processors used.

For your specific question, I would like to ask you to look at the output of sar -P ALL and see whether it already answers it for you.

Post your remaining questions, and also include what this is for. If I understand your objective I can give a better answer; otherwise it tends to become technical (nerdy) mumbo-jumbo.

gsaray101 · May 9, 2013, 4:51pm

Thank you so much for helping me with this. What I am trying to do is get relative usage percentages in a shared LPAR environment, so that it is easy to see immediately what the usage is, whether we are within capacity, etc.

Here is the formula that I am using to calculate the relative CPU usage:

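$Total = $USR + $SYS;
if ( $EC > 100 )
  {
  # Above entitlement: scale by the physical cores actually consumed (physc).
  $RelativeCores = $PC;
  $UsedCores = ($Total / 100) * $RelativeCores;
  $RelativePercent = $UsedCores / $Entitlement * 100;
  }
else
  {
  # At or below entitlement: scale by the entitled capacity.
  $RelativeCores = $Entitlement;
  $RelativePercent = $Total;
  $UsedCores = ($Total / 100) * $Entitlement;
  }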

With the above formulas, I am calculating the relative CPU usage in a shared LPAR environment. So, if I wanted to calculate the threshold on the relative percentage, what would the formula be?

Eventually, I would like to create a relative CPU percent chart and put a horizontal line on it for the threshold, so that by looking at the chart I would quickly know where I am with the CPU. Does this make sense?
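
For readers who want to try the formula above against the lparstat samples from the first post, here is a minimal Python sketch of the same logic. The function name and argument names are illustrative. The over-entitlement input at the end is a made-up value used only to exercise the first branch, since every posted sample has %entc below 100.

```python
def relative_cpu(usr, sys_pct, entc, physc, entitlement):
    """Python restatement of the poster's formula (names are illustrative)."""
    total = usr + sys_pct
    if entc > 100:
        # Above entitlement: scale by the cores actually consumed (physc).
        relative_cores = physc
        used_cores = total / 100 * relative_cores
        relative_percent = used_cores / entitlement * 100
    else:
        # At or below entitlement: scale by the entitled capacity.
        relative_cores = entitlement
        used_cores = total / 100 * entitlement
        relative_percent = total
    return relative_cores, used_cores, relative_percent


ENTITLEMENT = 16.00  # ent= from the lparstat header

# (%user, %sys, physc, %entc) from the five posted samples.
SAMPLES = [
    (35.4, 15.0, 9.18, 57.4),
    (36.4, 15.6, 9.41, 58.8),
    (43.2, 18.6, 11.28, 70.5),
    (41.4, 17.0, 10.75, 67.2),
    (35.4, 15.0, 9.21, 57.6),
]

for usr, sys_pct, physc, entc in SAMPLES:
    _, used, pct = relative_cpu(usr, sys_pct, entc, physc, ENTITLEMENT)
    print(f"used_cores={used:.2f} relative_percent={pct:.1f}")

# Hypothetical over-entitlement input (not from the thread).
_, over_used, over_pct = relative_cpu(60.0, 30.0, 125.0, 20.0, ENTITLEMENT)
print(f"hypothetical: used_cores={over_used:.2f} relative_percent={over_pct:.1f}")
```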

MichaelFelt · May 13, 2013, 2:40pm

I do not think this is going to give you a statistic you really want, but I may be mistaken - as it all depends on what you are trying to "relate" to/with each other.

The physc ($PC) value is already "relative" in the sense that you are computing it (I think), because it is an expression of the processing milliseconds used for the time period: 9.1 means 91 msec per 10 msec, which is the PHYP real-time scheduling window. Entitlement is guaranteed processing, if requested; in real terms, (entitlement * 10) msec per 10 msec.

So if I use 91 msec, that might be 10 processors (9 running non-stop for 10 msec, and the tenth running for only 1 msec), or it could be 91 processors all running for only 1 msec.

Looking at user/sys time and comparing them to physc could make sense on POWER6 and earlier, where one thread running user+sys = 100 could equal physc = 1.0. On POWER7, however, a single thread is considered to be only 0.66 of 1.0, while the other three threads (logical CPUs 1, 2 and 3), even though they are 100% idle, are considered to be "using" 0.11 physc each, because there are additional processing components on a POWER7 that, by definition, are not being used. In other words, it is impossible for a single thread to fully utilize the potential of a POWER7 processor.

In short, I think the statistic to use is just physc. You could perhaps give it a weight by multiplying it by lbusy% - but this depends on what you are trying to make "standardized".
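
As a rough sketch of what charting physc directly might look like, the Python below expresses physc as a percentage of entitlement and of the virtual processor count, and also computes the lbusy-weighted value suggested above. All names are illustrative. The 80% threshold is an arbitrary placeholder, not a recommendation from this thread, and dividing by virtual processors is only one possible choice of ceiling.

```python
ENTITLEMENT = 16.00           # ent= from the lparstat header
VIRTUAL_PROCESSORS = 96 // 4  # lcpu / smt, as worked out earlier in the thread
THRESHOLD_PCT = 80.0          # hypothetical placeholder threshold

# (physc, lbusy %) from the five posted lparstat samples.
SAMPLES = [(9.18, 30.2), (9.41, 31.1), (11.28, 35.0), (10.75, 32.8), (9.21, 29.5)]


def physc_metrics(physc, lbusy_pct):
    """Candidate chart values built from physc alone (names are illustrative)."""
    return {
        "pct_of_entitlement": physc / ENTITLEMENT * 100,
        "pct_of_virtual_processors": physc / VIRTUAL_PROCESSORS * 100,
        "lbusy_weighted_physc": physc * lbusy_pct / 100,
    }


for physc, lbusy in SAMPLES:
    m = physc_metrics(physc, lbusy)
    flag = "OVER" if m["pct_of_entitlement"] > THRESHOLD_PCT else "ok"
    print(
        f"physc={physc:5.2f} ent%={m['pct_of_entitlement']:5.1f} "
        f"vp%={m['pct_of_virtual_processors']:5.1f} "
        f"weighted={m['lbusy_weighted_physc']:.2f} {flag}"
    )
```

The percentage of entitlement computed this way reproduces the %entc column that lparstat already prints, so for a chart with a horizontal threshold line, %entc may be usable as-is.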

Hope this helps (i.e. is understandable)!

Michael