I'll try to explain the percentages to the best of my knowledge:
idle: should be obvious
user: the time processes spend running in user space. That is, loops and all in-memory operations.
kernel: the time spent in kernel space. All I/O runs in kernel space, that is: reading & writing files, network communication, user I/O (when not sleeping), loading/forking new processes, ...
iowait: the time spent waiting for an I/O device to become ready. As long as this is 0, none of your processes had to wait for a disk to come ready or was blocked by a slow network connection.
swap: if this is zero, it basically means none of your processes had to be swapped out, so your installed memory is sufficient.
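As a rough illustration of the user/kernel split, here is a sketch using Python's stdlib `os.times()`, which reports the user and system (kernel) CPU time this process has accumulated. The workloads are my own toy examples, not anything from this thread:

```python
import os

# os.times() returns accumulated CPU time for this process:
# .user   = time spent in user space (computation)
# .system = time spent in kernel space (syscalls, I/O)
before = os.times()

# Pure in-memory work: accumulates *user* time
total = sum(i * i for i in range(2_000_000))

# File I/O: the write syscalls accumulate *system* time
with open(os.devnull, "wb") as f:
    for _ in range(1000):
        f.write(b"x" * 4096)

after = os.times()
print(f"user   time used: {after.user - before.user:.3f}s")
print(f"system time used: {after.system - before.system:.3f}s")
```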
From the definition of the "load": if nothing is running, the load is 0. Each process that is running or runnable adds 1 to the load, which then gets averaged over 1/5/15 minute(s). So a 1-minute load of 23.3 means that during the last minute there were on average 23.3 processes running or ready to run. Spread over 48 cores, that's about half of them in use, with the other half more or less idling away.
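To make the arithmetic concrete, here is a small sketch (the function name `cores_busy` is my own, not a real API); `os.getloadavg()` is the stdlib call that returns the actual 1/5/15-minute averages on Unix systems:

```python
import os

def cores_busy(load_avg: float, n_cores: int) -> float:
    """Fraction of cores kept busy by this load average.

    A value above 1.0 means runnable processes are queueing
    for CPU time instead of running immediately.
    """
    return load_avg / n_cores

# The example from the text: a 1-minute load of 23.3 on 48 cores
print(f"{cores_busy(23.3, 48):.1%} of the cores in use")  # about half

# The real 1/5/15-minute averages for the current machine (Unix only):
one, five, fifteen = os.getloadavg()
print(f"load averages: {one:.2f} {five:.2f} {fifteen:.2f}")
```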
My server has to keep processing files as they arrive, and the queue of pending files shouldn't grow long.
That is, I have to finish processing each file as soon as possible, rather than processing many files at a time.
To achieve this, which is preferred: loading the machine until I reach a load average of 48, or keeping it below 48 so multiple CPUs can work on a task and complete it quickly?
I have tried keeping it higher than 48; it does process in parallel, but then a single task takes a long time to complete, and that is not what I wanted.
What do you mean by "load avg"? And when you quote specific numbers (somehow related to the number of CPUs in your server), what do you mean?
Are you expecting that your program (however many processes it may spawn) will get distributed to different CPUs for processing without having to make specific compiler modifications to do so? If so, you are mistaken, my friend...
No, not nothing at all. Just not very much. By my definition, and please supply a better one if you have it, the base load is 0. Only running or runnable processes add to that, but not processes that are sleeping, e.g. because they're waiting for a timer to run out, or for user input.
Now, in your example you're showing a pretty idle machine (httpd with a total of 17 CPU seconds, while the machine has been up for more than 2 days), with most processes waiting for some kind of external input or waking up occasionally (e.g. cron). That means the load counter isn't getting increased very much at each data point. And when the load is averaged, the average falls below 0.005, so it gets displayed as 0.00 (anything at or above that point would round up to 0.01).
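The rounding behaviour described above is just two-decimal display formatting, which a quick sketch can show:

```python
# Load averages are typically displayed with two decimals, so any
# average below 0.005 shows up as 0.00 and the machine looks fully idle.
print(f"{0.004:.2f}")  # 0.00
print(f"{0.006:.2f}")  # 0.01
```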
True. To tell you the truth, I have not been able to make any sense of Sun's definition of load averages, either pre or post Solaris 10. In general, from experience, I have found that machines get slow once the load averages start approaching single digits (even 3-4 has shown noticeably slow performance for me). Now, I have never had the experience of a 48-core machine, but I have plenty of experience with Netra series quad-core servers.
I am not sure what you are asking me. But here are some pointers:
All your data is stored in the memory bank of the CPU that is running your program. Your program and all its spawned processes will be executed on a single CPU unless you compile it very specifically for multiprocessor systems.
Generally, memory on the Sun boxes is allocated per CPU in banks, and a particular CPU has a specific set of DIMMs dedicated to it. As long as your data structures (combined total) are not greater than the size of your memory bank (not counting the overhead space), you don't need to worry about it. In fact, when this space is exceeded, it will swap, so to the user this should be transparent. So you could potentially spawn a thread for every single process you have.
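Coming back to the original queue question, one common pattern (a sketch, not the poster's setup; `process_file` is a placeholder for whatever per-file work is actually done) is to cap the number of concurrent workers at the core count, so the backlog queues inside the pool instead of as runnable processes driving the load average well past the number of CPUs. For CPU-bound work on CPython you would use a process pool rather than threads, but the capping idea is the same:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_file(path: str) -> str:
    # Placeholder for the real per-file work (parse, transform, store, ...)
    return path.upper()

files = [f"file_{i}.dat" for i in range(10)]

# Cap concurrency at the core count: pending files wait in the pool's
# internal queue rather than competing for CPU time all at once.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    results = list(pool.map(process_file, files))

print(results)
```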