Sun: High kernel usage & very high load averages

Hi,

I am seeing very high kernel usage and very high load averages on my system, although we are not loading much data into our database. Here is the output of top. Does anyone know what I should be looking at?

Thanks,

Lorraine

last pid: 13144; load averages: 22.32, 19.81, 16.78 09:23:50
165 processes: 148 sleeping, 12 running, 1 zombie, 4 on cpu
CPU states: 0.2% idle, 75.5% user, 24.2% kernel, 0.1% iowait, 0.0% swap
Memory: 12G real, 185M free, 17G swap in use, 603M swap free

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
6653 informix 2 51 -10 5758M 4835M cpu/6 513:14 8.35% oninit
6652 informix 2 51 -10 5759M 4841M run 556:36 8.25% oninit
6654 informix 2 50 -10 5758M 4830M run 487:37 8.21% oninit
6655 informix 2 50 -10 5758M 4824M run 470:20 8.07% oninit
6633 informix 2 51 -10 5759M 4841M run 913:09 7.28% oninit
15233 metrica 1 44 2 16M 15M run 35:23 5.66% perl
12897 metrica 1 36 2 9968K 9080K cpu/2 0:57 5.14% perl
12496 metrica 1 30 0 1136K 880K run 1:52 4.93% tar
12913 metrica 1 36 2 11M 10M run 0:56 4.74% perl
6647 informix 2 59 -20 5757M 5088M sleep 43.4H 4.64% oninit
6648 informix 2 59 -20 5757M 5084M sleep 35.6H 3.71% oninit
11260 metrica 4 40 0 3946M 557M run 268:29 3.04% metrica_load
12922 metrica 1 60 2 11M 11M sleep 0:49 2.71% perl
6649 informix 2 59 -20 5757M 5090M sleep 27.4H 2.68% oninit
6650 informix 2 59 -20 5757M 5057M sleep 16.9H 2.08% oninit

It appears that you have a lot of context switching - that is why the kernel is active.

You may want to look at how priorities are set on the processes that are getting moved in/out. If the processes are not stuck in a loop, you can clear the traffic by letting one or two processes get through a little faster.

Your system does not appear to be I/O bound, so it has to be CPU contention.

FWIW, it also looks like your swap is pretty close to being maxed out as well: roughly 95% or more of it is in use (17G in use against 603M free).
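
As a rough check of that swap figure, here is a minimal Python sketch that works it out from the Memory line of the top output above. The function names are made up for illustration, and top rounds the sizes it prints, so the result is only approximate.

```python
import re

# Memory line copied from the top output in the original post.
MEMORY_LINE = "Memory: 12G real, 185M free, 17G swap in use, 603M swap free"

# Multipliers to convert top-style size suffixes to megabytes.
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def to_mb(value):
    """Convert a top-style size such as '17G' or '603M' to megabytes."""
    return float(value[:-1]) * UNIT_MB[value[-1]]


def swap_used_fraction(line):
    """Return swap in use / (swap in use + swap free) for a top Memory line."""
    used = re.search(r"(\d+(?:\.\d+)?[KMG]) swap in use", line).group(1)
    free = re.search(r"(\d+(?:\.\d+)?[KMG]) swap free", line).group(1)
    used_mb, free_mb = to_mb(used), to_mb(free)
    return used_mb / (used_mb + free_mb)


print(f"swap used: {swap_used_fraction(MEMORY_LINE):.1%}")
```

With the numbers posted, this comes out a little above 95%, so the estimate holds up.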

Hi,

I have cleared up the issue of swap being 95% full.

How do I check how priorities are set? Is this a kernel parameter? I have never seen this much activity in the kernel before. What can cause this CPU contention? There is only a small load on the system.

Thanks a million

I strongly suspect this server needs more memory. The iowait figure is very small, which means I/O isn't causing the problem. But free memory is only about 1.5% of the total (185M of 12G), which means you are out of memory. Having to move chunks of data between main memory and swap is what is driving your CPU usage through the roof. More memory should solve the problem, since the swapping could then stop.

One way to verify that is to use sar -g to check how much paging activity is going on. Here is an example from my box.

krypton$ sar -g 5 5

SunOS krypton 5.10 Generic_118822-02 sun4u 02/06/2006

10:18:34 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
10:18:39 0.00 0.00 0.00 0.00 0.00
10:18:44 0.00 0.00 0.00 0.00 0.00
10:18:49 0.00 0.00 0.00 0.00 0.00
10:18:54 0.00 0.00 0.00 0.00 0.00
10:18:59 0.00 0.00 0.00 0.00 0.00

Average 0.00 0.00 0.00 0.00 0.00
krypton$

Krypton isn't heavily loaded and has plenty of free memory, so there is no paging or swapping going on at all. If your box shows non-zero numbers here, it is out of memory and having to swap. An occasional non-zero value is OK, as it may just be moving old data out of memory, but a constantly high number is a problem. My guess is that is what you'll see.
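
If you want to apply that rule of thumb to a longer capture, here is a minimal Python sketch that reads pasted sar -g text and reports how many samples showed page-outs. The function names are hypothetical, and what counts as "constantly" non-zero is left to your judgement rather than being any official threshold.

```python
# Sample copied from the krypton output above; paste your own sar -g text here.
SAR_OUTPUT = """\
10:18:34 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
10:18:39 0.00 0.00 0.00 0.00 0.00
10:18:44 0.00 0.00 0.00 0.00 0.00
10:18:49 0.00 0.00 0.00 0.00 0.00
10:18:54 0.00 0.00 0.00 0.00 0.00
10:18:59 0.00 0.00 0.00 0.00 0.00

Average 0.00 0.00 0.00 0.00 0.00
"""


def pgout_samples(text):
    """Return the pgout/s value from every timestamped sample line."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Sample lines look like 'HH:MM:SS v v v v v'; this skips the
        # banner, blank lines, and the Average line.
        if len(fields) == 6 and fields[0].count(":") == 2:
            try:
                samples.append(float(fields[1]))
            except ValueError:
                continue  # column header line: 'pgout/s' is not a number
    return samples


def nonzero_fraction(samples):
    """Fraction of samples in which any page-out was recorded."""
    if not samples:
        return 0.0
    return sum(1 for value in samples if value > 0) / len(samples)


samples = pgout_samples(SAR_OUTPUT)
print(f"{len(samples)} samples, {nonzero_fraction(samples):.0%} with page-outs")
```

A run over a few minutes (for example sar -g 5 60) gives a better picture than five samples.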

Yes, you are correct. The page-out values are very high on my server.

metrica@metrica # sar -g 5 5

SunOS metrica 5.8 Generic_117350-11 sun4u 02/06/06

14:25:10 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
14:25:15 12.25 42.49 38.14 0.00 0.00
14:25:20 10.76 29.75 26.42 0.00 0.00
14:25:25 8.38 18.56 18.56 0.00 0.00
14:25:30 1.79 21.27 21.27 0.00 0.00
14:25:35 2.59 4.58 4.58 0.00 0.00

Average 7.17 23.38 21.84 0.00 0.00

But this server has 12G of memory. That should be plenty considering what we are running on the system, so I don't think I can request a purchase of more memory. Should I look to see whether Informix or some other process is grabbing the memory on the box?

Thanks ever so much for your guidance.
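
One rough way to start on that question is to rank commands by resident size using the top output already posted. The Python sketch below is only an illustration with made-up function names. It assumes the oninit processes, which all show nearly the same SIZE and RES, are attached to the same shared memory, so it keeps the largest RES per command instead of adding them up, which would over-count.

```python
# A few process lines copied from the top output in the first post.
# Columns: PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
TOP_LINES = """\
6653 informix 2 51 -10 5758M 4835M cpu/6 513:14 8.35% oninit
15233 metrica 1 44 2 16M 15M run 35:23 5.66% perl
12496 metrica 1 30 0 1136K 880K run 1:52 4.93% tar
6647 informix 2 59 -20 5757M 5088M sleep 43.4H 4.64% oninit
11260 metrica 4 40 0 3946M 557M run 268:29 3.04% metrica_load
"""

# Multipliers to convert top-style size suffixes to megabytes.
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def to_mb(value):
    """Convert a top-style size such as '4835M' or '880K' to megabytes."""
    return float(value[:-1]) * UNIT_MB[value[-1]]


def largest_res_by_command(text):
    """Map each command name to the largest RES (in MB) seen for it."""
    largest = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # not a process line
        res_mb, command = to_mb(fields[6]), fields[10]
        # Assumption: same-named processes may share memory, so take the
        # maximum rather than the sum.
        largest[command] = max(res_mb, largest.get(command, 0.0))
    return largest


ranked = sorted(largest_res_by_command(TOP_LINES).items(),
                key=lambda item: item[1], reverse=True)
for command, res_mb in ranked:
    print(f"{command:<14} {res_mb:10.1f} MB")
```

This only narrows down where to look. It cannot tell how much of each figure is shared memory, so treat the numbers as a starting point.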