Hello folks,
Recently our FreeBSD 7.1 i386 system became very sluggish. Not much is running on it, and whatever does run takes an eternity to complete. All the troubleshooting hinted at a very high nice percentage. Can that be the culprit?
I'm pasting snippets of top output below; please advise whether this is cause for concern and what the possible remedies are. Note that the idle percentage is 0.
top
last pid: 32075; load averages: 4.11, 4.18, 4.38 up 17+21:11:14 23:11:55
136 processes: 5 running, 131 sleeping
CPU: 0.4% user, 85.9% nice, 11.7% system, 2.0% interrupt, 0.0% idle
Mem: 811M Active, 1767M Inact, 195M Wired, 92M Cache, 112M Buf, 129M Free
Swap: 16G Total, 480K Used, 16G Free
Thanks for the support.
The next thing to do is find out which processes are eating the CPU cycles, and why. What does ps auxw say?
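If you want the busiest processes listed first, FreeBSD's ps can sort by current CPU usage directly; a minimal sketch (-r is the FreeBSD/BSD sort-by-CPU flag, so the compiler processes float to the top):

```shell
# Show the top CPU consumers first: -r sorts by current CPU usage on
# FreeBSD, and head keeps the paste short.
ps -auxwr | head -15
```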
Not much of the CPU is being used; a compilation has been triggered. It used to complete in ~3 hours, but for the last few days it has been taking ~20 hours.
Here we go.
ps -auwx | head -15
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
x 39055 10.0 6.0 196324 188288 ?? RN 11:34PM 0:04.64 [cc1plus]
x 39109 8.5 2.5 81324 76860 ?? RN 11:34PM 0:01.32 [cc1]
x 39113 8.5 2.1 67932 64280 ?? RN 11:34PM 0:01.31 [cc1]
x 39105 3.4 1.0 32612 29764 ?? RN 11:34PM 0:00.59 [cc1]
root 0 0.0 0.0 0 0 ?? DLs 20Mar14 0:02.39 [swapper]
root 1 0.0 0.0 1888 328 ?? ILs 20Mar14 0:00.19 /sbin/init --
root 2 0.0 0.0 0 8 ?? DL 20Mar14 0:32.19 [g_event]
root 3 0.0 0.0 0 8 ?? DL 20Mar14 11:25.58 [g_up]
root 4 0.0 0.0 0 8 ?? DL 20Mar14 10:16.40 [g_down]
root 5 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [kqueue taskq]
root 6 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [xpt_thrd]
root 7 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [thread taskq]
root 8 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [igb0 taskq]
root 9 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [igb1 taskq]
So were the top and ps commands run at the same time? Because the visible processes in the ps output add up to only ~30% user/nice time, whereas top shows 86.3% user/nice time.
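To do that comparison quickly, you can sum the %CPU column (column 3 of `ps aux` output) with awk; a sketch using the four compiler lines pasted above:

```shell
# Sum the %CPU column to compare against top's user+nice total.
# The heredoc repeats the compiler lines from the ps output above.
awk '{ total += $3 } END { printf "%.1f\n", total }' <<'EOF'
x 39055 10.0 6.0 196324 188288 ?? RN 11:34PM 0:04.64 [cc1plus]
x 39109 8.5 2.5 81324 76860 ?? RN 11:34PM 0:01.32 [cc1]
x 39113 8.5 2.1 67932 64280 ?? RN 11:34PM 0:01.31 [cc1]
x 39105 3.4 1.0 32612 29764 ?? RN 11:34PM 0:00.59 [cc1]
EOF
```

This prints 30.4 -- still far below the 86.3% top reported, consistent with the two snapshots having been taken at different moments.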
There was a gap between the two. Here is detailed top output including processes; I believe this alone will suffice, without ps.
last pid: 44000; load averages: 4.20, 4.21, 4.18 up 17+21:56:07 23:56:48
137 processes: 5 running, 132 sleeping
CPU: 0.8% user, 83.2% nice, 14.5% system, 1.6% interrupt, 0.0% idle
Mem: 1128M Active, 1517M Inact, 196M Wired, 94M Cache, 112M Buf, 60M Free
Swap: 16G Total, 480K Used, 16G Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
43983 x 1 108 10 143M 135M RUN 0:03 11.18% cc1plus
43994 x 1 108 10 86428K 83804K RUN 0:02 10.89% cc1
43862 x 1 106 10 256M 252M RUN 0:10 2.78% cc1plus
43863 x 1 -8 10 15416K 13184K piperd 0:00 0.20% as
894 root 1 4 0 9324K 9352K select 62:29 0.00% amd
1039 root 1 4 0 11456K 3220K select 9:12 0.00% nmbd
80586 x 1 4 10 80952K 77828K select 4:42 0.00% bmake
1051 root 1 4 0 14816K 5508K select 2:21 0.00% winbindd
1126 root 1 4 0 14240K 7908K select 1:45 0.00% ruby
1008 root 1 4 0 4724K 1948K select 1:06 0.00% ntpd
1064 root 1 4 0 15868K 6312K select 0:58 0.00% winbindd
872 root 1 44 0 3392K 1408K select 0:49 0.00% rpcbind
1045 root 1 4 0 17072K 5780K select 0:48 0.00% smbd
Somebody is running large compile jobs, and was polite enough to nice them -- i.e. run them low-priority, so they won't steal time from anything more important.
If you aren't having performance problems, and they are authorized to use cc, I don't think this is a problem.
A run queue of 4 needs 4 CPU cores to keep up, otherwise everything slows down; on a single-core i386 machine, for example, each of those 4 jobs gets only about a quarter of the CPU.
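As a quick back-of-the-envelope check (a sketch; per_core_load is just an illustrative helper, not a standard tool):

```shell
# Divide a load average by the number of cores; a sustained result well
# above 1.0 means runnable jobs are queueing for CPU time.
per_core_load() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

per_core_load 4.11 1   # the load from the top output above, on one core: 4.11
per_core_load 4.11 4   # the same load on four cores: 1.03, roughly saturated
```

On a live box you could feed it real numbers, e.g. `per_core_load "$(sysctl -n vm.loadavg | awk '{print $2}')" "$(sysctl -n hw.ncpu)"` -- both sysctls are standard on FreeBSD.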
vmstat 2
shows the run queue (the major contributor to the load shown by top
and uptime
). It may also show paging activity.
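A minimal invocation, using the classic BSD syntax with interval and count as trailing arguments:

```shell
# Print system statistics every 2 seconds, 5 samples. On FreeBSD the
# leading columns are r (run queue), b (blocked on I/O) and w (runnable
# but swapped out); the pi/po columns show page-ins/page-outs, i.e.
# paging activity.
vmstat 2 5
```

The first line is an average since boot; watch the later samples for a sustained r of 4 or more.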
The compile jobs are periodic and run by us. The same job used to complete in ~3 hours a week ago but has been taking ~20 hours recently. No configuration change has happened on the system, and we don't see anything else hogging it, hence we're wondering what's slowing it down.
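For a sudden 3h-to-20h slowdown with no configuration change, a few things may be worth ruling out from the FreeBSD side. This is only a checklist sketch: dev.cpu.0.freq exists only when the cpufreq(4) driver is loaded, and gstat/swapinfo are the stock FreeBSD tools.

```shell
# Is the CPU being throttled (thermal limits, powerd)? Compare the
# current frequency against the available levels.
sysctl dev.cpu.0.freq dev.cpu.0.freq_levels

# Any recent kernel complaints -- disk timeouts, failing hardware?
dmesg | tail -20

# Is the disk suddenly the bottleneck? Batch mode (-b) samples once
# over 5 seconds and prints per-device busy%.
gstat -b -I 5s

# Is anything swapping despite the small "Used" figure in top?
swapinfo -h
```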