Hello folks,
Recently our FreeBSD 7.1 i386 system became very sluggish. Not much is running on it, and whatever does run takes an eternity to complete. All the troubleshooting hinted at a very high nice percentage. Can that be the culprit?
I'm pasting snippets of top output below; please advise whether this is cause for concern and what the possible remedies are. Note that the idle percentage is 0.
top
last pid: 32075; load averages: 4.11, 4.18, 4.38 up 17+21:11:14 23:11:55
136 processes: 5 running, 131 sleeping
CPU: 0.4% user, 85.9% nice, 11.7% system, 2.0% interrupt, 0.0% idle
Mem: 811M Active, 1767M Inact, 195M Wired, 92M Cache, 112M Buf, 129M Free
Swap: 16G Total, 480K Used, 16G Free
Thanks for the support.
The next thing to do is find out which processes are eating the CPU cycles, and why. What does ps auxw say?
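If you want the busiest processes listed first, FreeBSD's ps can sort by current CPU usage directly; a minimal sketch (-r is the FreeBSD/BSD sort-by-CPU flag, so the compiler processes float to the top):

```shell
# Show the top CPU consumers first: -r sorts by current CPU usage on
# FreeBSD, and head keeps the paste short.
ps -auxwr | head -15
```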
Not much of the CPU is being used; a compilation has been triggered. It used to complete in ~3 hours, but for the last few days it has been taking ~20 hours.
Here we go.
ps -auwx | head -15
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
x 39055 10.0 6.0 196324 188288 ?? RN 11:34PM 0:04.64 [cc1plus]
x 39109 8.5 2.5 81324 76860 ?? RN 11:34PM 0:01.32 [cc1]
x 39113 8.5 2.1 67932 64280 ?? RN 11:34PM 0:01.31 [cc1]
x 39105 3.4 1.0 32612 29764 ?? RN 11:34PM 0:00.59 [cc1]
root 0 0.0 0.0 0 0 ?? DLs 20Mar14 0:02.39 [swapper]
root 1 0.0 0.0 1888 328 ?? ILs 20Mar14 0:00.19 /sbin/init --
root 2 0.0 0.0 0 8 ?? DL 20Mar14 0:32.19 [g_event]
root 3 0.0 0.0 0 8 ?? DL 20Mar14 11:25.58 [g_up]
root 4 0.0 0.0 0 8 ?? DL 20Mar14 10:16.40 [g_down]
root 5 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [kqueue taskq]
root 6 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [xpt_thrd]
root 7 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [thread taskq]
root 8 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [igb0 taskq]
root 9 0.0 0.0 0 8 ?? DL 20Mar14 0:00.00 [igb1 taskq]
So were the top and ps commands run at the same time? Because the visible processes in the ps output add up to only ~30% user/nice time, whereas top shows 86.3% user/nice time.
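To do that comparison quickly, you can sum the %CPU column (column 3 of `ps aux` output) with awk; a sketch using the four compiler lines pasted above:

```shell
# Sum the %CPU column to compare against top's user+nice total.
# The heredoc repeats the compiler lines from the ps output above.
awk '{ total += $3 } END { printf "%.1f\n", total }' <<'EOF'
x 39055 10.0 6.0 196324 188288 ?? RN 11:34PM 0:04.64 [cc1plus]
x 39109 8.5 2.5 81324 76860 ?? RN 11:34PM 0:01.32 [cc1]
x 39113 8.5 2.1 67932 64280 ?? RN 11:34PM 0:01.31 [cc1]
x 39105 3.4 1.0 32612 29764 ?? RN 11:34PM 0:00.59 [cc1]
EOF
```

This prints 30.4 -- still far below the 86.3% top reported, consistent with the two snapshots having been taken at different moments.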
There was a gap between the two. Here is detailed top output including processes; I believe this alone will suffice, without ps.
last pid: 44000; load averages: 4.20, 4.21, 4.18 up 17+21:56:07 23:56:48
137 processes: 5 running, 132 sleeping
CPU: 0.8% user, 83.2% nice, 14.5% system, 1.6% interrupt, 0.0% idle
Mem: 1128M Active, 1517M Inact, 196M Wired, 94M Cache, 112M Buf, 60M Free
Swap: 16G Total, 480K Used, 16G Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
43983 x 1 108 10 143M 135M RUN 0:03 11.18% cc1plus
43994 x 1 108 10 86428K 83804K RUN 0:02 10.89% cc1
43862 x 1 106 10 256M 252M RUN 0:10 2.78% cc1plus
43863 x 1 -8 10 15416K 13184K piperd 0:00 0.20% as
894 root 1 4 0 9324K 9352K select 62:29 0.00% amd
1039 root 1 4 0 11456K 3220K select 9:12 0.00% nmbd
80586 x 1 4 10 80952K 77828K select 4:42 0.00% bmake
1051 root 1 4 0 14816K 5508K select 2:21 0.00% winbindd
1126 root 1 4 0 14240K 7908K select 1:45 0.00% ruby
1008 root 1 4 0 4724K 1948K select 1:06 0.00% ntpd
1064 root 1 4 0 15868K 6312K select 0:58 0.00% winbindd
872 root 1 44 0 3392K 1408K select 0:49 0.00% rpcbind
1045 root 1 4 0 17072K 5780K select 0:48 0.00% smbd
Somebody is running large compile jobs, and was polite enough to nice them -- i.e. run them low-priority, so they won't steal time from anything more important.
If you aren't having performance problems, and they are authorized to use cc, I don't think this is a problem.
A run queue of 4 needs 4 CPU cores to keep up, otherwise everything slows down; on a single-core i386 machine, for example, each of those 4 jobs gets only about a quarter of the CPU.
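As a quick back-of-the-envelope check (a sketch; per_core_load is just an illustrative helper, not a standard tool):

```shell
# Divide a load average by the number of cores; a sustained result well
# above 1.0 means runnable jobs are queueing for CPU time.
per_core_load() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

per_core_load 4.11 1   # the load from the top output above, on one core: 4.11
per_core_load 4.11 4   # the same load on four cores: 1.03, roughly saturated
```

On a live box you could feed it real numbers, e.g. `per_core_load "$(sysctl -n vm.loadavg | awk '{print $2}')" "$(sysctl -n hw.ncpu)"` -- both sysctls are standard on FreeBSD.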
vmstat 2
shows the run queue (the major contributor to the load shown by top
and uptime
). It may also show paging activity.
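A minimal invocation, using the classic BSD syntax with interval and count as trailing arguments:

```shell
# Print system statistics every 2 seconds, 5 samples. On FreeBSD the
# leading columns are r (run queue), b (blocked on I/O) and w (runnable
# but swapped out); the pi/po columns show page-ins/page-outs, i.e.
# paging activity.
vmstat 2 5
```

The first line is an average since boot; watch the later samples for a sustained r of 4 or more.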
The compile jobs are periodic and run by us. The same job used to complete in ~3 hours a week ago but has been taking ~20 hours recently. No configuration change has happened on the system, and we don't see anything else hogging it, hence we're wondering what's slowing it down.
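For a sudden 3h-to-20h slowdown with no configuration change, a few things may be worth ruling out from the FreeBSD side. This is only a checklist sketch: dev.cpu.0.freq exists only when the cpufreq(4) driver is loaded, and gstat/swapinfo are the stock FreeBSD tools.

```shell
# Is the CPU being throttled (thermal limits, powerd)? Compare the
# current frequency against the available levels.
sysctl dev.cpu.0.freq dev.cpu.0.freq_levels

# Any recent kernel complaints -- disk timeouts, failing hardware?
dmesg | tail -20

# Is the disk suddenly the bottleneck? Batch mode (-b) samples once
# over 5 seconds and prints per-device busy%.
gstat -b -I 5s

# Is anything swapping despite the small "Used" figure in top?
swapinfo -h
```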