vmstat returns good val for cpuIdle but ps shows no active process

Hi,
I'm running a shell script that checks the CPU idle percentage, using either /usr/bin/vmstat 1 2 or sar 1 2 (on UnixWare), before I run some tests (I only run them if CPU idle is greater than 89).

These tests are run on many platforms: Linux (SUSE, Red Hat), HP-UX, UnixWare, AIX, Solaris and Tru64.

When the test for cpu idle fails I do a

ps -e -o user,stime,pcpu,pid,ppid,time,tty,args

to try to find out which process is hogging the CPU. At best the output does not clearly show which process is responsible; at worst it shows nothing but 0.0 in the %CPU column for every process.

E.g. vmstat returned 0 CPU idle but the ps output was

========================================================
USER STIME %CPU PID PPID TIME TT COMMAND
root Feb28 0.0 1 0 00:00:04 ? init [5]
root Feb28 0.0 2 1 00:00:00 ? [ksoftirqd/0]
root Feb28 0.0 3 1 00:00:00 ? [events/0]
root Feb28 0.0 4 3 00:00:00 ? [khelper]
root Feb28 0.0 5 3 00:00:00 ? [kacpid]
root Feb28 0.0 34 3 00:00:00 ? [kblockd/0]
root Feb28 0.0 56 3 00:00:00 ? [aio/0]
root Feb28 0.0 55 1 00:03:38 ? [kswapd0]
root Feb28 0.0 1383 1 00:00:00 ? [kseriod]
root Feb28 0.0 2159 3 00:00:00 ? [ata/0]
root Feb28 0.0 2161 1 00:00:00 ? [scsi_eh_0]
root Feb28 0.0 2162 1 00:00:00 ? [scsi_eh_1]
root Feb28 0.0 2185 3 00:00:25 ? [reiserfs/0]
root Feb28 0.0 3554 1 00:00:00 ? [khubd]
root Feb28 0.0 3967 1 00:00:00 ? /sbin/syslogd -a /var/lib/dhcp/dev/log -a /var/lib/named/dev/log -a /var/lib/ntp/dev/log -a /var/lib/stunnel/dev/log
root Feb28 0.0 3971 1 00:00:00 ? /sbin/klogd -c 1 -2 -x
root Feb28 0.0 4021 1 00:00:00 ? /sbin/resmgrd
bin Feb28 0.0 4022 1 00:00:00 ? /sbin/portmap
root Feb28 0.0 4448 1 00:00:00 ? [hwscand]
root Feb28 0.0 5227 1 00:00:00 ? /usr/sbin/sshd -o PidFile=/var/run/sshd.init.pid
root Feb28 0.0 5238 1 00:00:00 ? pure-ftpd (SERVER)
ldap Feb28 0.0 5245 1 00:00:00 ? /usr/lib/openldap/slapd -h ldap:/// -u ldap -g ldap
root Feb28 0.0 5466 1 00:00:00 ? /usr/sbin/powersaved -d -e /etc/powersave.conf -v 3
daemon Feb28 0.0 5635 1 00:00:00 ? /usr/sbin/slpd
lp Feb28 0.0 5805 1 00:00:00 ? /usr/sbin/cupsd
root Feb28 0.0 5834 1 00:01:45 ? /usr/sbin/nscd
root Feb28 0.0 5878 1 00:00:00 ? /opt/kde3/bin/kdm
root Feb28 0.0 5962 1 00:00:00 ? /usr/lib/postfix/master
postfix Feb28 0.0 6019 5962 00:00:00 ? qmgr -l -t fifo -u
root Feb28 0.0 6122 1 00:00:00 ? /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun Feb28 0.0 6123 6122 00:00:00 ? /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun Feb28 0.0 6124 6122 00:00:00 ? /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun Feb28 0.0 6125 6122 00:00:00 ? /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun Feb28 0.0 6126 6122 00:00:00 ? /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun Feb28 0.0 6127 6122 00:00:00 ? /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
root Feb28 0.0 6168 1 00:00:00 ? /usr/sbin/cron
root Feb28 0.0 6170 1 00:00:00 ? /usr/sbin/xinetd
root Feb28 0.0 6239 1 00:00:00 tty1 /sbin/mingetty --noclear tty1
root Feb28 0.0 6240 1 00:00:00 tty2 /sbin/mingetty tty2
root Feb28 0.0 6241 1 00:00:00 tty3 /sbin/mingetty tty3
root Feb28 0.0 6242 1 00:00:00 tty4 /sbin/mingetty tty4
root Feb28 0.0 6243 1 00:00:00 tty5 /sbin/mingetty tty5
root Feb28 0.0 6244 1 00:00:00 tty6 /sbin/mingetty tty6
root Feb28 0.0 6245 1 00:04:21 ? /opt/IBM/db2/V8.1/bin/db2fmcd
root Feb28 0.0 6250 1 00:00:00 ? ./mflm_manager
db82as Feb28 0.0 6303 1 00:00:00 ? /home/db82as/das/adm/db2dasrrm
db82as Feb28 0.0 6333 1 00:00:00 ? /home/db82as/das/bin/db2fmd -i db82as -m /home/db82as/das/lib/libdb2dasgcf.so
wwwrun Feb28 0.0 13669 6122 00:00:00 ? /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
root Mar20 0.0 19429 5878 00:00:00 ? /usr/X11R6/bin/X -nolisten tcp -br vt7 -auth /var/lib/xdm/authdir/authfiles/A:0-jcbx9Z
root Mar20 0.0 19430 5878 00:00:00 ? -:0
root Mar20 0.0 19441 19430 00:00:00 ? /opt/kde3/bin/kdm_greet
hub Apr02 0.0 4108 4105 00:00:00 ? casmgr32 /rCICS-ORA /mfile:////home/hub/testing/work.xtc/REPOS/CICS-ORA
hub Apr02 0.0 4109 4105 00:00:00 ? casevmgr32 /rCICS-ORA
hub Apr02 0.0 4110 4105 00:00:00 ? casjcp32 /rCICS-ORA
hub Apr02 0.0 4111 4105 00:00:00 ? cassi32 /rCICS-ORA -rCICS-ORA
hub Apr02 0.0 12635 4105 00:00:00 ? cassi32 /rCICS-ORA
root Apr03 0.0 17244 3 00:00:02 ? [pdflush]
root Apr03 0.0 17253 3 00:00:00 ? [pdflush]
root Apr03 0.0 18985 6170 00:00:00 ? in.rlogind
root Apr03 0.0 18992 18985 00:00:00 ? login -- hub
hub Apr03 0.0 18993 18992 00:00:00 pts/1 -ksh
postfix 04:57 0.0 11433 5962 00:00:00 ? pickup -l -t fifo -u
root 05:29 0.0 12696 6170 00:00:00 ? in.rshd -aL

========================================================

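I know that ps -e -o pcpu returns a percentage value: the CPU time used divided by the time the process has been running. That means a long-lived process that has only just become busy still shows close to 0.0, and the values will never add up to 100%, but I would still expect something useful.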
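One way around the lifetime-average problem might be to compare the cumulative TIME column between two ps snapshots taken a few seconds apart, so that only CPU used during the interval counts. The following is a rough sketch, not a tested cross-platform solution: the function name cpu_delta and the snapshot file names are made up for illustration, and it assumes a POSIX awk (user-defined functions) and a ps that accepts -o pid,time,args.

```shell
#!/bin/sh
# Sketch: list the processes whose cumulative CPU time grew between two
# ps snapshots. cpu_delta is a hypothetical helper name.
#
# Usage (snapshot file names are arbitrary):
#   ps -e -o pid,time,args > /tmp/ps.before
#   sleep 5
#   ps -e -o pid,time,args > /tmp/ps.after
#   cpu_delta /tmp/ps.before /tmp/ps.after | head -5

cpu_delta() {
    # $1 = earlier snapshot, $2 = later snapshot; lines are "PID TIME ARGS..."
    awk '
    # Convert [dd-]hh:mm:ss (or mm:ss) to seconds.
    function secs(t,    d, p, n, i, s, days) {
        days = 0
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + p[i]
        return days * 86400 + s
    }
    FNR == 1 { file++ }                  # count which input file we are in
    $1 !~ /^[0-9]+$/ { next }            # skip the header line
    file == 1 { before[$1] = secs($2); next }
    ($1 in before) {
        delta = secs($2) - before[$1]
        if (delta > 0) {
            $1 = $1                      # rebuild the line with single spaces
            print delta, $0              # seconds used, then pid time args
        }
    }' "$1" "$2" | sort -rn
}
```

The output is sorted with the largest CPU consumer during the interval first. Since TIME has one-second resolution, a short interval will miss processes that used only a fraction of a second.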

top won't work on all platforms.

Does anyone have any ideas? Thanks in advance.

ps -ef | cut -c42-80 | sort -nr | head -20
will give you the 20 processes that have used the most CPU time, in descending order. The output should look like this:

Thanks for your command, Sysgate. It seems I get a lot of old processes that are hanging around, such as

00:04:25 root     Feb28  0.0  6245     1 ?        /opt/IBM/db2/V8.1/bin/db2fmcd
00:03:55 root     Feb28  0.0    55     1 ?        [kswapd0]
00:01:46 root     Feb28  0.0  5834     1 ?        /usr/sbin/nscd
00:00:27 root     Feb28  0.0  2185     3 ?        [reiserfs/0]
00:00:12 root     Mar29  0.0 28598     1 ?        /home/hub/staff/ss/cb/bin/mfs2
00:00:12 hub      13:44  0.1 32049 30759 pts/0    top
00:00:04 root     Feb28  0.0     1     0 ?        init [5]
00:00:00 wwwrun   Feb28  0.0 13669  6122 ?        /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
00:00:00 wwwrun   Feb28  0.0  6127  6122 ?        /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
00:00:00 wwwrun   Feb28  0.0  6126  6122 ?        /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
00:00:00 wwwrun   Feb28  0.0  6125  6122 ?        /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
00:00:00 wwwrun   Feb28  0.0  6124  6122 ?        /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
00:00:00 wwwrun   Feb28  0.0  6123  6122 ?        /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
00:00:00 root     Mar29  0.0 28451     1 ?        /home/hub/staff/ss/cb/bin/ejl32
00:00:00 root     Mar20  0.0 19441 19430 ?        /opt/kde3/bin/kdm_greet
00:00:00 root     Mar20  0.0 19430  5878 ?        -:0
00:00:00 root     Mar20  0.0 19429  5878 ?        /usr/X11R6/bin/X -nolisten tcp -br vt7 -auth /var/lib/xdm/authdir/authfiles/A:0-jcbx9Z
00:00:00 root     Feb28  0.0  6250     1 ?        ./mflm_manager
00:00:00 root     Feb28  0.0  6244     1 tty6     /sbin/mingetty tty6
00:00:00 root     Feb28  0.0  6243     1 tty5     /sbin/mingetty tty5   

These don't look like processes that would have kicked in and changed the CPU idle figure in the vmstat report. Am I missing something?

First sort based on the CPU% in the reverse order, something like this:
ps -e -o user,stime,pcpu,pid,ppid,time,tty,args | sort -k3 -rn

Now a couple of things (just throwing in my 2 cents):

  1. The first line of vmstat output is a summary of activity since the system was last booted. Although later lines change as the kernel state changes, old processes would still show up in that summary if they account for a lot of CPU.
  2. Do you want to use mpstat instead of vmstat? Hmm, not sure if that would really change the output much.

Thanks for your reply, Deal_NoDeal.

The problem I'm getting is that I get 0.0 returned for all my %CPUs. AARGH!
I decided to use vmstat rather than mpstat as I wasn't that interested in per-processor figures, just the overall one.

When I use vmstat I don't take the first line of output, for the reason you state. I take the latest value, output after the one-second interval:

# Find the number of the column headed "id" (CPU idle); its position differs between platforms.
idColNo=`/usr/bin/vmstat 1 2 | grep id | awk -v idleVal=id -f cpuIdle.awk`
# Take the last line (the interval sample, not the since-boot summary) and print that column.
cpuIdleTime=`/usr/bin/vmstat 1 2 | tail -1 | awk -v idColNo="$idColNo" '{ print $idColNo }'`
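The two steps above could probably be done from a single vmstat run, so that the column number and the value come from the same output. This is only a sketch: idle_from_vmstat is a made-up name, it is not the contents of cpuIdle.awk, and it assumes the header word "id" lines up field-for-field with the data lines, which may not hold on every platform.

```shell
#!/bin/sh
# Sketch: read "vmstat 1 2" output on stdin and print the idle value from
# the last sample. idle_from_vmstat is a hypothetical helper name.
#
# Usage:
#   cpuIdleTime=`/usr/bin/vmstat 1 2 | idle_from_vmstat`

idle_from_vmstat() {
    awk '
    # Remember which field is headed "id"; data lines are numeric and never match.
    { for (i = 1; i <= NF; i++) if ($i == "id") col = i }
    # Keep the most recent line; the final one is the interval sample.
    { last = $0 }
    END {
        if (col) {
            split(last, vals, " ")
            print vals[col]
        }
    }'
}
```

If no "id" header is found the function prints nothing, so the caller can treat an empty result as "could not determine idle time".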

Do you have ideas?
Thanks for your time

I will try to find out more on this one. Nothing useful hits me now.