load monitor script

locabuilt · January 15, 2007, 11:45am

I need help finding a script to monitor the load on 8+ servers from a single console. The goal is to centralize the monitoring and run the script from one server. Can anyone help with this?

I'm running this script on each server to monitor the load:
while true; do w | grep average | grep -v grep ; sleep 4; done

Radar · January 16, 2007, 9:35am

Nagios will do what you're asking. BTW, do you really need to grep out the grep?

sysgate · January 17, 2007, 7:43am

Otherwise I have something in mind, but it's too complicated and time-consuming to build; it involves Tcl/Expect and STAF.

system · January 17, 2007, 4:51pm

No need to use "w | grep average | grep -v grep"; uptime gives you what you want:

# uptime
  8:48am  up 144 day(s), 8 min(s),  5 users,  load average: 0.23, 0.19, 0.18
#

You might want to look at Big Brother as well; it's not as complicated as Nagios to set up.

system · January 17, 2007, 5:01pm

Or, for a bit more info, try prstat:

# prstat 1 1 | grep average
Total: 96 processes, 194 lwps, load averages: 0.15, 0.16, 0.17
#

Or, for even more info than just the load average line:

# prstat -ac 1 1
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  4424 named      60M   59M sleep   59    0  16:36.23 4.6% named/7
  8331 root     6096K 3264K sleep   59    0   0:11.21 0.5% syslogd/21
   388 root       73M   72M sleep   59    0  10:14.11 0.1% rpc.nisd/4
   378 root     2984K 2056K sleep   59    0   1:07.15 0.1% rpcbind/1
   813 root     8216K 7632K sleep   59    0   0:00.00 0.0% mibiisa/12
  3179 root     1256K  912K cpu3    49    0   0:00.00 0.0% prstat/1
   642 root     2240K 1280K sleep  100    -   0:31.21 0.0% xntpd/1
   381 root     2976K 1376K sleep   59    0   0:00.00 0.0% keyserv/7
   428 root     4928K 3384K sleep   59    0   0:00.46 0.0% automountd/5
   556 root     1760K  896K sleep   59    0   0:19.45 0.0% prngd/1
   656 root     2536K 1600K sleep   59    0   0:14.06 0.0% nmbd/1
   419 root     1928K 1304K sleep   59    0   0:00.00 0.0% lockd/1
   427 daemon   3392K 2256K sleep   59    0   0:00.00 0.0% statd/5
  1612 sn00     1528K 1104K sleep   59    0   0:00.00 0.0% csh/1
   390 root     2480K 1816K sleep   59    0   0:00.00 0.0% rpc.nispasswdd/1
 NPROC USERNAME  SIZE   RSS MEMORY      TIME  CPU
     1 named      60M   59M   3.0%  16:36.23 4.6%
    64 root      277M  193M   9.8%  13:09.00 0.8%
     7 www        20M   13M   0.7%   0:00.00 0.0%
     1 daemon   3392K 2256K   0.1%   0:00.00 0.0%
    14 sn00       38M   20M   1.0%   0:00.34 0.0%
Total: 95 processes, 193 lwps, load averages: 0.09, 0.12, 0.15
#

maheshwin · January 18, 2007, 1:21am

uptime will be more useful, combined with a script that collects the load from all the servers:

#!/bin/sh

loadcnt=$(uptime | awk -F "." '{ print $1 }' | awk -F ":" '{ print $5 }')
echo " the current load:"$loadcnt
if [ $loadcnt -gt 0 ]; then
    echo "Alert System Process handling exceeded"
fi;

You can run this script on each server individually using the ssh command:

ssh 192.168.23.22 "sh /bin/alertscript.sh" > monitorfile
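To get all the servers on one console, the same ssh call can be wrapped in a loop run from the central server. This is only a rough sketch: it assumes OpenSSH with key-based login already set up, the second address in HOSTS is a made-up placeholder, and poll_loads and SSH_CMD are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hosts to poll. 192.168.23.22 is from the example above;
# the second address is a placeholder, replace with your own servers.
HOSTS="192.168.23.22 192.168.23.23"

# Print one line per host: the host name followed by its uptime output.
# SSH_CMD (hypothetical variable) can be overridden, e.g. for a dry run.
poll_loads() {
    for host in $HOSTS; do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # ConnectTimeout=5 keeps one dead host from stalling the whole loop.
        out=$(${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5} "$host" uptime 2>/dev/null) \
            || out="unreachable"
        printf '%-16s %s\n' "$host" "$out"
    done
}

# To refresh the console every 4 seconds, as in the original one-liner:
#   while true; do clear; poll_loads; sleep 4; done
```

Running uptime directly over ssh avoids having to install a script on every server, at the cost of one ssh connection per host on each refresh.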

Krrishv · January 19, 2007, 1:23am

I think it should be $4:

loadcnt=$(uptime | awk -F "." '{ print $1 }' | awk -F ":" '{ print $4 }')

[root@localhost ~]# uptime | awk -F "." '{ print $1 }' | awk -F ":" '{ print $4 }'
0
[root@localhost ~]# uptime
11:50:17 up 16 days, 24 min, 4 users, load average: 0.00, 0.00, 0.00
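Whether the field is $4 or $5 depends on how many colons appear before the load figures: a clock with seconds, or an uptime shown as hours:minutes, adds one. A possible way around that is to key on the "load average" label instead of counting fields. The sketch below assumes a POSIX sed and a locale that prints the load with a decimal point; load_1min is a hypothetical helper name.

```shell
#!/bin/sh
# Read a line of uptime (or prstat) output on stdin and print the
# 1-minute load average. Matches both "load average:" and
# "load averages:", no matter how many colons come earlier in the line.
load_1min() {
    sed -n 's/.*load averages\{0,1\}: *\([0-9][0-9.]*\).*/\1/p'
}

# Example use on a live system:
#   load=$(uptime | load_1min)
#   if [ "${load%%.*}" -gt 0 ]; then echo "Alert: load is $load"; fi
```

The ${load%%.*} expansion strips the fractional part so the value can be used with the integer test -gt, as in the script above.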

locabuilt · January 19, 2007, 1:37pm

Thank you all for the help. I will try the scripts and see if I can centralize this somewhat. I will let you guys know how it goes.