I need help finding a script to monitor the load on 8+ servers from a single console. The goal is to centralize this and run the script from one server. Can anyone help?
I'm currently running this loop on each server to monitor the load:
while true; do w | grep average | grep -v grep ; sleep 4; done
Radar
January 16, 2007, 9:35am
Replying to locabuilt:
Nagios will do what you're asking. BTW, do you really need to grep out the grep?
Otherwise, I have something in mind, but it's too complicated and time-consuming to set up; it involves Tcl/Expect and STAF.
system
January 17, 2007, 4:51pm
There's no need to use "w | grep average | grep -v grep"; uptime gives you what you want:
# uptime
8:48am up 144 day(s), 8 min(s), 5 users, load average: 0.23, 0.19, 0.18
#
You might want to look at Big Brother as well; it's not as complicated to set up as Nagios.
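Combining uptime with ssh gives a centralized version of the original loop. A minimal sketch, assuming key-based (passwordless) ssh from the monitoring server to each host; the script name loadwatch.sh and the host names are hypothetical. Every 4 seconds it prints one uptime line per host:

```shell
#!/bin/sh
# Sketch: poll the load on several servers from one console.
# ASSUMPTION: key-based ssh to every host; hosts are given as arguments,
# e.g. sh loadwatch.sh web1 web2 db1 (hypothetical names).

# Print "host: <uptime line>" for one host.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout stops a dead host from stalling the whole round.
poll_host() {
    out=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" uptime 2>/dev/null)
    if [ -n "$out" ]; then
        echo "$1: $out"
    else
        echo "$1: unreachable"
    fi
}

# Only start the loop when host names were supplied; stop with Ctrl-C.
if [ "$#" -gt 0 ]; then
    while true; do
        date
        for host in "$@"; do
            poll_host "$host"
        done
        echo
        sleep 4
    done
fi
```

The hosts are polled one after another, so a round takes at least as long as the slowest ssh connection; for 8 servers that is usually acceptable.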
system
January 17, 2007, 5:01pm
Or, for a bit more info, try prstat (on Solaris):
# prstat 1 1 | grep average
Total: 96 processes, 194 lwps, load averages: 0.15, 0.16, 0.17
#
Or, for even more info than just the averages line:
# prstat -ac 1 1
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  4424 named      60M   59M sleep   59    0  16:36.23 4.6% named/7
  8331 root     6096K 3264K sleep   59    0   0:11.21 0.5% syslogd/21
   388 root       73M   72M sleep   59    0  10:14.11 0.1% rpc.nisd/4
   378 root     2984K 2056K sleep   59    0   1:07.15 0.1% rpcbind/1
   813 root     8216K 7632K sleep   59    0   0:00.00 0.0% mibiisa/12
  3179 root     1256K  912K cpu3    49    0   0:00.00 0.0% prstat/1
   642 root     2240K 1280K sleep  100    -   0:31.21 0.0% xntpd/1
   381 root     2976K 1376K sleep   59    0   0:00.00 0.0% keyserv/7
   428 root     4928K 3384K sleep   59    0   0:00.46 0.0% automountd/5
   556 root     1760K  896K sleep   59    0   0:19.45 0.0% prngd/1
   656 root     2536K 1600K sleep   59    0   0:14.06 0.0% nmbd/1
   419 root     1928K 1304K sleep   59    0   0:00.00 0.0% lockd/1
   427 daemon   3392K 2256K sleep   59    0   0:00.00 0.0% statd/5
  1612 sn00     1528K 1104K sleep   59    0   0:00.00 0.0% csh/1
   390 root     2480K 1816K sleep   59    0   0:00.00 0.0% rpc.nispasswdd/1
 NPROC USERNAME  SIZE   RSS MEMORY      TIME  CPU
     1 named      60M   59M   3.0%  16:36.23 4.6%
    64 root      277M  193M   9.8%  13:09.00 0.8%
     7 www        20M   13M   0.7%   0:00.00 0.0%
     1 daemon   3392K 2256K   0.1%   0:00.00 0.0%
    14 sn00       38M   20M   1.0%   0:00.34 0.0%
Total: 95 processes, 193 lwps, load averages: 0.09, 0.12, 0.15
#
uptime is more useful here, combined with a script that collects the load from each server:
#!/bin/sh
loadcnt=$(uptime | awk -F "." '{ print $1 }' | awk -F ":" '{ print $5 }')  # integer part of the 1-minute load
echo "The current load: $loadcnt"
if [ $loadcnt -gt 0 ]; then
    echo "Alert: system load exceeded"
fi
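The script above alerts whenever the integer part is above 0, i.e. once the load reaches 1. For a different or fractional threshold, note that [ ... -gt ... ] only compares integers. A sketch that lets awk do the decimal comparison instead; the threshold argument, the function name over_threshold and the messages are assumptions, not part of the original script:

```shell
#!/bin/sh
# Sketch: alert when the 1-minute load average exceeds a decimal threshold.
# Usage (hypothetical): sh alertscript.sh 2.5

# Succeeds (exit status 0) when $1 > $2, compared as decimal numbers by awk,
# because the shell's [ -gt ] test only handles integers.
over_threshold() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

if [ "$#" -gt 0 ]; then
    # Text after "load average:" up to the first comma = the 1-minute value
    load=$(uptime | sed 's/.*load average[s]*: *//' | cut -d, -f1)
    if over_threshold "$load" "$1"; then
        echo "Alert: load $load exceeds threshold $1"
    else
        echo "Load $load is within threshold $1"
    fi
fi
```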
You can run this script on each of the other servers with ssh:
ssh 192.168.23.22 "sh /bin/alertscript.sh" > monitorfile
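To gather every server's result into one monitorfile, the ssh line can be looped over a list of hosts. Since > monitorfile overwrites the file each time, this sketch appends with >> and prefixes each line with the host name. It assumes key-based ssh and that /bin/alertscript.sh exists on every host; the script name collect.sh and the function collect_loads are hypothetical:

```shell
#!/bin/sh
# Sketch: run the alert script on every host and collect the output in one file.
# Usage (hypothetical): sh collect.sh monitorfile 192.168.23.22 <other hosts...>

collect_loads() {
    outfile=$1
    shift
    for host in "$@"; do
        # -n: don't read stdin; BatchMode=yes: fail instead of prompting for a password.
        # sed prefixes each output line with the host name before appending.
        ssh -n -o BatchMode=yes "$host" "sh /bin/alertscript.sh" 2>&1 |
            sed "s/^/$host: /" >> "$outfile"
    done
}

# Needs an output file plus at least one host.
if [ "$#" -gt 1 ]; then
    collect_loads "$@"
fi
```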
I think it should be $4:
loadcnt=$(uptime | awk -F "." '{ print $1 }' | awk -F ":" '{ print $4 }')
[root@localhost ~]# uptime | awk -F "." '{ print $1 }' | awk -F ":" '{ print $4 }'
0
[root@localhost ~]# uptime
11:50:17 up 16 days, 24 min, 4 users, load average: 0.00, 0.00, 0.00
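Which field is right depends on the uptime format: the $4 result above comes from a line whose uptime part ("24 min") has no colon, while an uptime such as "3:24" adds a colon and shifts the load one field to the right (and Solaris's "8:48am" format has one colon fewer). A sketch that anchors on the "load average" text instead of counting fields; the function names load_averages and load_int are hypothetical:

```shell
#!/bin/sh
# Sketch: parse an uptime line (read on stdin) without relying on field positions.

# Print the three load averages separated by spaces, e.g. "0.23 0.19 0.18".
load_averages() {
    sed -n 's/.*load average[s]*: *//p' | tr -d ','
}

# Integer part of the 1-minute average, the value the $4/$5 pipelines extract.
load_int() {
    load_averages | cut -d. -f1
}
```

Usage: uptime | load_int, or loadcnt=$(uptime | load_int) in place of the awk pipeline.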
Thank you all for the help. I'll try the scripts and see if I can centralize this somewhat. I'll let you know how it goes.