Hi All,
I have the following script, which I use in Nagios to check the health of our applications. The problem is that the curl part ($TOTAL) stops returning anything after the check has been running for 2-3 hours. The script still runs fine from the command line, just not from Nagios.
There are 17 invocations of this script in a second. I wonder if it's because the system can't handle that many curls at the same time, though the load and usage on the system seem to be fine.
Can someone help me work out whether the script's performance can be improved, or whether something else is going on?
Thanks,
Jack
#!/bin/bash
# Nagios check: read the URL from the file named in $1 and report OK
# if the response body contains "alive".
read -r URL < "$1"
# curl appends total_time=<seconds>s after the body; perl reduces the
# whole response to "<STATUS> total_time=...;0;0;0".
TOTAL=$(curl -w '\ntotal_time=%{time_total}s' -s -m 3 --connect-timeout 3 "$URL" |
perl -n0e '$s=/"alive"/?"OK":"ERROR";($t)=/(total_time.+)/;print "$s $t;0;0;0\n"')
echo $(date): $TOTAL : $URL >> /tmp/curl.log
STATUS=$(echo "$TOTAL" | awk '{print $1}')
PERF=$(echo "$TOTAL" | awk '{print $2}')
# Plugin output: status and perfdata on the first line, then the command.
echo "$STATUS|$PERF"
echo "curl $URL"
# Map the status word to a Nagios exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).
case $STATUS in
OK)
exit 0
;;
WARN)
exit 1
;;
*)
# ERROR, FATAL, or anything else, including empty output
exit 2
;;
esac
Here is the curl.log output, first while the check was still returning data and then after it stopped returning anything:
Wed Oct 6 15:11:41 PDT 2010: OK total_time=0.015s;0;0;0 : http://URL
Wed Oct 6 15:11:41 PDT 2010: OK total_time=0.021s;0;0;0 : http://URL
Wed Oct 6 15:11:41 PDT 2010: OK total_time=0.016s;0;0;0 : http://URL
Wed Oct 6 15:11:41 PDT 2010: OK total_time=0.017s;0;0;0 : http://URL
Wed Oct 6 15:11:41 PDT 2010: OK total_time=0.024s;0;0;0 : http://URL
Wed Oct 6 15:11:41 PDT 2010: OK total_time=0.017s;0;0;0 : http://URL
Wed Oct 6 15:11:42 PDT 2010: : http://URL
Wed Oct 6 15:11:42 PDT 2010: : http://URL
Wed Oct 6 15:11:42 PDT 2010: : http://URL
Wed Oct 6 15:11:42 PDT 2010: : http://URL
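The empty entries above do not say why curl produced nothing, because the script discards both curl's exit code and its error text (-s without -S). A minimal sketch of how the check could record them is below. The function names classify and check_url and the CURL_LOG variable are hypothetical, the total_time=U fallback is only a placeholder I picked, and I have not run this under Nagios.

```bash
#!/bin/bash
# classify RC OUTPUT
# Prints "<STATUS> total_time=...;0;0;0" in the same shape as the perl
# one-liner, but never prints an empty line.
classify() {
    local rc=$1 out=$2 status=ERROR perf
    # Pick out the trailer that curl's -w option appends after the body.
    perf=$(printf '%s\n' "$out" | grep -o 'total_time=[0-9.]*s' | tail -n 1)
    # OK only if curl itself succeeded and the body contains "alive".
    if [ "$rc" -eq 0 ] && printf '%s\n' "$out" | grep -q '"alive"'; then
        status=OK
    fi
    # total_time=U is a placeholder (an assumption) for a missing timing.
    printf '%s %s;0;0;0\n' "$status" "${perf:-total_time=U}"
}

# check_url URL
# Runs curl, logs its exit code and any error text, prints the result line.
check_url() {
    local url=$1 log=${CURL_LOG:-/tmp/curl.log} out rc result
    # -S keeps curl's error message even with -s; 2>&1 captures it.
    out=$(curl -sS -m 3 --connect-timeout 3 \
        -w '\ntotal_time=%{time_total}s' "$url" 2>&1)
    rc=$?
    result=$(classify "$rc" "$out")
    echo "$(date): rc=$rc : $result : $url" >> "$log"
    # On failure, also keep curl's own "curl: (NN) ..." message.
    if [ "$rc" -ne 0 ]; then
        printf '%s\n' "$out" | grep '^curl:' >> "$log"
    fi
    printf '%s\n' "$result"
}
```

Calling TOTAL=$(check_url "$URL") in place of the curl | perl pipeline would leave the rest of the script unchanged. A non-zero rc in the log would then show whether curl is timing out (28), failing to connect (7), or not starting at all.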