Restart a Service!!

Hello, I am trying to write a script that monitors a few processes (winbind) for CPU utilization. If a process consumes more than, say, 99% CPU for 3 minutes, I want to run a script to restart the service that forks the process.

The logic I am trying to use is to grep for winbind in the ps output:

ps -eo pcpu,pid,user,args,cputime | grep winbind|grep -v grep
0.0 18339 root winbindd 00:00:00
0.0 18343 root winbindd 00:00:00
0.0 18344 root winbindd 00:00:00
0.0 18397 root winbindd 00:00:00

Now add up the leftmost column and see if the total is greater than 99%. If it's consistently above 99% for 3 or 5 minutes, then do `/etc/init.d/winbind restart`.
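The "add up the leftmost column" step could be sketched as below. This is only an illustration: the function names `sum_pcpu` and `total_cpu_for` are made up for this example, and `total_cpu_for` assumes a procps-style ps (as on most Linux systems) that supports `-C` and the header-less `pcpu=` form. Keep in mind that procps documents pcpu as CPU time divided by the time the process has been running, so it is a lifetime average rather than an instantaneous reading.

```shell
#!/bin/sh
# Sum the first (pcpu) column of ps-style lines read from stdin
# and print the total with one decimal place.
sum_pcpu() {
    awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Total %CPU of every process whose command name is exactly "$1".
# Assumption: procps-style ps with -C (select by command name)
# and "-o pcpu=" (print the column without a header).
total_cpu_for() {
    ps -C "$1" -o pcpu= | sum_pcpu
}
```

Something like `total_cpu_for winbindd` would then print a single number, with no temporary file and no need for `grep -v grep`.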

Sounds like you have all the pieces of pseudo code to write your script. What is the problem you're having?

I redirected the leftmost column to a file:

ps -eo pcpu,pid,user,args,cputime | grep winbind|grep -v grep|awk '{print $1}' > /tmp/file1

Then I calculated the average:

echo `echo $(sed -e 's/$/+/' /tmp/file1) 0|bc`/4|bc

The hurdle is how to determine whether all the processes were consuming >99% of CPU over a 5-minute period.

Probably the easiest way would be to take a snapshot of the processes periodically over the 5-minute period and use the accumulated numbers for your calculation. You might also experiment with sar -X|x.

I looked at sar initially, but it won't provide process-level granularity.

Could you please explain a bit more about how to achieve this through snapshots?

Depending on what version of sar you have, the -x|X switches are pid specific.

For a process monitor, I'd probably want it to run all the time and periodically grab the ps output to determine whether the process is under|over your thresholds: one large loop with a sleep at the end. Every iteration through the loop would take another snapshot of the state of your processes and make comparisons. You could also store the results of the previous few snapshots to calculate averages.
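A minimal sketch of that loop, under some assumptions: a procps-style ps that supports `-C` and `pcpu=`, and settings (`PROC`, `LIMIT`, `INTERVAL`, `NEEDED`) and function names that are invented here for illustration. Rather than averaging, it counts consecutive snapshots above the threshold and resets the count as soon as one snapshot falls below it, which is one way to express "consistently above 99% for 3 minutes".

```shell
#!/bin/sh
# Hypothetical settings; adjust to taste.
PROC=winbindd                             # command name as shown by ps
LIMIT=99                                  # summed %CPU threshold
INTERVAL=30                               # seconds between snapshots
NEEDED=6                                  # 6 snapshots x 30 s = 3 minutes
RESTART_CMD="/etc/init.d/winbind restart"

# Summed %CPU of all processes named "$1" (procps-style ps assumed).
total_cpu_for() {
    ps -C "$1" -o pcpu= | awk '{ t += $1 } END { printf "%.1f\n", t }'
}

# Print the new streak length: previous streak + 1 if total > limit,
# otherwise 0. awk does the comparison because sh arithmetic is
# integer-only and pcpu values have a decimal part.
next_streak() {
    awk -v s="$1" -v t="$2" -v l="$3" \
        'BEGIN { if (t > l) print s + 1; else print 0 }'
}

# The monitor itself: snapshot, compare, maybe restart, sleep, repeat.
monitor() {
    streak=0
    while :; do
        total=$(total_cpu_for "$PROC")
        streak=$(next_streak "$streak" "$total" "$LIMIT")
        if [ "$streak" -ge "$NEEDED" ]; then
            $RESTART_CMD
            streak=0
        fi
        sleep "$INTERVAL"
    done
}
```

Calling `monitor` at the end of the script (as root, since it restarts a service) starts the loop. On a multi-core machine the summed figure can exceed 100, so the threshold may need tuning.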

Since you're going through all the work, you could make it more generic than just your specific process (winbind) and pass variables or have a configuration file to monitor any process you like for whatever attribute you like. Have a look at chapter 31 in Expert Shell Scripting for more.
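As a rough idea of what the configuration-file approach could look like (not taken from the book), the sketch below reads one target per line. The file format, the path in the usage note, and the names `each_target` and `show_target` are all invented for this example.

```shell
#!/bin/sh
# Read a hypothetical config file with one target per line:
#   <process-name> <cpu-limit> <restart command...>
# Blank lines and lines starting with '#' are skipped.
# For each target, call the handler function named in "$2"
# with three arguments: name, limit, restart command.
each_target() {
    conf=$1
    handler=$2
    while read -r name limit cmd; do
        case $name in
            ''|'#'*) continue ;;
        esac
        "$handler" "$name" "$limit" "$cmd"
    done < "$conf"
}

# Example handler: only report what would be monitored.
show_target() {
    printf '%s limit=%s restart="%s"\n' "$1" "$2" "$3"
}
```

With a file containing the line `winbindd 99 /etc/init.d/winbind restart`, running `each_target /path/to/monitor.conf show_target` lists the target; a real handler would take the snapshot and do the threshold check instead of printing.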