Emergency shutdown best practices

Has anyone implemented this, or does anyone have suggestions for, shutting down many remote Unix/Linux servers from a single script run on one server?

I need this to execute in parallel, as time is not on my side. Our UPS is sadly undersized and will run out in approximately 15 minutes. (There is no money in the budget to upgrade it.)

If you can ssh to each server as root:

for host in `cat hostlist`; do ssh $host shutdown <arguments>;done

(This assumes all of the hosts are listed, one per line, in a file named hostlist.)
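That loop contacts the hosts one at a time. Since you need parallelism, here is a minimal sketch (assuming bash on the initiating server, root ssh keys as in the answer above, and the same hostlist file) that backgrounds each ssh and then waits for all of them. The function name shutdown_all is hypothetical. The -n flag stops ssh from swallowing the rest of the host list on stdin, and -o BatchMode=yes makes ssh fail instead of sitting at a password prompt.

```shell
#!/bin/bash
# Sketch: shut down every host in a list in parallel.
# Assumes key-based ssh access that is allowed to run shutdown.

# shutdown_all HOSTFILE [shutdown arguments...]  (hypothetical helper)
shutdown_all() {
  local hostfile=$1
  shift
  local host
  # "|| [ -n ... ]" also handles a last line with no trailing newline.
  while read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines in the host list.
    case $host in ''|'#'*) continue ;; esac
    # -n: don't read stdin (it's the host list); BatchMode: never prompt.
    ssh -n -o BatchMode=yes "$host" shutdown "$@" &
  done < "$hostfile"
  # Block until every background ssh has returned.
  wait
}

# Only act when arguments are given, e.g.: ./shutdown_all.sh hostlist <arguments>
if [ $# -gt 0 ]; then
  shutdown_all "$@"
fi
```

The shutdown arguments are passed through unchanged, so the OS-specific options problem discussed below still applies.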

If you have more than 15 minutes to plan for an "emergency shutdown" of your servers, I'd recommend:

  1. Make sure every application that is running has corresponding startup and shutdown scripts in rc.*.
  2. Identify the order in which your hosts should be shut down: NIS or LDAP should go down last, NFS servers second to last, and NTP first.
  3. Write scripts.
    Send a wall message to all connected users informing them of the impending outage.
    Make sure you send the right shutdown options to the right OS types.
    Log each command that you send, for audit and CYA purposes later.
  4. Inform your business/clients/users that these are the "emergency shutdown" procedures. Get them to sign off on and buy into them. If they have special requirements, amend your policy to include them.
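Steps 2 and 3 can be combined in one driver. This is a minimal sketch, assuming bash on the initiating server and one host-list file per tier (names such as tier1_ntp.txt are hypothetical). It shuts each tier down in parallel, waits for the whole tier before starting the next, and appends a timestamped line to an audit log for every command it sends. REMOTE_CMD is a placeholder: by default it only sends a wall message, and you would replace it with whatever OS-appropriate command or script you settle on.

```shell
#!/bin/bash
# Sketch: shut hosts down tier by tier, in parallel within each tier,
# logging every command for later audit. File names are hypothetical.

AUDIT_LOG=${AUDIT_LOG:-emergency_audit.log}
# Placeholder: replace with the OS-appropriate shutdown command or script.
REMOTE_CMD=${REMOTE_CMD:-"echo 'Emergency shutdown starting' | wall"}

# Append a timestamped line to the audit log.
log() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$AUDIT_LOG"
}

# run_tier FILE: send REMOTE_CMD to every host in FILE at once, then wait.
run_tier() {
  local host
  log "tier start: $1"
  while read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    log "$host: $REMOTE_CMD"
    ssh -n -o BatchMode=yes "$host" "$REMOTE_CMD" &
  done < "$1"
  wait
  log "tier done: $1"
}

# Tier files in order, e.g. NTP first, ..., NFS servers, NIS/LDAP last.
shutdown_tiers() {
  local tier
  for tier in "$@"; do
    run_tier "$tier"
  done
}

if [ $# -gt 0 ]; then
  shutdown_tiers "$@"
fi
```

Usage would be something like ./shutdown_tiers.sh tier1_ntp.txt tier2_apps.txt tier3_nfs.txt tier4_ldap.txt. Waiting per tier trades some speed for ordering, which is the point of step 2.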

Make sure you allow enough time to shut down storage devices that may hold a great deal of data in cache, and to shut down tape storage systems, whose robotics may need more time to get back to "home" than you might expect.
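One way to give a slow host that time before moving on is to poll it until it stops answering, with an upper bound. This is a minimal sketch, assuming bash and assuming that a failed ssh connection means the host is down (a simplification, since a network problem looks the same). The function name and the default 5-second poll interval are hypothetical choices.

```shell
#!/bin/bash
# Sketch: wait until HOST stops accepting ssh connections, or give up
# after TIMEOUT seconds. Returns 0 if the host went down, 1 on timeout.
# wait_until_down HOST TIMEOUT [INTERVAL]
wait_until_down() {
  local host=$1 timeout=$2 interval=${3:-5}
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Treat any ssh failure as "host is down" (a simplification).
    if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

For example, wait_until_down storage1 900 || echo "storage1 still up after 15 minutes" (host name hypothetical) before shutting down the hosts that depend on it.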

There is no shortage of things that you could do, but this should get you started.

Herein lies the problem. Our network security officer will not allow ssh as root. Also, we have many "flavors" of Unix/Linux, each with different shutdown options. I tried something like this, but it does not work on all of the servers. (Some of them do not like the <<\EOT ... EOT construct.)
$1 is the file containing the list of remote servers; mbaker has sudo root privileges.

# Read one host name per line from the file named in $1.
while read -r X
do
echo "Starting shutdown of $X"
# The quoted here-document (\EOT) is sent unexpanded to the remote login shell.
# On Linux, shutdown -k only warns users; it does not actually shut down (test mode).
ssh -T "$X" << \EOT >> Emergency_shutdown.log 2>> error.log
name=`uname -n`
echo "name = $name"
type=`uname -s`
echo "type = $type"
if [ "$type" = "SunOS" ]
then
echo "Emergency shutdown initiated for $name."
# sudo -u root shutdown -y -i5 -g0 "Emergency shutdown started!!!!!" &
fi
if [ "$type" = "Linux" ]
then
echo "Emergency shutdown initiated for $name."
sudo -u root /sbin/shutdown -k now "This is just a test. Not really re-booting." < /dev/null > /dev/null 2>&1 &
fi
EOT
# $? is ssh's exit status (255 if the connection itself failed).
if [ $? -ne 0 ]
then
echo "Host $X connect failed."
fi
done < "$1"
exit 0
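One likely reason the here-document fails on some servers (an assumption, since the error isn't shown) is that the remote account's login shell is csh or tcsh, which cannot parse Bourne syntax such as name=`uname -n`. Naming /bin/sh as the remote command makes every host read the script with a Bourne-compatible shell regardless of the login shell. The sketch below does that, runs the hosts in parallel, and writes one log pair per host (the per-host log names are hypothetical). BatchMode is added so a host that wants a password fails instead of hanging. As in the original, the Solaris line is commented out and the Linux branch only uses shutdown -k.

```shell
#!/bin/bash
# Sketch: run the same OS-detection script on every host in "$1",
# in parallel, forcing /bin/sh so csh/tcsh login shells don't matter.

# remote_script prints the commands the remote /bin/sh will execute.
remote_script() {
  cat << 'EOT'
name=`uname -n`
type=`uname -s`
echo "name = $name, type = $type"
case "$type" in
  SunOS)
    echo "Emergency shutdown initiated for $name."
    # sudo -u root /usr/sbin/shutdown -y -i5 -g0 "Emergency shutdown started"
    ;;
  Linux)
    echo "Emergency shutdown initiated for $name."
    # -k: warn only, do not actually shut down (test mode).
    sudo -u root /sbin/shutdown -k now "This is just a test." < /dev/null > /dev/null 2>&1 &
    ;;
  *)
    echo "No shutdown command defined for $type on $name." >&2
    exit 1
    ;;
esac
EOT
}

shutdown_host() {
  # The script arrives on stdin; the remote /bin/sh reads and runs it.
  if ! remote_script | ssh -T -o BatchMode=yes "$1" /bin/sh \
      >> "Emergency_shutdown.$1.log" 2>> "error.$1.log"; then
    echo "Host $1: connect or shutdown failed."
  fi
}

if [ $# -gt 0 ]; then
  while read -r X || [ -n "$X" ]; do
    [ -n "$X" ] && shutdown_host "$X" &
  done < "$1"
  wait
fi
```

An unknown OS type makes the remote script exit with status 1, so it is reported as a failure instead of being silently skipped.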

The best way is to create a script on each server, called something like "emergency-shutdown.sh", and call that one, with all the OS-specific commands in each server's copy. It may be harder to maintain, but it is cleaner and more flexible.
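A minimal sketch of such a per-server script, written in Bourne-compatible sh (backticks, no local) so it should also run under older Solaris /bin/sh. The application stop directory /etc/emergency.d, the DRY_RUN variable, and the --go argument are all hypothetical. Only the SunOS and Linux shutdown commands come from this thread; other OS types deliberately fail loudly until someone fills them in.

```shell
#!/bin/sh
# emergency-shutdown.sh (sketch): stop local applications, then the OS.
# Set DRY_RUN=1 to print commands instead of running them.

# Print the OS shutdown command for a given `uname -s` value.
shutdown_command() {
  case "$1" in
    SunOS) echo "/usr/sbin/shutdown -y -i5 -g0" ;;
    Linux) echo "/sbin/shutdown -h now" ;;
    *)     return 1 ;;
  esac
}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

emergency_shutdown() {
  os=`uname -s`
  cmd=`shutdown_command "$os"` || {
    echo "No shutdown command known for $os" >&2
    return 1
  }
  # Hypothetical per-application stop scripts, run in name order.
  for s in /etc/emergency.d/*; do
    [ -x "$s" ] && run "$s" stop
  done
  # $cmd is left unquoted on purpose so it splits into command + options.
  run $cmd
}

# Require an explicit argument so a stray call or a test does nothing.
if [ "$1" = "--go" ]; then
  emergency_shutdown
fi
```

Running DRY_RUN=1 ./emergency-shutdown.sh --go on each host first is a cheap way to check what it would do.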

Can you do this as user mbaker:

for host in `cat hostlist`; do ssh $host sudo -u root ifconfig -a;done

without being asked for a password? (On Solaris, only root can see the MAC address, so it's a harmless test.)

Depending on how your systems are configured, you may be able to sudo without providing a password. If you DO need a password for sudo, you could add some scripting magic to wait for the prompt and supply the password, but that's not terribly secure.
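Rather than scripting a password reply, you could first check which hosts already let mbaker run sudo without a password. This is a sketch assuming bash on the initiating server and a sudo recent enough to support -n (non-interactive: fail instead of prompting); older sudo builds on some Unix flavors may not have it. The function name check_sudo is hypothetical.

```shell
#!/bin/bash
# Sketch: report which hosts allow passwordless sudo for the current user.
check_sudo() {
  local host
  while read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    # BatchMode: never prompt for an ssh password; sudo -n: never prompt either.
    if ssh -n -o BatchMode=yes "$host" sudo -n true 2>/dev/null; then
      echo "$host OK"
    else
      echo "$host NEEDS-PASSWORD-OR-UNREACHABLE"
    fi
  done < "$1"
}

if [ $# -gt 0 ]; then
  check_sudo "$1"
fi
```

Any host reported as NEEDS-PASSWORD-OR-UNREACHABLE would stall an unattended shutdown, so it is worth fixing before the next outage.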

Good point, Broli. Since you are already using sudo, create the shutdown script (one script for all hosts that performs the OS check locally), and give mbaker the right to run it.
Then your script would simply be:

for host in `cat $hostlist`; do ssh $host emergencyShutdown <flags/options>; done

Thanks for the help.

for host in `cat hostlist`; do ssh $host shutdown <arguments>;done

will work for me.

That will work if you have a Unix server with a simple configuration, managed entirely through System V init scripts.

But in reality you have servers running multiple services, and servers hosting multiple virtual servers.
Some services need to be shut down properly with a specific command, and some even need time after you issue the stop command before you can actually bring down the OS itself.
That is why I suggested using a script instead of the shutdown command.
It should have the same name and path on all servers, so a simple loop on the "master" server can call it.

Each script is then responsible for all the logic of stopping the odd services that can't simply be killed, including the sleeps that give them time to end correctly, etc.
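The stop-then-wait pattern described above might look like this inside the per-server script. It is a minimal sketch in Bourne-compatible sh; it assumes pgrep is available (it is on Linux and recent Solaris, but check your other flavors), and the process name and stop command in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: issue a service's stop command, then give its processes up to
# TIMEOUT seconds to exit before the script moves on to the OS shutdown.
# Usage: stop_and_wait PROCESS_NAME TIMEOUT STOP_COMMAND [ARGS...]
stop_and_wait() {
  name=$1
  timeout=$2
  shift 2
  # Run the stop command, e.g. /etc/init.d/myapp stop (hypothetical).
  "$@"
  waited=0
  # Poll every 5 seconds while a process with exactly this name exists.
  while pgrep -x "$name" > /dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "$name still running after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=`expr $waited + 5`
  done
  return 0
}
```

For example, stop_and_wait myappd 600 /etc/init.d/myapp stop gives a hypothetical myappd ten minutes to finish cleanly. On a non-zero return, the script can decide whether to kill the process or abort the shutdown.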

I remember one place I used to work that used a protocol over TCP/IP to transfer messages between servers.
There was one gateway that received messages, distributed them to the proper applications and databases, and replied to them.
This gateway also listened to the company's gateways in other countries.
The thing is, you couldn't simply kill everything.
You had to issue stop commands to all the backends so they stopped accepting new requests without killing the current ones, and then wait a while (something like 10 minutes).
In the meantime, you had to tell the gateway there were problems, so it had time to notify the other gateways and they could start answering the requests sent to it.
After all the backends were stopped, you had to stop the gateway.

And that is a simple example. I have seen far more complicated setups, with multiple machines working in a pipeline.
They needed a full hour to shut down the whole processing line without losing data in between.

You have really opened my eyes, broli.

I was hoping that the scripts in /etc/rc*.d would bring down services, applications, etc.

I am so screwed because I have no clue what is happening on most of the servers. You see, I am a DBA and was tasked with this "side" project because I knew Unix shell scripting. I was given a server list and told to create an emergency shutdown script that would bring the servers down quickly, but not as painfully as a recent power failure did. I still do not know why this was not assigned to syseng. (We are badly understaffed in syseng.)

It's a matter of how much pressure you can create: how much you can force other departments in the company to give you the information, or how much you can withstand when they don't and your script destroys their data.

On a side note, try your best, and ask for money :stuck_out_tongue: