Thoughts on HACMP: Automatic start of cluster services

zaxxon · July 26, 2013, 3:51am

Hi all,

I remember, way back in some old environment, that the HA cluster services were not started automatically at boot, i.e. there was no entry in /etc/inittab.
As I recall, the reason (taking a 2-node active/passive cluster) was to keep a freshly booted backup node from automatically being able to receive RGs from the active node. If the backup node had been rebooted because of some error, failure, maintenance etc. and a failover from the active node happened just then, data loss could occur, since the node might be in an undefined state (taking a paranoid view of things).
A boot of a system that was not issued by an administrator, but happened for some other reason, needs to be investigated. Before that, I would not consider the node ready to be put back into the cluster. I think that is absolutely reasonable.

However, I often see environments, and several good official and unofficial documents on the net, where the entry in /etc/inittab to start HACMP at boot is created automatically (I didn't remember that), with no note about disabling it as a precaution for the reasons described above.

How do you handle this? Do you leave the entry in there or comment it out?
What's your reason for one or the other?
Is my approach too paranoid or unrealistic?
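
For anyone who wants to check what their own nodes currently do, here is a minimal sketch that classifies an inittab entry as active, commented out, or absent. It works on the text of the file, so it can be tried anywhere. Assumptions on my side, not taken from any official documentation: the identifiers and paths in the sample are made up for illustration, and I assume the AIX convention that a leading colon comments an entry out.

```python
def inittab_entry_state(inittab_text, identifier):
    """Return 'active', 'commented' or 'absent' for one inittab identifier.

    Assumes the AIX-style layout Identifier:RunLevel:Action:Command and
    that a leading ':' marks an entry as commented out.
    """
    state = "absent"
    for line in inittab_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        commented = stripped.startswith(":")
        # Drop the comment marker, then split off the identifier field.
        fields = stripped.lstrip(":").split(":", 3)
        if fields[0] != identifier:
            continue
        if not commented:
            return "active"
        state = "commented"
    return state


# Hypothetical sample; identifiers and paths are placeholders only.
SAMPLE = """\
init:2:initdefault:
:hacmp6000:2:wait:/hypothetical/path/rc.cluster -boot
hacmp:2:once:/hypothetical/path/rc.init
"""

if __name__ == "__main__":
    for ident in ("hacmp", "hacmp6000", "foo"):
        print(ident, inittab_entry_state(SAMPLE, ident))
```

On a real node you would read /etc/inittab and pass its contents in. The check is read-only on purpose, so it does not change anything.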

Please share your thoughts, thanks

bakunin · July 26, 2013, 6:09pm

I think a system - any system, not only HACMP nodes - that has gone through a power cycle should not be started automatically, because there was surely a reason for it having come down in the first place. A simple power cycle will almost certainly not correct that problem, and the machine should stay down until an admin can verify that the system is OK and initiate the application startup.

If a system is important enough that a downtime (usually until the next morning) is not feasible, then an HACMP cluster should replace the single system. If it is so important that not even the downtime of a standby node can be tolerated, then you need an admin available 24/7. It can't be stressed often enough: uninterrupted service costs money. To place a system somewhere and then blame the admin when the hardware/software turns out not to run non-stop without any maintenance is idiotic (and nevertheless often seen).

Admins, btw., are not without fault at all. Not because they cannot make the impossible possible, but because they didn't object from the first minute a plan for such a system was hatched.

Back to the original question: I think it is better to start the cluster manager manually, and I always configure my systems to work that way.

bakunin

zaxxon · July 28, 2013, 5:08am

Hi Wolf, thanks a lot for sharing your point of view - good to have this confirmation.

firefox111 · August 8, 2013, 10:18am

Hi Zaxxon,

Somewhere in the HACMP resource group configuration you can define how the RG behaves when the original node rejoins the cluster. You can configure it not to fall back when that happens.
That way you could start the HA services automatically and be able to take over in case of a failure, without risking an unwanted takeover when the node rejoins.
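
To make the idea concrete, here is a toy model of the fallback decision, just a sketch and not how the cluster manager is actually implemented. The function name, the policy strings and the node names are all hypothetical.

```python
def owner_after_rejoin(current_owner, rejoining_node, node_priority, fallback):
    """Toy model: which node owns the RG after `rejoining_node` comes back?

    node_priority: list of node names, highest priority first.
    fallback: "never" or "higher_priority" (hypothetical policy labels).
    """
    if fallback == "never":
        # The RG stays where it is, so a rejoining node causes no move.
        return current_owner
    if fallback == "higher_priority":
        # The RG moves back only if the rejoining node ranks higher.
        if node_priority.index(rejoining_node) < node_priority.index(current_owner):
            return rejoining_node
        return current_owner
    raise ValueError("unknown fallback policy: " + fallback)


if __name__ == "__main__":
    nodes = ["nodeA", "nodeB"]  # nodeA is the original (home) node
    print(owner_after_rejoin("nodeB", "nodeA", nodes, "never"))
    print(owner_after_rejoin("nodeB", "nodeA", nodes, "higher_priority"))
```

With the "never" setting, the RG stays on nodeB after nodeA rejoins, which is the behaviour I mean above.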

MichaelFelt · August 13, 2013, 3:55pm

Since HACMP 5.X the cluster manager gets started automatically - that is why it is in the inittab.

There are three kinds of resource groups, and each has its own startup (node just coming up) and recovery policy.

So, to get to the point: with HACMP v4 and earlier, best practice was not to have HACMP in inittab, because the cluster manager was not always active. Starting with HACMP v5 the cluster manager is always active, so the daemons (not the resource groups) are always (meant to be) activated when the node/server starts.

Hope this helps.