PowerHA (HACMP) full VG loss - cluster hangs on release_vg_fs event

Hello,

AIX 6.1 TL7 SP6
PowerHA 6.1 SP10

I was experimenting with a new HACMP build. It's a 3-node cluster built on AIX 6.1 LPARs. It contains Ethernet and diskhb networks. The shared VG disk is a SAN disk. Two nodes see the disk using vSCSI, the third node sees it using NPIV. The application is a DB2 server.

Most incidents usually involve some kind of network failure, so I decided to test my cluster against Ethernet failure and SAN failure. The Ethernet failure test was successful - when the node lost Ethernet connectivity (both cables, of course), my resource group moved to the next node with no problem.

Next I did the SAN failure test.
I did it in two different ways: by removing the vSCSI mapping in the VIOS, or by removing the FC mapping in the VIOS (NPIV case). The results were exactly the same in both cases - the cluster reacted correctly and started the release_vg_fs event. The release_vg_fs script tried to unmount the filesystems, but since all the filesystem disk devices were gone, the script just hung, and the cluster started issuing config_too_long events.
So clstat reports the resource group as "RELEASING.." and that's it...
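
For reference, this is roughly how I removed the mappings on the VIOS side (the adapter and virtual target device names below are just examples from my test setup):

lsmap -vadapter vhost0             # vSCSI case: find the virtual target device for the shared disk
rmvdev -vtd vtscsi0                # remove the vSCSI mapping
lsmap -npiv -vadapter vfchost0     # NPIV case: check the virtual FC mapping
vfcmap -vadapter vfchost0 -fcp     # unmap the virtual FC adapter (empty -fcp removes the mapping)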

How do I configure PowerHA to handle a full VG loss (for example, when the SAN goes down) correctly?

thanks,
Vilius M.

Well, you need to differentiate between a full SAN loss (no nodes can see the disks) and a failure where only one node loses sight of the disk.

It has been years since I have debugged HACMP scripts, and there have been a lot of additions to what is checked, but at its core the problem is that a resource has gone down - not a topology element - so it is up to the application stop script to make sure the resources are released before "standard processing" continues.

To have this fully automated you would need to write a recovery script that HACMP could call - config_too_long means HACMP does not see this as an error.
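
As a rough sketch only (the filesystem and VG names are made up, and you would have to hook it in yourself, for example as a notify method or custom event), such a recovery script might do something like:

#!/bin/ksh
# Hypothetical recovery script - forcibly release the shared resources.
# /db2data, /db2logs and db2vg are example names; substitute your own.
for fs in /db2data /db2logs
do
    fuser -kc $fs     # kill any processes still holding the filesystem
    umount -f $fs     # force the unmount
done
varyoffvg db2vg       # vary off the shared volume group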

What I would look at is using the application monitoring capabilities to detect that the application is down and to verify the resources on the "active" node.
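
For example, a custom application monitor only has to exit with zero while the application is healthy and non-zero when it is not (db2inst1 below is an assumed instance owner):

#!/bin/ksh
# Hypothetical custom application monitor for the DB2 instance.
if ps -fu db2inst1 | grep '[d]b2sysc' > /dev/null
then
    exit 0    # application is up
else
    exit 1    # non-zero exit tells the cluster manager the application is down
fi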

If I recall correctly, the steps PowerHA takes are:
1) application stop - the key here is that there are no open files on the filesystems, so that the following steps can succeed;
2) release the resources (i.e. 2a) unmount the filesystems and 2b) varyoff the volume group) - roughly the manual sequence sketched below.
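
In manual terms (the script, filesystem and VG names are examples only), the release path boils down to:

/usr/local/bin/stop_db2.sh    # 1) application stop script (example path)
umount /db2data               # 2a) unmount the shared filesystems
umount /db2logs
varyoffvg db2vg               # 2b) vary off the shared volume group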

Again, config_too_long means the script is not exiting with any status - so it is not an error; it is hanging. I would have to look at both the current script and application monitoring to determine whether application monitoring could inject a new action by the cluster manager to forcibly unmount the filesystems. I am guessing that is not a possibility.

Comment: I would be nervous about mixing vSCSI and NPIV within a resource group. There are no real issues with a mix in the cluster, but there are real concerns when mixing technologies for a single resource in a resource group.

Hope this helps you advance your testing! Good diligence!

Hi,

Thanks for reply.

I just want to clarify some details about my problem:

I'm talking about VG loss (SAN down / vSCSI down / NPIV down) only on a single node - the other nodes see the SAN disks with no problem.
The vSCSI and NPIV mixing is only for test purposes.
The problem is not an error but the hung release_vg_fs event script - to be specific, it's the "umount -f .." command which hangs; I can see that using ps during the event. I tried removing all application processes using fuser in my application server stop script - it doesn't help, umount still hangs. The same problem is evident even without the cluster, just using manual administration commands: with all the VG devices gone, umount will never return. My problem could be simplified as: how do I unmount a filesystem when its VG and devices are gone?
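
To illustrate, this is roughly what I run by hand after pulling the disk mapping (the path is an example), and it is the second command that never returns:

fuser -kc /db2data    # kill anything still using the filesystem (this returns fine)
umount -f /db2data    # hangs forever once the underlying hdisk is gone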

The only solution I see now is a node shutdown on some event (I have not decided which one yet); the shutdown never finishes because of the hung umounts, but the node releases the resource group - and that is enough. If someone could suggest a smarter solution, please do.
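
What I have in mind is something as crude as a notify or pre-event script for a suitable event that simply halts the node (a sketch only, assuming a hard halt is acceptable):

#!/bin/ksh
# Hypothetical notify script: halt the node so the surviving nodes
# can acquire the resource group; halt -q skips normal shutdown processing.
halt -q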

On the other hand, it's a standard situation, and people who test their cluster against "SAN down on a single node" should face the same issue.

Vilius M.

If it is doing umount -f and that is not completing, I would need a trace to see what is (not) happening.

I would open a PMR to get an official support statement on if/when umount -f, by design, may hang.

I am assuming that you have tried an additional umount -f. Have you also tried a varyoffvg -f?

What does lsvg say? I am assuming that with the diskhb network you are also using enhanced concurrent volume groups. Which node says "active"? What do the systems that can still see the disk say? What happens when SAN connectivity is restored?
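
Concretely, I would compare something like the following on the failing node and on the surviving nodes (db2vg is an example name):

lsvg -o       # which volume groups are currently varied on
lsvg db2vg    # look at the VG STATE and Concurrent fields
lspv          # are the hdisks backing the VG even visible any more?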

(P.S. I will be traveling for work soon, which will delay further comments.)

I called IBM support about this - after some back-and-forth exchange of information they recommended an AIX upgrade to TL8 SP2, so I did.
After the upgrade the problem is gone - during a full VG loss the cluster unmounts the filesystems just fine.

This one is solved.