Cluster node not starting

I am setting up HACMP 6.1 on a two-node cluster. One node works fine and starts properly, reaching the STABLE state (VGs varied on, FS mounted, service IP aliased). The other node, however, is always stuck in the ST_JOINING state. It takes forever, and in the meantime I can neither stop the cluster nor recover from script failure. I can't see any errors in hacmp.out.

Here are the latest messages I see in clstrmgr.debug (this is from the console, so I typed it in by hand):

getPriorityOverride: Returning 0 for the nodehandle:2
getPriorityOverrideSecondary: Returning 0 for the nodehandle:2
rm_CreateAllPolMsg: node NODE02 has group RG1 in node 1
getPriorityOverride: Returning 0 for the nodehandle:4
getPriorityOverrideSecondary: Returning 0 for the nodehandle:4
Before Sending: Message Length is 4232   NumResStates:2NumPols:2  numSSitePols:0  join_data_valid:0
rm_ProcessnPhaseCb: Voting to CONTINUE my join w/msg.seq_no1 packet_count:

---------- Post updated at 03:52 AM ---------- Previous update was at 03:47 AM ----------

One more thing to add: I can start the cluster on either node as long as no other node has been started. That is, I can start the cluster and the RG on either node1 or node2, but if I start it on node1, node2 is not brought up by clstart and shows ST_JOINING forever. Thus I cannot fail over to the other node, since that requires the other node to be in a stable state.

I am not sure about the reason without further information, but to me this looks like a communication problem. Check all the cluster networks (see "cllsif") for connectivity and make sure the disk heartbeat works as expected.
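A minimal sketch of such a connectivity check, assuming you copy the peer addresses by hand from the "cllsif" output. The function name check_peers and the PROBE variable are made up for illustration; the only real command used is "ping -c".

```shell
# check_peers: probe every address given as an argument and report the result.
# PROBE (hypothetical helper variable) is the command used to test one address;
# it defaults to a 2-packet ping.
check_peers() {
    failed=0
    for addr in "$@"; do
        if ${PROBE:-ping -c 2} "$addr" >/dev/null 2>&1; then
            echo "OK   $addr"
        else
            echo "FAIL $addr"
            failed=1
        fi
    done
    # Non-zero return if at least one address did not answer.
    return $failed
}
```

Run it from each node against the boot and service addresses of the other node, for example "check_peers <ip1> <ip2>". It only tests the IP networks, not the disk heartbeat.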

Another possible cause that comes to mind is the VG: make sure it is varied on in "enhanced concurrent" mode. There may also be disk reservations left over somehow: issue a "varyonvg -b -u <vgname>" to break them.
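A rough way to check the VG mode, as a sketch only: it assumes that "lsvg <vgname>" prints a "Concurrent:" field containing "Enhanced-Capable" for such a VG, so verify that against the output on your own system. The function name is_enhanced_capable is hypothetical.

```shell
# is_enhanced_capable: reads the output of "lsvg <vgname>" on stdin and
# succeeds if the VG is reported as enhanced-concurrent capable.
# Assumption: the "Concurrent:" field reads "Enhanced-Capable" for such a VG.
is_enhanced_capable() {
    grep -q 'Concurrent:.*Enhanced-Capable'
}

# Intended use on a cluster node (not executed here):
#   lsvg <vgname> | is_enhanced_capable && echo "enhanced concurrent capable"
```

If the check fails on either node, have a look at the VG definition before you try to break any reservations.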

I hope this helps.

bakunin

It sounds as if the synchronization was "forced". As bakunin mentions, it is most likely a communication problem.

A basic test: start cluster services on all nodes, but do not start any resource groups. I suspect you will not be able to start all nodes.
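To see how far each node gets during that test, you can ask the cluster manager for its state. A small sketch, assuming "lssrc -ls clstrmgrES" prints a line starting with "Current state:" (the helper name cluster_state is made up):

```shell
# cluster_state: reads the output of "lssrc -ls clstrmgrES" on stdin and
# prints the value of the first "Current state:" line, e.g. ST_JOINING.
cluster_state() {
    sed -n 's/^Current state: *//p' | head -n 1
}

# Intended use on each node (not executed here):
#   lssrc -ls clstrmgrES | cluster_state
```

A node that has joined properly should report ST_STABLE; the node from the original post would keep reporting ST_JOINING.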