Configuring new disks on an AIX cluster

We run two p5 nodes with AIX 5L in cluster mode (HACMP); both nodes share external disk arrays. Only the primary node can access the shared disks at any given time.

We are in the process of adding two new disks to the disk arrays so that they can be made available to the existing (non-rootvg) volume group on the shared disks. The following steps were executed on the primary node to make the disks available:

  1. Power down the node and disk arrays.
  2. Plug in the new disks.
  3. Power on the nodes and disk arrays.
  4. Run cfgmgr to configure the new disks.
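
For reference, the outcome of step 4 can be checked on the primary node roughly like this (a sketch only; hdisk numbers below are placeholders):

# cfgmgr        # configure the newly attached disks
# lspv          # the new disks should show up, initially with no PVID and not assigned to any VG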

After the above steps, the two new disks were visible on the primary node and were added to the existing (non-rootvg) volume group. A new logical volume was then created on the two new disks.
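
Something along the following lines was used (a sketch only; the VG, LV, and hdisk names and the LV size shown here are placeholders, not the actual ones):

# extendvg <sharedvg> hdisk4 hdisk5                      # add the new disks to the existing VG
# mklv -y <newlv> -t jfs2 <sharedvg> 100 hdisk4 hdisk5   # create a logical volume (100 LPs) on the new disks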

I would appreciate it if anyone could confirm whether the following steps are sufficient to replicate the new disk configuration (and the new logical volume configuration) on the secondary server:

  1. Vary off the (non-rootvg) volume group on the primary node.
  2. Ensure the (non-rootvg) volume group on the shared disks is varied off.
  3. exportvg the volume group on the secondary node.
  4. importvg the volume group on the secondary node.
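
For reference, as commands those steps would look roughly like this (a sketch only, assuming the VG is called <sharedvg> and shows up on hdisk4 on the secondary; all names are placeholders):

On the primary node:
# varyoffvg <sharedvg>
On the secondary node:
# exportvg <sharedvg>              # remove the stale VG definition from the ODM
# importvg -y <sharedvg> hdisk4    # re-read the VG definition (including the new LV) from the VGDA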

I'm not quite sure whether I would need to delete the existing disk devices from the ODM (using rmdev) and then run cfgmgr on the secondary server before running importvg, or whether running exportvg followed by importvg alone is sufficient.
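
In case it matters, the ODM route I have in mind would be roughly the following on the secondary (a sketch only; hdisk numbers are placeholders):

# rmdev -dl hdisk4    # delete the device definition from the ODM
# rmdev -dl hdisk5
# cfgmgr              # rediscover and reconfigure the disks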

With HACMP 5.1/5.2/5.3/5.4 the procedure goes roughly like this:

  • First: make the disks visible from all cluster nodes by running cfgmgr. The new disk devices should then be visible on every node (shown as "none None" in lspv, i.e. no PVID and no VG yet). There is no need to keep the device numbers identical across nodes, though.
  • Second: create a PVID on every disk from every node before you start to introduce them to the cluster. This can be done with a "chdev -l hdiskX -a pv=yes". The PVID is written onto the disk only the first time; on the second node the command detects the existing PVID and reuses it. I.e. you end up with new disks that can be identified by their PVID from every cluster node (see the sketch after this list).
  • Third: let HACMP (from the HACMP SMIT panels) discover the cluster nodes' new hardware.
  • Fourth: use C-SPOC to add the disks to the existing shared VG and then to extend the LV and FS.
    This can be done while HACMP (and the RG) is online.
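
As a sketch of the PVID step above, assuming the new disks appear as hdisk4 and hdisk5 on both nodes (placeholder names):

On the first node:
# chdev -l hdisk4 -a pv=yes    # writes a new PVID onto the disk
# chdev -l hdisk5 -a pv=yes
On the second node:
# chdev -l hdisk4 -a pv=yes    # finds the PVID already on the disk and records it in the ODM
# chdev -l hdisk5 -a pv=yes
Afterwards, lspv on both nodes should show the same PVIDs for the new disks.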

Now to what you did... that could probably be done, but it seems too complicated to me. The AIX way of administration is somewhat different. Even with a SCSI-2 protocol based VG there is neither a need to vary off the VG on the primary node nor to export the VG on the secondary node. You could do a Learning Import online. On the primary you do
# varyonvg -b -u <sharedvg>
on secondary
# importvg -L <sharedvg> <hdiskN>
and immediately afterwards on primary
# varyonvg <sharedvg>
This procedure was used back in HACMP 4.x times. Mind that the Learning Import would be wrong for an ECM (Enhanced Concurrent Mode) VG. It also won't work after an exportvg, as it expects the VG to already be known on the server.
Obviously you stopped the cluster. That way there should at least be no problem with verifying and synchronising the cluster. Test it. After going live, don't change anything.

Many thanks for your comments.

For my understanding, as I'm not an expert with HACMP: what will happen if the two new disks are not configured, i.e. the PVIDs are not created with the chdev command on the secondary node, before starting the cluster services on the primary node?
I guess the failover to the secondary node will obviously fail, as the cluster services won't be able to access the new disks. In this scenario, will it be OK to create the PVIDs for the new disks on the secondary node and then subsequently fail over to it?

You are right both in your assumption about what would happen and that you need to make the PVIDs known to the secondary before you can proceed. I don't know which HACMP version you use. Regardless of the version, the following procedure should work: make the PVIDs visible, then update the VG information on the secondary while HACMP is stopped (by a Learning Import), then verify and synchronise the cluster.
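
A rough sketch of that sequence on the secondary node while HACMP is stopped, assuming the new disks show up as hdisk4/hdisk5 and the shared VG is called <sharedvg> (placeholder names):

# cfgmgr                          # make sure the new disks are configured
# chdev -l hdisk4 -a pv=yes       # picks up the PVIDs already written from the primary
# chdev -l hdisk5 -a pv=yes
# importvg -L <sharedvg> hdisk4   # Learning Import: refresh this node's copy of the VG definition
Then verify and synchronise the cluster from the HACMP SMIT panels.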

If you are curious, you could try to verify/sync the cluster with the autocorrection flag activated after assigning the PVIDs but before the manual import, to find out whether the cluster could resolve the problem on its own. I did not try this with the current HACMP version, but I saw some enhancements in 5.4.1 compared to earlier versions that make me think it might be possible now. If you are below version 5.3, forget it.