AIX Hardware Migration w/ HACMP...Advice Needed

Hello Everyone,

Hope you all are doing great!

As you can see from the title, we are in the process of migrating a lot of our servers from Power5 (physical) to Power8 (virtual). Now it's the turn of the servers running HACMP clusters. Let me lay out the environment:

OLD ENVIRONMENT:

The primary and secondary nodes reside on a Power5 host. Both are physical servers. The rootvg is on internal disks and the data VGs are on SAN-attached storage. Both primary and secondary nodes are on AIX 7.1 TL3 SP4. For HACMP we have an active-passive configuration.

NEW ENVIRONMENT:

Everything is virtualized. We have a dual-VIO setup on the Power8s. The boot disks come through a storage cluster on the VIOs (vSCSI) and the data disks come from SAN (NPIV).
As part of the new environment we are also moving to a new network, i.e. a different IP subnet (old = 172.XXX, new = 10.XXX).

The way we have been migrating our previous (non-HA/test/dev) servers is that we restore the mksysb onto the vSCSI disk of the new LPAR before the day of the cutover and configure the new network on the server. On the day of the cutover we bring down the apps/DB on the old server (P5), then unmount the filesystems, varyoff and export the VGs and remove the SAN disks. The storage team unmaps the LUNs and maps them back to the NEW WWPNs on the Power8 (NPIV).

On the new server we configure and import the disks and VGs and mount all filesystems, including NFS. We then make the DNS changes for the new IP and bring up all apps and DBs.
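Roughly, the commands involved are (the VG, filesystem and hdisk names below are just examples):

On the old P5 server:
umount /appdata                # unmount every filesystem in the data VG
varyoffvg datavg               # varyoff the data VG
exportvg datavg                # export the VG definition
rmdev -dl hdisk4               # remove the SAN hdisks before the LUNs get unmapped

On the new P8 LPAR (after the LUNs are remapped to the NPIV WWPNs):
cfgmgr                         # discover the new disks
lspv                           # check that the PVIDs came across
importvg -y datavg hdisk4      # import the VG from one of its disks
mount /appdata                 # mount the filesystems (plus the NFS mounts)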

Once we bring HACMP into the picture, things become a little more complicated. Can some of the experts here give me sound advice on how we can make the process as smooth as possible, given that we are moving to new server hardware as well as a new network?

Thanks in Advance!

  1. Make an HACMP snapshot before the migration/mksysb.
  2. I personally would do the server and IP migrations separately: either the IP migration and cluster reconfiguration first and then the server migration, or vice versa, but not both at the same time.
    Too many changes at once means too many points of failure and too much troubleshooting if something goes wrong.

I thought about that, but here is the issue:

Both networks have restrictions. I cannot put an old IP (172.X) on the new hardware, as it connects to the new network, and vice versa; that is what makes it so complicated.

In this case you will have to have some sort of downtime. Alas, there seems to be no way around that, because you need to configure HACMP anew on your new system, and that means some downtime when you make the transition. Fortunately this downtime can be minimised quite a bit with good planning.

Furthermore, you haven't said anything about the OS and HACMP versions involved. I suppose you will have to update at least one of them (most probably both) too.

I'd investigate the following procedure (you might have to add some things, this is just a first idea):

  • Take an mksysb from the running systems, create the appropriate LPARs on the new system and try to install them from the mksysb images without any data (just the rootvg). (A rough command sketch for this follows after the list.)

  • Now do all the updates (start with the OS, then the HACMP software) on the new systems.

  • Get a single LUN from your SAN and configure your new cluster with the same topology as the old one, just with one test VG on that single LUN. Use this to test your zoning, your (basic) cluster setup and other details.

  • Finally, after preparing your new cluster with the new addresses on the new hardware, you need the downtime: get the SAN people to zone the old data LUNs to your new hardware so that the disks are seen by your cluster nodes, create the new cluster config for the imported VGs and start it. You might want to add the DNS names of the old cluster service IPs as aliases for the new addresses so that the transition becomes smoother for the clients.
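Just to illustrate the first steps with concrete commands (the backup path is only a placeholder):

root@oldnode1 # mksysb -i /backup/node1_rootvg.mksysb    # write the mksysb image; -i regenerates image.data first
root@newnode1 # oslevel -s                               # after the restore: check which TL/SP you start from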

I hope this helps.

bakunin

Thanks for the reply!!

The AIX version running is 7.1 TL3 SP4 and the HA version is 6.1. Both are supported on the Power8s.
As far as the outage goes, we do have a 2.5 hour window to re-configure the cluster.
I have already included the mksysb and restore part in my plan. While doing my research I found this on the IBM website:

To change a service IP label/address definition:

  1. Stop cluster services on all nodes.
  2. On any cluster node, enter smit hacmp
  3. Select HACMP Initialization and Standard Configuration > Configure Resources to Make Highly Available > Configure Service IP Labels/Addresses > Change/Show a Service IP Label/Address. Note: In the Extended Cluster Configuration flow, the SMIT path is HACMP > Extended Configuration > HACMP Extended Resources Configuration > Configure Service IP Labels/Addresses > Change/Show a Service IP Label/Address.
  4. In the IP Label/Address to Change panel, select the IP Label/Address you want to change. The Change/Show a Service IP Label/Address panel appears.
  5. Make changes in the field values as needed.
  6. Press Enter after filling in all required fields. HACMP now checks the validity of the new configuration. You may receive warnings if a node cannot be reached, or if network interfaces are found to not actually be on the same physical network.
  7. On the local node, verify and synchronize the cluster. Return to the HACMP Standard or Extended Configuration SMIT panel and select the Verification and Synchronization option.
  8. Restart Cluster Services.

Do you think that, after I move everything onto the new Power8 (OS and SAN), I can change the service and boot IP addresses using the above method and then try to start the cluster? Do you think it'll work?

The current version of AIX 7.1 is TL3 SP6, and 7.2 is already out there. I'd suggest at least the former, because it fixes some problems with HACMP: on 7.1.3.4 one of the RSCT daemons runs wild and clutters up /var. You either have to install efixes (which generally generate as many problems as they solve) or update to the latest level.
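If you have the fix pack downloaded somewhere the update itself is short (the directory is only an example):

root@node1 # oslevel -s                                  # current TL/SP
root@node1 # install_all_updates -d /mnt/71TL3SP6 -Y     # apply everything in the fix directory, agree to licenses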

The current version of HACMP is 7.1.3, and 6.1 goes EOL in September. Do yourself a favour and use this maintenance window to update to the latest version possible. DO NOT use any version below 7.1.3 if you update to 7.1! Many things (like the repo disks via NPIV with non-IBM storage) worked only in theory, not in practice. 7.1.3 is more or less stable. I have some 40-50 clusters running here and could go on for pages and pages about the workarounds and quick fixes we had to use to get working clusters with the earlier versions of 7.1.
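You can check what you currently have installed with:

root@node1 # lslpp -L 'cluster.es.server.rte'            # installed HACMP/PowerHA server level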

You do not need to: as I said, create your LPARs from the mksysbs (plus the necessary updates, see above) while your old cluster is still working, create a NEW 7.1 cluster and test that until you are ready to make the move. You can pre-create the complete cluster configuration as a series of commands now, because FINALLY the clmgr command really works and it is possible to do a cluster config via the command line! This (not having to navigate all these SMITTY menus all the time) is by far the biggest relief since I started working with HACMP.

You said you needed to use new IP addresses anyway, so don't bother. Create your new cluster with the new addresses and test it thoroughly, then make the transition basically by moving the data disks (they are NPIV, no?) to the new LPARs.

It might work, but again: you don't need that. I can give you a complete procedure for setting up a 7.1 cluster, and in fact it is 10 minutes of work now, only a few commands. Far better and far easier than navigating those endless SMIT menus.

I hope this helps.

bakunin

PS: don't get me wrong: SMITty is fine if you don't know exactly what you want to do or what the format of a certain command is. But for the things I do daily, where I know exactly what to do and how, SMITty is more of a hindrance than a tool.


You're definitely a lifesaver! Updating to HA 7.1.3 makes sense. We have been using smitty, which definitely takes more than 10 minutes, lol.

If possible, can you give me the complete procedure for setting up a 7.1 cluster? That would help me a lot.

I can do that, but it will have to wait until I am back in the office to consult my documentation. I need my beauty sleep now. ;-))

bakunin


Absolutely! Take your time and thanks once again!


This is how we do it: we configure 2-node active/passive clusters ("rotating"), usually with one service address and one or two RGs (e.g. DB, application). Each node has only one network adapter ("en0"):

  • Create two NPIV disks, 1 GB each, zoned to both nodes, for the CAA repositories (you need these instead of the heartbeat disks).

  • add the IPs and DNS-names for all cluster-nodes and all service-addresses to /etc/hosts and /etc/cluster/rhosts :

# cat /etc/hosts
10.1.1.1    node1
10.1.1.2    node2
10.1.1.3    service1

# cat /etc/cluster/rhosts
10.1.1.1
10.1.1.2
10.1.1.3
  • switch off dead-gateway detection if applicable:
echo "!REQD en0 <default-gw>" > /usr/es/sbin/cluster/netmon.cf
  • set PVid for Repo- and Spare-Disks:
root@node1 # chdev -a pv=yes -l hdisk<XX>
root@node1 # chdev -a pv=yes -l hdisk<YY>

root@node2 # cfgmgr
  • start cluster-services:
root@node1/2 # startsrc -s clcomd
root@node1/2 # startsrc -s clinfoES
  • create the cluster itself (if that doesn't work, restart clcomd on both nodes as described below):
root@node1 # clmgr add cluster <CLUSTERNAME> \
             REPOSITORY=<hdiskXX> \
             NODES=<node1>,<node2>

root@node1/2 # stopsrc -s clcomd; sleep 2 ; startsrc -s clcomd
  • First-alias distribution policy (needed for NIM to work on active cluster-node). You can do this only after defining the first cluster-network (see below)
root@node1 # clmgr mod network net_ether_01 RESOURCE_DIST_PREF=NOALI
  • discover shared disks/VGs:
root@node1 # clmgr disco vg

----- So far the cluster itself, now the resource groups. Repeat the following for every RG:

  • Service-IP-Label, network:
root@node1 # clmgr add service_ip <service-IP> NETWORK=net_ether_01
  • create application controller (former "start-/stop-scripts"):
root@node1 # clmgr add application_controller <APP_CONTROLLER_NAME> \
                      STARTSCRIPT=</path/to/startscript> \
                      STOPSCRIPT=</path/to/stopscript>
  • create resource group (the following is commented for your information, but you need to remove the comments to make the command work):
root@node1 # clmgr add rg <RG-name> \
                   NODES=<node1>,<node2>[,...] \    # nodes to serve the RG
                   STARTUP=OHN \                    # online on home node only
                   FALLOVER=FNPN \                  # fallover to the node of next priority
                   FALLBACK=NFB \                   # never fall back ("anti RG-ping-pong")
                   SERVICE_LABEL=<serviceIP> \
                   VOLUME_GROUP=<vg1>[,<vg2>,...] \
                   FORCED_VARYON=true \
                   FS_BEFORE_IPADDR=true \
                   APPLICATIONS=<APP_CONTROLLER_NAME>
  • finally sync the cluster (just for the nerves :-) )
root@node1 # clmgr sy cl fix=yes
  • start cluster:
root@node1 # clmgr on node <node1>[,<node2>,...]
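Once the cluster is up you can check it with, for example:

root@node1 # clmgr query cluster                         # overall cluster state
root@node1 # lssrc -ls clstrmgrES                        # "Current state: ST_STABLE" when settled
root@node1 # /usr/es/sbin/cluster/utilities/clRGinfo     # which node currently serves which RG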

I hope this helps.

bakunin


It surely will help without a doubt! Thanks a ton!

One last question: after testing the cluster, on the day we migrate the data disks from the old server, how should I add those disks/VGs to the resource group and re-configure the cluster with the imported NPIV disks? Can you please explain?

You can configure your cluster with a single disk (visible on all cluster nodes) as a stand-in. With this you test all the IP addresses, start/stop scripts, etc. You said you want to use new IP ranges anyway, so you can test with your new production IPs safely while the old cluster is still running.
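For the stand-in something as simple as this will do (disk names and size are only examples):

root@node1 # mkvg -y testvg -C hdiskZZ                               # enhanced-concurrent-capable test VG
root@node1 # crfs -v jfs2 -g testvg -A no -m /testfs -a size=500M    # a small filesystem to fail over
root@node1 # varyoffvg testvg                                        # release it before importing elsewhere
root@node2 # cfgmgr ; importvg -y testvg hdiskZZ                     # the hdisk number may differ on node2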

On the day of the transition you

  • zone the datadisks of your old cluster to the new cluster-nodes (you need SAN support for this)

  • throw away your (test-)VG and the RG residing on it.

  • run cfgmgr to discover the new disks, import the VGs on all nodes (be sure to end up with concurrent-capable VGs), run clmgr disco vg ("discover VGs") on one node to make the VGs available in HACMP, and run the RG definition from above ( clmgr add rg ... ) again. It is that simple.

Do yourself a favour: when importing the VGs on all nodes, make sure the same VG gets the same major number on every node. It will work without that too, but it makes keeping track of things easier. I even try to have the same hdisk numbers on all nodes, so that hdiskXX on node1 is also hdiskXX on node2, etc.
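For the major numbers, something along these lines (the number 100 and the hdisk names are only examples):

root@node1 # lvlstmajor                            # list the free major numbers on node1
root@node2 # lvlstmajor                            # pick one that is free on both nodes
root@node1 # importvg -V 100 -y datavg hdiskXX     # import with that major number
root@node2 # importvg -V 100 -y datavg hdiskXX
root@node1 # clmgr disco vg                        # let HACMP pick up the imported VGs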

I hope that helps.

bakunin


Thanks for the reply Bakunin.

Sorry for the delayed response, I was seriously busy at work. I will be testing it out this week and will come back with the results.

Thanks once again!