FileSystems under HACMP

Dear Fellows,

I'm currently working on an HACMP cluster (version 7.1) with two nodes (Node1 active / Node2 passive) and one Resource Group on the active node (Node1), which is UNMANAGED on both nodes.
So all the data VGs are on Node1.
Then a JFS2 file system (located in one of these data VGs, called "VG_Clust") filled up and I had to extend it with plain LVM commands (chfs), not the CSPOC ones (cl_chfs), and this worked fine.
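For reference, what I did looked roughly like this (the mount point and size are just examples, not my real ones):

    df -g /data                 # the file system was full
    chfs -a size=+2G /data      # extended it with plain LVM, not CSPOC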

This "VG_Clust" is in "read/write" permission, "Enhanced-Capable" and VG Mode "Concurrent" .

In case of a failover to Node2, would I need to "Verify and Synchronize HACMP" or perform a "Script Failure" recovery from Node1?

And does this require HACMP to be down?

Could you confirm whether the steps below are correct?
smitty hacmp -> Custom Cluster Configuration -> Verify and Synchronize Cluster Configuration (Advanced)
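Or, if it can also be done from the command line, something like this (I am not sure whether this is equivalent, please correct me):

    clmgr verify cluster        # verification only
    clmgr sync cluster          # verify and synchronize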

Thanks for your kind reply.

Assuming you have paid-for support from IBM (given that you are thinking of sending them a snap in this ticket), that would probably be the best option: just open a PMR and give them all the details from here.

I don't know if your cluster is in production or productive (non-prod but still important) use, but if it is, they will help you avoid the risk of downtime, and if something does go wrong, management will have a contractual party to blame rather than just you.

Sorry to shy away, but I no longer work with AIX clusters, so I can't explore this for you.

Kind regards,
Robin

Your exploration of AIX clusters would be much appreciated.
Thanks rbatte1

It seems you do not really understand how an HACMP cluster works, so a few words of clarification. Bear with me if this is already known. Also note that I will leave out a lot of details, as I can't write complete PowerHA documentation here.

OK, let us start with the central term in HACMP, the "resource group". What is it?

Look at an application, say, a database: for it to run you first need some file systems where the DB files are stored. Then you need some processes (the DB process(es)) running - basically, the application has to be started. Finally you need an IP address under which clients from outside can connect to the database and use its services.

Exactly these three components - file systems, (started) processes and an IP address - are what a "resource group" consists of.

File systems: one or more volume groups go into a resource group. When a resource group goes active, all these VGs are activated on one cluster node and all the file systems in them are mounted there. In case of a resource group move, the FSs are unmounted and the VGs deactivated on that node, then the VGs are activated on another node and all the FSs are mounted there. HACMP does this itself for all the VGs defined in a resource group.
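If you want to see what is active on a node, a quick check could look like this (the VG name is taken from your post):

    lsvg -o             # VGs currently varied on on this node
    lsvg -l VG_Clust    # the LVs / file systems inside the shared VG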

Processes: for each resource group there is a so-called "application server" (an "application controller" in newer PowerHA releases), a pair consisting of a start and a stop script. Whenever a resource group is deactivated, the stop script is executed. It should make sure the application is down, so that the file systems can be unmounted afterwards. When an RG is activated, its start script is executed and should start the application. These start/stop scripts are provided by you and are simple shell scripts, so it is easy to integrate all sorts of applications into HACMP.
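A minimal sketch of such a pair of scripts - everything in it (paths, user name, application) is made up, your own scripts will of course look different:

    #!/bin/ksh
    # start_app - called by HACMP after the VGs are varied on and the FSs are mounted
    su - dbadmin -c "/opt/myapp/bin/startdb"
    exit 0

    #!/bin/ksh
    # stop_app - called by HACMP before the FSs are unmounted; it must leave no
    # process behind that still holds files open on the shared file systems
    su - dbadmin -c "/opt/myapp/bin/stopdb"
    exit 0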

Finally the IP address: every RG can have one (or several, but typically one) "service address". These service addresses are normal IP addresses which are added to a certain network adapter when the RG starts and removed when it stops. Technically they are IP aliases which are added to network interfaces.
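Just to illustrate the mechanism (assuming IPAT via aliasing, which is the usual setup; address and interface are examples, and HACMP adds and removes the alias itself - you never do this by hand in a running cluster):

    ifconfig en0 alias 192.0.2.10 netmask 255.255.255.0   # what an RG start does, in effect
    ifconfig en0 delete 192.0.2.10                        # what an RG stop does, in effect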

An RG start now works like this: the VGs are acquired ( varyon ), the file systems are mounted, then the start script of the application server is executed. Finally the service IP address is put onto a network interface and the clients can use it to connect to the application. If the RG is moved, the service IP is taken down, the stop script of the application server stops the application, the FSs are unmounted and the VGs deactivated, then the start procedure is carried out on another node. The clients will notice that the service IP is (after a short time) available again. If a node crashes, the same as in an RG move happens, only the stop part is skipped (obviously). HACMP can handle that, but you may need to take care of the application side in your start script, like a cleanup in a DB after a sudden system shutdown, etc.
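A manual RG move, for example for planned maintenance, could look like this with clmgr (RG and node names are examples; the RG has to be ONLINE for this):

    clmgr move resource_group RG1 NODE=Node2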

When your RG is in the state "UNMANAGED" it means that its FSs, processes, etc. are there, but are no longer controlled by HACMP. Stop it using HACMP so that it becomes "OFFLINE" (meaning: not active on any node), then bring it online again. Now it should be in the state "ONLINE" on a certain node. From there you can move it to another node.
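With clmgr that could look like this (RG and node names are examples; cluster services have to be running on the node):

    clmgr offline resource_group RG1              # UNMANAGED -> OFFLINE
    clmgr online  resource_group RG1 NODE=Node1   # OFFLINE -> ONLINE, under cluster control
    clRGinfo                                      # verify the new state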

A word about CSPOC: you should absolutely, positively use these commands, not the normal commands, to do LVM management. The reason is that, for all the components I talked about above to work, all the cluster nodes need to share consistent information about what the parts of the resource groups look like. The cluster commands do the same as the normal commands, but they also distribute the changed information to the other nodes. THIS IS VITAL!
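To stay with your chfs example: the CSPOC counterpart takes the same arguments and can also be reached through SMIT (size and mount point are examples, and the menu wording may differ slightly between releases):

    cl_chfs -a size=+2G /data     # same effect as chfs, but the change is propagated to all nodes
    # or interactively:  smitty cl_admin  ->  Storage  ->  File Systems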

You can get away with doing plain LVM operations if you do a "learning import" on the passive node afterwards, possibly followed by a cluster synchronisation too. But why take such risks if there are commands that do exactly this without any risk involved at all?
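If you do end up in that situation, the learning import would be run on the node where the VG is currently not active, roughly like this (the hdisk number is an example; use a disk that belongs to the VG on that node):

    importvg -L VG_Clust hdisk2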

I hope this helps.

bakunin


Hello Bakunin,
Thanks for your kind reply.

Actually, in my low-budget customer environment, this particular HACMP cluster is only configured and used when needed; that's why both nodes are UNMANAGED at the moment.
And which of the risks you mentioned above could happen when using LVM commands instead of the CSPOC ones?

Kind Regards

So, do they both have access to a shared disk? The volume group is the smallest disk entity that can be defined as shared between them, so you can't usually have one logical volume/filesystem accessed on NodeA and a different one in the same VG accessed on NodeB.
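One way to check what is actually shared is to compare PVIDs on both nodes (the VG name is taken from your earlier post):

    lspv                  # run on both nodes; shared disks show the same PVID
    lsvg -p VG_Clust      # the disks that make up the shared VG, on the node that knows it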

If you force the issue, you can have both servers accessing the shared disk at the same time, but as you can imagine, there will be conflicts because there is no locking between them. Imagine that NodeA reads a directory. NodeB then updates it. NodeA is not aware (because it will have cached it) and may make a different change that NodeB is then not aware of. There will very quickly be conflict over the free block list, file names, timestamps etc. It is possible that replaced files will be seen separately and then have random parts overwritten as time progresses. You will end up with a filesystem that is corrupted badly and will require 'fixing', but it is pot luck what gets salvaged and what is lost/damaged.

Can you describe what resources you have? Do you have a shared IP address that clients connect to and you can move to the 'Active' node?

If you want an Active-Active style cluster for load balancing, you may be looking at Oracle RAC (with the associated costs), or you might achieve something similar with more servers. The servers running the application that needs the data would NFS-mount it from an HA cluster set up to serve the disk, and that cluster handles failing over the volume group and the IP address that the application servers connect to. The NFS mount on the application servers will wait if the NFS server (which appears to them as a single host) goes away and should recover when it (probably the other node) makes the export available again.
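On the application servers that would then be an ordinary NFS hard mount against the cluster's service address, something like this (all names are invented):

    mount -o hard,intr,bg nfsservice:/export/data /data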

Of course, there is then the performance cost of NFS, if that is an issue for you.

A better description of your configuration and application needs might get you a more useful response.

Kind regards,
Robin

What do you mean by that? The whole point of a cluster is high availability: if one of the nodes breaks, the application still runs. If you knew in advance when your node would break, you wouldn't need a cluster at all (although I don't believe such astute foretelling skills exist).

I don't understand this. "Nodes" are the systems taking part in the cluster. They cannot be "unmanaged"; they can only have their cluster services started ("joined the cluster") or not.

"Unmanaged" is a state only a resource group can be in.

I thought I described that in some detail: you have a cluster for the situations where something has gone (quite drastically) wrong. To make it possible for filesystems, volumes, etc. to be taken over safely and started on the other node, the nodes share the information about how these FSes, LVs, etc. are built and in exactly which state they are right now. If you make changes to an LV (like increasing its size) and use normal LVM commands, this information will not be propagated to the other nodes, because these commands are not cluster-aware. If you use the respective CSPOC commands, which indeed are cluster-aware, they will do the same as the normal LVM commands but also use the cluster's communication services (RSCT) to propagate the changed information to the other nodes immediately.

Again, you can get away with using "learning imports" on the other nodes to make the information consistent again, but why not just use the cluster commands, which do that automatically?

I hope this helps.

bakunin

Hi All,

My cluster state is STABLE, but the Resource Group is UNMANAGED on both nodes.
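For reference, this is how I checked it (in case I misread the output):

    lssrc -ls clstrmgrES | grep -i state    # cluster manager state (ST_STABLE here)
    clRGinfo                                # resource group shown as UNMANAGED on both nodes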
Sorry for the typing and terminology errors.
And thanks a million for your kind reminders and explanations.

Regards
Philippe

This means that the cluster will not do anything even if the node where your application runs crashes. "UNMANAGED" means the RG is not running as part of the cluster at all.

To correct this, stop the application, unmount its filesystems and run varyoffvg on its VGs. Then use either SMIT or the clmgr command to start the resource group on one of your nodes. This will mount the FSes again and start the application, so from a user's POV nothing has changed. From the cluster's POV, though, the RG is now under the observation of the cluster services, and if the node fails somehow (crashes, loses network connectivity, whatever else) the RG will be taken over by the other node.
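As a rough sketch of that sequence (the RG, VG, mount point and script names are all examples, substitute your own):

    /opt/myapp/bin/stopdb                          # 1. stop the application yourself
    umount /data                                   # 2. unmount its file systems
    varyoffvg VG_Clust                             #    and deactivate the VG
    clmgr online resource_group RG1 NODE=Node1     # 3. start the RG under cluster control
    clRGinfo                                       #    should now show ONLINE on Node1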

I hope this helps.

bakunin