HACMP resource group State not STABLE

Hi,

Not sure if this is the correct forum to post this on but maybe a mod could move it if not.

When trying to move a HACMP resource group between LPARs on AIX, I receive the following.

State not STABLE/RP_RUNNING or ibcasts Join for node 2 rejected,
Clearing in join protocol flag
Attempting to recover resource group from error
"Resource group not found in client configuration"
echo '+BrokerMB02rg:clvaryonvg prmb02vg[808]' LC_ALL=C
0516-052 varyonvg: Volume group cannot be varied on without a quorum.
More physical volumes in the group must be active. Run diagnostics on inactive PVs.

When these errors are received the resource group then starts back up on the original node.

I have checked the LVs, quorum is disabled, and there are no stale PVs.
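
In case it helps, this is roughly how I checked (using the VG from the error above):

lsvg prmb02vg | grep -i quorum     (shows QUORUM: 1 (Disabled))
lsvg -p prmb02vg                   (all PVs listed as active, none missing)
lsvg -l prmb02vg                   (LV states open/syncd, no stale partitions)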

Any help or pointers in the right direction would be much appreciated.

Thanks,

Matt

Hi,

I would try to export and reimport the volume group in question on the inactive node. Make sure you keep the correct VG major number (you can import using the -V flag).
If it finds all PVs, you should just sync the cluster config and then try again. If it has problems during the import, you can take it from there. I had similar issues after migrating storage across disks - duplicate PVIDs on some disks - the cluster did not like that much.
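
A quick way to spot duplicate PVIDs is something like

lspv | awk '{print $2}' | sort | uniq -d

If that prints anything, two hdisks claim the same PVID and the import may grab the wrong disk.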

Hope that helps,
regards
zxmaus

Thanks for the response. So I would carry out the following - you will have to bear with me as I've not been using HACMP for long:

  1. Disable monitoring on the resource group, then shut down the apps using the filesystems.
  2. exportfs -u /dirname
  3. exportfs /dirname
    When you say sync the cluster config, is this something that can be done via a command, or will that happen if the re-import is successful?

Thanks

Matt

Hello
On the inactive node nothing should be mounted from that volume group - so all you need to do is:

  1. lspv - copy the output somewhere so you know which disks belong to the volume group
  2. exportvg yourvolumegroupname
  3. importvg -V <VG major number> -Ry yourvolumegroupname <hdisk belonging to this volume group> - use the PVIDs from your lspv output in step 1 to identify one of its disks. You can look up the major number on the active node with ls -ali /dev | grep yourvolumegroupname - the major and minor numbers are listed beside the name. This import should hopefully go through without any errors - if you get an error, post it here so we can follow up on it (there is a rough example below).
  4. Cluster synchronization is part of the HACMP menus in smitty - look under Extended Configuration ...
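
To give you a rough idea of steps 2 and 3 (the disk name, VG name and major number here are just examples - use your own):

ls -al /dev/yourvolumegroupname          (on the active node - the major number is the first of the two numbers shown, e.g. 90)

lspv                                     (on the inactive node - note the hdisks / PVIDs of the VG)
exportvg yourvolumegroupname
importvg -V 90 -Ry yourvolumegroupname hdisk4

For the synchronization the smitty fastpath is usually smitty hacmp -> Extended Configuration -> Extended Verification and Synchronization.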

Kind regards
zxmaus

Thanks for the help, I will have to give this a try at the weekend as it's production.

Check the LUNs' reserve policy (lsattr -El hdiskx).

It should be set to no_reserve; if not, set it:

chdev -l hdiskx -a reserve_policy=no_reserve
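
For example, to check all the disks of the shared VG in one go (the hdisk names are just placeholders for yours):

for d in hdisk4 hdisk5
do
  lsattr -El $d -a reserve_policy
done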

It's always a good idea to try out manually what the cluster is doing.
Normally there is no need to vary the VG online yourself, since it should be varied on in concurrent passive mode as soon as the cluster starts.

When the resource group moves, the VG will be set to concurrent active on one node and concurrent passive on the other node.

Is the VG concurrent capable? (lsvg vgname)

VOLUME GROUP: xxxvg                 VG IDENTIFIER:  xxxxxx
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      67478 (8637184 megabytes)
MAX LVs:            256                      FREE PPs:       1078 (137984 megabytes)
LVs:                31                       USED PPs:       66400 (8499200 megabytes)
OPEN LVs:           31                       QUORUM:         1 (Disabled)
TOTAL PVs:          34                       VG DESCRIPTORS: 34
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         34                       AUTO ON:        no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Concurrent                               
Node ID:            1                        Active Nodes:       2 
MAX PPs per VG:     131072                   MAX PVs:        1024                                                                                                                                                                            
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no                                                                                                                                                                              
HOT SPARE:          no                       BB POLICY:      relocatable                 

It should look like this - the VG state may be active or passive.
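
A quick way to compare just those fields on both nodes is something like

lsvg xxxvg | egrep -i "concurrent|mode"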

Hi,

I have checked the LUNs' reserve policy and they are set to no_reserve.

When checking the VGs there is no field for Concurrent or VG Mode, so maybe this is a different release of HACMP?

Thanks,

Matt

Could you please post the lsvg output of the VG on the running node?

(while it's varied on)

VG listing as requested.

lsvg xxx02vg
VOLUME GROUP:       xxx02vg      VG IDENTIFIER:  00c58aa200004c0000000124fdaf434a
VG STATE:           active            PP SIZE:        32 megabyte(s)
VG PERMISSION:      read/write        TOTAL PPs:      1278 (40896 megabytes)
MAX LVs:            256               FREE PPs:       444 (14208 megabytes)
LVs:                8                 USED PPs:       834 (26688 megabytes)
OPEN LVs:           8                 QUORUM:         1 (Disabled)
TOTAL PVs:          2                 VG DESCRIPTORS: 3
STALE PVs:          0                 STALE PPs:      0
ACTIVE PVs:         2                 AUTO ON:        no
MAX PPs per VG:     32512
MAX PPs per PV:     1016               MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)   AUTO SYNC:      no
HOT SPARE:          no                BB POLICY:      relocatable

This VG is not concurrent. Have you already set up a cluster, or is this your first one?

You COULD use non-concurrent VGs and vary them on and off during the takeover with your own scripts, but this is not recommended.
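
If you want to convert the existing VG to enhanced concurrent capable, it should be roughly this - but check the documentation for your AIX / HACMP level first, the bos.clvm.enh fileset has to be installed, and I would do it with the resource group offline:

chvg -C yourvgname

Afterwards export / import it on the other node again and synchronize the cluster as described above.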

Hi,

You might - if you are changing your VG anyway - want to consider a big or scalable VG as well. You are using rather small LUNs and most applications / DBs grow a lot over time. Going to scalable right at the beginning makes sure you don't get into trouble when you have to add further LUNs later (more than 32). Apart from that, it is the only way your OS will be able to keep the ownership of the special files after an export/import - which would be particularly important if you happen to use Sybase. If you choose scalable, you will never have to worry about running out of inodes either.
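
If I remember right the conversion itself is just chvg, along these lines (the VG name is an example, and check the prerequisites for your AIX level first):

chvg -B yourvolumegroupname      (convert to big VG format)
chvg -G yourvolumegroupname      (convert to scalable VG format - the VG has to be varied off for this one if I recall correctly)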

Regards
zxmaus

The cluster has already been set up and had been working fine up until I got the errors at the weekend while trying to move the resource group online to another node.