How to manually re-attach AIX LVs to a mirror?

While trying to rectify a stale LV problem, I ran rmlvcopy <lv> 1 <primary disk>, leaving the original OS disk without any LV copies other than the stale LV.
Both disks seem operational, but lsvg rootvg shows 1 stale PV.

The end goal is to re-attach the LVs to hdisk1, and then attempt a reboot off hdisk1 to sync things up again.

# lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       1       1    closed/syncd  N/A
hd6                 paging     4       4       1    open/syncd    N/A
hd8                 jfs2log    1       1       1    open/syncd    N/A
hd4                 jfs2       60      60      1    open/syncd    /
hd2                 jfs2       40      80      2    open/stale    /usr   <== not sure why the lv is in stale mode!
hd9var              jfs2       16      16      1    open/syncd    /var
hd3                 jfs2       20      20      1    open/syncd    /tmp
hd1                 jfs2       40      40      1    open/syncd    /home
hd10opt             jfs2       40      40      1    open/syncd    /opt
hd11admin           jfs2       1       1       1    open/syncd    /admin
livedump            jfs2       1       1       1    open/syncd    /var/adm/ras/livedump
lvol1               jfs2       60      60      1    open/syncd    /usr/sys/inst.images
# lslv -m hd2        <== stale LV
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0222 hdisk1            0509 hdisk0            
0002  0229 hdisk1            0510 hdisk0            
0003  0230 hdisk1            0511 hdisk0            
0004  0231 hdisk1            0512 hdisk0            
0005  0232 hdisk1            0513 hdisk0            

# lslv -m hd1
hd1:/home
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0585 hdisk0            
0002  0586 hdisk0            
0003  0587 hdisk0            
0004  0588 hdisk0            
0005  0589 hdisk0            

Let us first establish what "stale" means here. Bear with me if this is old news for you: when you have a mirrored LV (basically there are only mirrored LVs; a mirrored VG just means that all its LVs are mirrored), each LP (logical partition) is represented by two different PPs (physical partitions). An LV is considered stale if any of its LPs is not represented by two (or three, depending on the number of mirrors) PPs.

If the mirroring is being recreated (that happens in the background), all the LVs that are not completely mirrored yet are marked "stale" too. Check for a process named syncvg in the process list. If it is there, you just need to wait. You can also check the output of lsvg rootvg to see if the number of stale LPs decreases.

Furthermore, your OS disk does not only contain VG information but is also instrumented to be booted from. Whenever you alter (the disks of) your rootvg, you need to re-establish the boot code with the bosboot command - this puts the boot code onto the disk and thus makes it bootable. You may also need to alter the boot list by (re-)creating it with the bootlist command. I just wanted to say this up front because it is easily forgotten once in a while.
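The pair of commands looks like this (a sketch; hdisk0 and hdisk1 are example device names for the two rootvg disks - adjust to your system):

```shell
# Re-create the boot image on both rootvg disks
bosboot -ad /dev/hdisk0
bosboot -ad /dev/hdisk1

# Set the normal-mode boot order: try hdisk0 first, fall back to hdisk1
bootlist -m normal hdisk0 hdisk1

# Display the resulting boot list to verify
bootlist -m normal -o
```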

Back to remedying your situation: the first thing you should do is make absolutely sure you have a valid, working and installable backup, preferably in the form of an mksysb image, most preferably on your NIM server. However far from ideal your current situation is: take the time to create such an image before you try anything else. Whenever you do non-trivial tasks on your rootvg you run a non-zero chance of ending up with a non-working system; with an image you can at least get back to where you were. If you know your trade you can run mkszfile before running mksysb and then edit the resulting file to create a non-mirrored backup image. Normally the image is restored the same way the system was installed when the image was taken, with all the mirrors, etc. in place. It may be preferable to have the image taken in an unmirrored fashion so that it restores without a mirror onto one single disk, and only then do a mirrorvg manually. Again: don't forget bosboot and bootlist afterwards.
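A sketch of that mkszfile/mksysb sequence (the backup target path is an example; the exact image.data edits depend on your LV layout):

```shell
# Generate /image.data describing the current rootvg layout
mkszfile

# Edit /image.data by hand: in each lv_data stanza set COPIES= to 1
# (and adjust the PP count accordingly) so the restore creates
# unmirrored LVs on a single disk
vi /image.data

# Create the mksysb image WITHOUT the -i flag, so mksysb uses the
# edited /image.data instead of regenerating it
mksysb /backup/rootvg.mksysb
```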

Which brings us to your disk: you have probably isolated the culprit to the one LV you still have on it after you removed all the other ones. Do an unmirrorvg to empty the disk completely and a reducevg to get it out of the VG. Your VG should now be in an unmirrored but otherwise healthy state. If you want, you could now test the disk extensively and eventually reuse it, but I wouldn't. What a single disk costs is simply not worth the effort it takes to reinstall a system that crashed because of a failing disk, not to mention the cost of the downtime of the service the system provides. Get a new disk, put it in, do an extendvg and finally a mirrorvg. After you issue the mirrorvg command it takes some time until the mirrors are resynchronised; until then the LVs are still shown as "stale".
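Put together, the whole disk-replacement sequence would look roughly like this (hdisk1 as the example failing disk):

```shell
# 1. Remove all LV copies from the failing disk
unmirrorvg rootvg hdisk1

# 2. Remove the now-empty disk from the volume group
reducevg rootvg hdisk1

# 3. Physically replace the disk, run cfgmgr, then add the new disk
extendvg rootvg hdisk1

# 4. Re-create the mirror
mirrorvg rootvg hdisk1

# 5. Don't forget the boot code and the boot order
bosboot -ad /dev/hdisk1
bootlist -m normal hdisk0 hdisk1
```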

To speed things up (and if you have enough RAM, because this takes some of it) you can do what I usually do:

mirrorvg -s rootvg hdiskN     # mirrors but does NOT start to synchronise
syncvg -P 32 -v rootvg &      # synchronises in the background, 32 LPs in parallel

Notice that 32 is the maximum. Use a smaller number if you do not have enough RAM; the amount needed is the PP size times the number of parallel LPs. You can also set a certain number of parallel tasks in advance by putting the following line into /etc/environment:

NUM_PARALLEL_LPS=NN

This will also affect HACMP/PowerHA commands, unlike the same setting in root's profile, which those commands ignore. Also notice that activatevg and varyonvg will (re-)start the synchronisation process too if the VG has stale partitions.

I hope this helps.

bakunin


Thank you.
I may not be in as bad a shape as I think I am. lslv -l hd2 shows hdisk1 with 0% in the IN BAND column, which from the man pages sounds like the OS is not writing to that copy of the LV anymore.

# lslv -l hd2
hd2:/usr
PV                COPIES        IN BAND       DISTRIBUTION  
hdisk1            040:000:000   0%            001:039:000:000:000 
hdisk0            040:000:000   100%          000:000:040:000:000 
synclvodm rootvg

returns with no errors.
It almost seems like I could just pull hdisk1 out and be OK at this point. It's a gut-wrenching decision (I probably won't do it, though). I have re-run bosboot -ad /dev/hdisk0 and made sure my boot list puts hdisk0 first. If there were OS-level trouble - filesystem access errors, failing OS commands, problems accessing hdisk0 - I would expect my OS to be choking and dying by now; however, it is still running fine (running DB2 and Informix development DBs).

Sorry, but: no. "In band" means something completely different and has nothing to do with your problem. When you create LVs they are placed snugly one after the other on the disk, leaving no gaps. Like this, where a, b, c, ... mean the PPs of various LVs and X means free PPs:

aaaaabbccccccXXXXXXXX.....

Now, when you extend or shrink LVs, over time you end up in a situation where this strict succession is broken up, like this:

aaaccbbccccccaacbaXXXX.....

The initial situation is what is meant by "in band 100%": all the LVs are physically placed in one piece and the PPs are in the order of ascending LPs. Once your disk becomes more and more disorganised, you can rectify this with the reorgvg command, which moves the PPs around until they are in order again. In your case the "in band 100%" comes from all PPs assigned to hd2 being placed on the "center" part of hdisk0, whereas on hdisk1 39 of the 40 are placed on "outer middle" and one on "outer edge"; therefore the "in band" indicator shows 0% there. But again, this has nothing to do with your problem.

This just means that the information in the ODM about the composition of the rootvg is accurate. This is a good thing but still does not help your problem.

DON'T!!

As I said before, the information about the VG is stored in the ODM, and if you simply remove the disk (without using the reducevg procedure I explained above) you end up with this information being NOT accurate any more. Prepare to manually repair the ODM in a rather tedious fashion afterwards if you do that. (Don't think you could put in another disk to make up for it: disks are identified by a unique "PVID" once they become part of a VG, so the system knows that this disk is not that disk.) Before you pull out the disk, remove it cleanly from the ODM, and this is done by using the commands I explained above.

I hope this helps.

bakunin


This morning (or maybe a night's rest) revealed the issue via lspv. lspv hdisk1 this morning also shows the PV state "missing", although lspv shows all the disks online. None of the AIX LVM commands work on the disk (reducevg complains about the open hd2 LV, which is /usr, even if I use -f to force it). syncvg is not running in the background.
This is AIX 6.1 TL7 SP10, 1415 build date. I have had to run odmgets and odmdeletes before on other boxes; a little tedious cleanup isn't all that bad. Unfortunately this box is in a remote DC, so I have to rely on another pair of hands to pull the disk. :eek:

PHYSICAL VOLUME:    hdisk1                   VOLUME GROUP:     rootvg
PV IDENTIFIER:      00f649e07720beb9 VG IDENTIFIER     00f7382000004c000000015706543652
PV STATE:           missing                                    
STALE PARTITIONS:   38                       ALLOCATABLE:      yes
PP SIZE:            256 megabyte(s)          LOGICAL VOLUMES:  1
TOTAL PPs:          1117 (285952 megabytes)  VG DESCRIPTORS:   2
FREE PPs:           1077 (275712 megabytes)  HOT SPARE:        no
USED PPs:           40 (10240 megabytes)     MAX REQUEST:      1 megabyte
FREE DISTRIBUTION:  223..184..223..223..224                    
USED DISTRIBUTION:  01..39..00..00..00                         
MIRROR POOL:        None                                       

And errpt shows (finally):
Description
PV NO LONGER RELOCATING NEW BAD BLOCKS

Probable Causes
NON-MEDIA ERROR DURING SW RELOCATION

Failure Causes
DISK DRIVE
DISK DRIVE ELECTRONICS
STORAGE DEVICE CABLE

That was to be expected. I repeat:

You cannot use a reducevg on a disk which has not been emptied before. Since you have still a LV occupying space on the PV (even if it is only a mirror) you cannot remove the disk from the VG. You either have to remove the mirror on this disk first or move it to another PV.
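Concretely, removing the mirror copy first would look like this (a sketch, assuming hd2's stale copy sits on hdisk1 as in your lslv output):

```shell
# Drop hd2's second copy (the one on hdisk1), leaving one copy on hdisk0
rmlvcopy hd2 1 hdisk1

# With the disk now empty, removing it from the VG should succeed
reducevg rootvg hdisk1
```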

If this is not due to a broken cable or controller (that would explain the "missing" status) the error message suggests that the disk was in its last throes anyway: when a disk is formatted (when included in a VG) a certain number of blocks is set aside to compensate for blocks going bad. They are used up over time. Once they are depleted (or nearly depleted) you usually see a series of TEMP hdisk errors (IIRC "hdisk error 3", usually stretched out over some days or weeks) before finally a PERM (IIRC "hdisk error 4") one in the errpt .

I hope this helps.

bakunin

So: migratepv -l hd2 hdisk1 hdisk6 (yes, I found a spare unused disk allocated, but I can delete the LV and VG on it :)). My only concern: since it cannot read the bad block to finish the mirror, is migratepv smart enough to move the stuck LV? I guess if migratepv can't, it will just error out.

Yes, either that or migratelp, or - as I said before - simply remove the LV's copy to empty the disk, then remove the disk, add a new disk to the VG and then remirror.

Actually the migratelp process can read the LP in question - just not its bad copy. And again, you do not need to keep this copy at all! Just remove it, then recreate it on the new disk.
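The remove-and-recreate approach would look roughly like this (hdisk1 as the failing disk, hdisk6 as the example spare):

```shell
# Remove hd2's bad copy from the failing disk
rmlvcopy hd2 1 hdisk1

# Re-create a second copy of hd2 on the spare disk
mklvcopy hd2 2 hdisk6

# Synchronise just that LV (or syncvg -v rootvg for the whole VG)
syncvg -l hd2
```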

Furthermore, lspv does not show whether disks are "online"; it just shows the relation of disks and VGs. Run lsdev -Cc disk to see all disk devices and their status. "Available" is good; "Defined" means missing - the system has a device definition that once described an existing device, but the device is not there at the moment. If this does not reveal anything, do a rmdev on all hdisk devices, then run cfgmgr to re-add the disk devices. This is, as bad as it may sound, actually non-disruptive.
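That check-and-rescan cycle, sketched out (hdisk1 as the example device):

```shell
# Show all disk devices and their state ("Available" vs "Defined")
lsdev -Cc disk

# Put a stuck disk's definition back to "Defined" (does not touch the ODM
# customised data; use -dl instead to delete the definition entirely)
rmdev -l hdisk1

# Walk the buses and re-create device entries for everything found
cfgmgr
```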

I think that, as the disk is already in the status "missing" in the VG display, you may have to rely on the procedure for getting a missing disk out of the system. Again, this involves some odmget and odmdelete gymnastics which are non-trivial and possibly disruptive too. Do that only when the system is down AND after having made absolutely sure you have a working backup!
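For orientation, the inspection side of those gymnastics is harmless (a sketch; which stanzas actually need deleting depends on your system, so the destructive commands are left commented out):

```shell
# Inspect what the ODM believes about the disk before deleting anything
odmget -q "name=hdisk1" CuDv    # the device definition itself
odmget -q "name=hdisk1" CuAt    # its customised attributes (PVID etc.)

# Only after verifying the stanzas (and with a working backup!) would
# you remove them, e.g.:
# odmdelete -o CuAt -q "name=hdisk1"
# odmdelete -o CuDv -q "name=hdisk1"
```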

I hope this helps.

bakunin

PS: you may also want to read this old thread for further information.

Well, thankfully the system came up after a remote admin rebooted the box (yes, I admit it was my fault for not informing the IT group). IBM support is checking to make sure hd2 is truly mirrored.
The LVM has somehow corrected itself, or finally bypassed the bad spot, for hd2 now shows syncd. Thank you so much for your timely help and responses on this. If no one else ever appreciates your help, you have one person who does. :b:

 rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       1       1    closed/syncd  N/A
hd6                 paging     4       4       1    open/syncd    N/A
hd8                 jfs2log    1       1       1    open/syncd    N/A
hd4                 jfs2       60      60      1    open/syncd    /
0516-1147 : Warning - logical volume hd2 may be partially mirrored.
hd2                 jfs2       40      40      2    open/syncd    /usr
hd9var              jfs2       16      16      1    open/syncd    /var
hd3                 jfs2       20      20      1    open/syncd    /tmp
hd1                 jfs2       40      40      1    open/syncd    /home
hd10opt             jfs2       40      40      1    open/syncd    /opt
hd11admin           jfs2       1       1       1    open/syncd    /admin
livedump            jfs2       1       1       1    open/syncd    /var/adm/ras/livedump
lvol1               jfs2       60      60      1    open/syncd    /usr/sys/inst.images

Just for completeness: did you manage to get hd2 properly mirrored? If so, can you post your commands for anyone finding this thread with a similar issue?

Thanks, in advance,
Robin

Initially the mirror was successfully initiated and all the other volumes became mirrored and syncd. It attached hd2 and was in the process of syncing; then, during the sync, it hit a bad sector on the original disk (sector sparing failed on the hard disk). IBM is saying the VGDA is messed up on the original disk for whatever reason. They are rebuilding the hdisk1 VGDA and hopefully this will get hd2 in sync. One thing that surprises me is that hd2 now reports open/syncd, whereas it was open/stale before IBM rebuilt the VGDA. Can the LVM lie to me? IBM took a VGDA map of hdisk1 with dd.

There is a little-known tool called readvgda which does what its name suggests - it reads the VGDA and prints its content to the screen. I use it regularly when working on (to me) unknown systems when I need to find out the VG type (classic, big or scalable), because otherwise there is no direct way to get that information. You can use it for other purposes too, though. The format is:

readvgda hdiskN [ | more]

where hdiskN is any hdisk device in the VG. Here is a longer article about it, but knowing what to search for you will find many others too.

The VGDA is stored in (at least) two different locations on the disk, and if one is broken you may still read the other and use that. You can easily do that with dd, so you won't need IBM to do it for you. Of course this is a potentially risky operation, so my suggestion is to test it first on a throwaway system/disk to get acquainted.

I hope this helps.

bakunin