Exportvg/importvg causes corrupt LV Control Block

Hi experts,

Power7 p720
AIX 6.1

This is what happened:

$ sudo importvg -y v7000_1vg hdisk6
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
v7000_1vg

Volume Group shows 'active'

$ lsvg v7000_1vg
VOLUME GROUP:       v7000_1vg                VG IDENTIFIER:  00f7974500004c0000000137ecf731f1
VG STATE:           active                  PP SIZE:        1024 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      3904 (3997696 megabytes)
MAX LVs:            256                      FREE PPs:       2487 (2546688 megabytes)
LVs:                21                       USED PPs:       1417 (1451008 megabytes)
OPEN LVs:           0                        QUORUM:         2 (Enabled)
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512                                     
MAX PPs per PV:     4064                     MAX PVs:        8
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      non-relocatable
PV RESTRICTION:     none                     INFINITE RETRY: no

Below, the fields 'TYPE' and 'MOUNT POINT' are missing

$ lsvg -l v7000_1vg
v7000_1vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
loglv02                        1       1       1    closed/syncd  N/A
fslv18                         11      11      1    closed/syncd  N/A
fslv19                         200     200     1    closed/syncd  N/A
fslv20                         100     100     1    closed/syncd  N/A
fslv21                         100     100     1    closed/syncd  N/A
fslv22                         130     130     1    closed/syncd  N/A
fslv23                         100     100     1    closed/syncd  N/A
fslv24                         100     100     1    closed/syncd  N/A
fslv25                         100     100     1    closed/syncd  N/A
fslv26                         100     100     1    closed/syncd  N/A
fslv27                         100     100     1    closed/syncd  N/A
fslv28                         100     100     1    closed/syncd  N/A
fslv29                         100     100     1    closed/syncd  N/A
fslv30                         30      30      1    closed/syncd  N/A
fslv31                         30      30      1    closed/syncd  N/A
fslv32                         30      30      1    closed/syncd  N/A
fslv33                         30      30      1    closed/syncd  N/A
fslv34                         30      30      1    closed/syncd  N/A
fslv35                         10      10      1    closed/syncd  N/A
fslv36                         10      10      1    closed/syncd  N/A
fslv37                         5       5       1    closed/syncd  N/A

I tried to mount manually

$ sudo mount /dev/fslv19 /notes/d24aml06/mail1
mount: 0506-322 Cannot determine log device to use for /dev/fslv19 (/notes/d24aml06/mail1).

Log LV is missing data

$ sudo getlvcb -AT loglv02
         
         intrapolicy =  
         copies = 1 
         interpolicy =  
         lvid =  
         lvname =  
         label =  
         machine id =  
         number lps = 1 
         relocatable =  
         strict =  
         stripe width = 0 
         stripe size in exponent = 0 
         type =  
         upperbound = 0 
         fs =  
         time created  =         time modified = 

Below is the evidence that the Logical Volume Control Block is corrupted

$ sudo getlvcb -AT fslv19 
         B
         intrapolicy =  
         copies = 1 
         interpolicy =  
         lvid =  
         lvname =  
         label =  
         machine id =  
         number lps = 200 
         relocatable =  
         strict =  
         stripe width = 0 
         stripe size in exponent = 0 
         type =  
         upperbound = -21846 
         fs =  
         time created  =    time modified =  
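
To run the same check across all 21 LVs, the getlvcb output can be filtered for fields that came back empty. This is only a minimal sketch: `lvcb_blank_fields` is a hypothetical helper name, and it assumes the `name = value` layout shown above. Lines with more than one `=` (the time stamps) are skipped.

```shell
# lvcb_blank_fields: read "getlvcb -AT <lv>" output on stdin and print the
# name of every "name = value" field whose value is empty.
# Lines without exactly one "=" (such as the time stamp line) are ignored.
lvcb_blank_fields() {
    awk -F'=' 'NF == 2 {
        name = $1; val = $2
        gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the field name
        gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim the value
        if (val == "") print name
    }'
}
```

Possible usage, assuming the LV names from the lsvg -l listing: `sudo getlvcb -AT fslv19 | lvcb_blank_fields`. An LV whose lvid, lvname and type all show up here has an LVCB that was never written or has been overwritten.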

Unable to rebuild the LVCB

$ sudo synclvodm v7000_1vg
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.
0516-622 synclvodm: Warning, cannot write lv control block data.

In a nutshell, after importing the VG I cannot use the data within it.

Next step?

Can you post the error report?

errpt 

Is this an internal disk, or is it coming through the SAN?

This is SAN

$ lsdev -Cc disk | grep hdisk6
hdisk6 Available 03-00-02 IBM MPIO DS5020 Array Disk
$ lspath -l hdisk6
Enabled hdisk6 fscsi0
Enabled hdisk6 fscsi2
$ errpt
B6267342   0828123013 P H hdisk6         DISK OPERATION ERROR
C1348779   0828123013 I O SYSJ2          LOG I/O ERROR
E86653C3   0828123013 P H LVDD           I/O ERROR DETECTED BY LVM
B6267342   0828123013 P H hdisk6         DISK OPERATION ERROR
C1348779   0828123013 I O SYSJ2          LOG I/O ERROR
E86653C3   0828123013 P H LVDD           I/O ERROR DETECTED BY LVM
B6267342   0828123013 P H hdisk6         DISK OPERATION ERROR
C1348779   0828123013 I O SYSJ2          LOG I/O ERROR
E86653C3   0828123013 P H LVDD           I/O ERROR DETECTED BY LVM


Tried to update the LVCB manually...

$ sudo chlv -t jfs2log loglv02
0516-711 chlv: Warning, unable to update logical volume
        control block.

No can do.

A wild guess is that some locks remained when you exported the VG. Is the VG in ECM (enhanced concurrent mode)?

If so, try a "varyonvg -b -u", then "exportvg", then reimport again. This might help.

I hope this helps.

bakunin

Nope, it's not in concurrent mode.

The next idea is to check the VGDA; it might simply be corrupted.

If this is the case, you might have to repair the VGDA data, which is neither easy nor free of danger.

There is an (undocumented! so use at your own risk) command "readvgda". If used on a good copy it will show you the complete VG layout, including the LVs, their PP mappings, etc. I suggest you also read the LVM redbook to understand the concepts behind the following actions. CAUTION: these actions have the potential to harm your data. Carefully examine all the steps, save everything you do, and only proceed when you feel comfortable!

You need a "/etc/filesystems" too, because you do not get the mount points from the VGDA.

Now construct a map file for each LV from the VGDA info. Then export the VG, recreate it on the original PVs with the original PP size, and recreate each LV in its original place using the map file you constructed:

mkvg -f -y <VGname> -s <PPsize> <hdisk_dev_1> <hdisk_dev_2> ...
mklv -y <LVname> -m /path/to/mapfile -t jfslog <VGname> <numlp>
mklv ...

(I was supposing jfs here, change the commands accordingly if you use jfs2.)
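
A map file for "mklv -m" is a plain list of PVname:PPnum1[-PPnum2] entries, used in the order given. The following is a minimal sketch of turning a list of PP numbers into such a file. The helper name `pps_to_map` is hypothetical, and it assumes you have already extracted the PP numbers for one LV (one per line, in logical-partition order) by hand from the readvgda output, whose format is undocumented and not parsed here.

```shell
# pps_to_map PVNAME: read physical partition numbers on stdin (one per line,
# in logical-partition order) and print mklv map-file lines, collapsing
# consecutive ascending numbers into PVNAME:first-last ranges.
pps_to_map() {
    awk -v pv="$1" '
        function emit() {
            if (start == "") return            # nothing collected yet
            if (start == prev) print pv ":" start
            else print pv ":" start "-" prev
        }
        NF == 0 { next }                       # skip blank lines
        {
            pp = $1 + 0
            if (start != "" && pp == prev + 1) { prev = pp; next }
            emit()                             # close the previous run
            start = pp; prev = pp
        }
        END { emit() }
    '
}
```

For example, `pps_to_map hdisk6 < fslv19.pps > fslv19.map` (both file names are placeholders) would produce the file to pass to "mklv -m". Check the result against the VGDA dump by eye before using it, since a wrong mapping recreates the LV in the wrong place.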

Now restore the "/etc/filesystems" but DO NOT MOUNT yet!

For each filesystem update the LVCB with the log and label information:

chfs -a log=<logdevice> <mountpoint>
chlv -L <mountpoint> <LVname>
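
With 20-odd filesystems it may be safer to generate these commands from a small table and review them before running anything. The sketch below only prints the commands. The helper name `gen_lvcb_cmds` and the three-column table format (LV name, mount point, log device) are assumptions for illustration, and you have to build the table yourself from whatever record of the old mount points you still have.

```shell
# gen_lvcb_cmds: read "LVname mountpoint logdevice" lines on stdin and print
# the matching chfs/chlv commands for review. Nothing is executed.
# Blank lines and lines starting with "#" are skipped.
gen_lvcb_cmds() {
    while read -r lv mnt log; do
        case "$lv" in
            ''|'#'*) continue ;;
        esac
        printf 'chfs -a log=%s %s\n' "$log" "$mnt"
        printf 'chlv -L %s %s\n' "$mnt" "$lv"
    done
}
```

A possible use is `gen_lvcb_cmds < lvtable.txt > fix_lvcb.sh` (placeholder file names), followed by reading fix_lvcb.sh line by line before running it as root.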

NOW run "fsck" over each FS - WITHOUT the "-y"! If "fsck" complains that it cannot read the superblock, the respective LV was not recreated correctly and you have to start over. If the jfslog LV was not recreated correctly, "fsck" will not be able to replay the log and you will probably lose some data, so double- and triple-check your mappings before you answer "yes" to any question from "fsck" about whether it should correct something.

Note, that if a filesystem was online at the time of the crash fsck might find some (unrecoverable) errors because of corrupted inode-lists. The only thing you can do is to fix this, but it will probably make you lose data. (Serves you right if you have no working backup.)

Re-run "fsck" until there are no more errors to correct, then mount the FSes. If any errors were fixed you should check "lost+found" for the remains of orphaned files/blocks. You might be able to regain some of the lost data from there.

I hope this helps.

bakunin


I don't have the original /etc/filesystems

I am contemplating recreating the VG (losing all data)

Before you really scratch the VG you might first find out what happened to hdisk6. Run "errpt" again with the "-a" switch to see the details and analyse what it says. You could also post the output here and get others' input on what it means.
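
Before wading through the detailed report it can help to see which identifiers and resources dominate the short listing. This is a minimal sketch, assuming the usual short-form column layout (identifier, timestamp, type, class, resource name, description) as in the output posted above; `errpt_summary` is a hypothetical helper name.

```shell
# errpt_summary: read short-form errpt output on stdin and print
# "count identifier resource" for each distinct identifier/resource pair,
# sorted by identifier. The header line, if present, is skipped.
errpt_summary() {
    awk '$1 != "IDENTIFIER" && NF >= 5 { n[$1 " " $5]++ }
         END { for (k in n) print n[k], k }' | sort -k2
}
```

Usage would be `errpt | errpt_summary`. From there, "errpt -a -j <identifier>" or "errpt -a -N hdisk6" narrows the detailed report to one error identifier or one resource.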

You can still scratch the VG then, but: there is some reason why you encountered I/O errors. Disk broken, SAN damaged - I do not know what it is. But I do know (from long and often tragic experience) that a problem without a symptom is still a problem, and it tends to creep up and haunt you at exactly the moment you can least afford it. So solve the problem with your disk first before you do anything with it, including (but not limited to) recreating the VG.

I hope this helps.

bakunin


yep, the LUN was corrupted, the SAN folks fixed that problem. Gonna recreate the crap outta that VG. thanks for all the help.

exportvg should not be changing the VGDA in any way; in fact, it can be done while the disks of the volume group are inactive.

I think something had happened to your LVCB beforehand.

Consider the following command sequence. This disk is active, but the commands should give (nearly) the same output on an inactive one (the VGDA may record whether the VG is active or not).

The tail command is there to show that the -dt option lists how all PPs are mapped.

p.s. My prompt is not #, but I have performed an su to root so my euid is 0

michael@x054:[/home/michael]lspv
hdisk1          00cbe32e3ea9f4d2                    vgData          active      
hdisk0          005d858f5e3e41d2                    rootvg          active      
hdisk2          005d858fdba3ba0e                    vgBackup                    
hdisk3          005d858f9b15ace8                    vgITDS          active      
michael@x054:[/home/michael]lquerypv -p 005d858f5e3e41d2 -N hdisk0 -t | more
PP Size:        26
PV State:       1
Total PPs:      542
Alloc PPs:      252
Total VGDAs:    2
HOT SPARE:      0
Beg PSN:        4352
MIRROR POOL:    0
MIRROR POOL     
ASYNC LOW WA    0
ASYNC HIGH W    0
ASYNC MINOR     0
ASYNC FLAGS:    0
michael@x054:[/home/michael]lquerypv -p 005d858f5e3e41d2 -N hdisk0 -dt | tail
PVMAP:  005d858f5e3e41d2:533  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:534  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:535  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:536  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:537  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:538  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:539  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:540  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:541  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
PVMAP:  005d858f5e3e41d2:542  0 ODMtype  0000000000000000.0   0     0000000000000000:0    0000000000000000:0   
michael@x054:[/home/michael]

Question: do you get the same issue when you su to root? (Is there perhaps an authority that is not passed on when using sudo?)


Should have read the bottom line. So I guess you can ignore my reply.