Volume group not activated at boot after SAN migration

I have an IBM blade running RHEL 5.4 Server, connected to two Hitachi SANs through common fibre cards & Brocade switches. It has two volume groups built from LUNs on the old SAN. The old SAN needs to be retired, so we allocated LUNs from the new SAN, discovered them as multipath disks (4 paths) and grew the volume groups. We then moved the logical volumes across and removed the old LUNs from the volume groups.
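For reference (and this is an assumption on my part, since I haven't spelled out the exact commands), a migration of that kind done with pvmove looks roughly like this, with made-up device and VG names:

# label the new multipath LUNs as PVs
pvcreate /dev/mapper/mpath10 /dev/mapper/mpath11

# grow the volume group with the new PVs
vgextend vg_data /dev/mapper/mpath10 /dev/mapper/mpath11

# move all extents off each old PV onto the new ones
pvmove /dev/mapper/mpath0
pvmove /dev/mapper/mpath1

# drop the old PVs from the VG and wipe their LVM labels
vgreduce vg_data /dev/mapper/mpath0 /dev/mapper/mpath1
pvremove /dev/mapper/mpath0 /dev/mapper/mpath1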

On the next boot, the volume group was not active when the filesystems tried to mount. The output from vgdisplay -v showed LV Status : Not available for every LV in the VG. :eek:

Panicking a little, I managed to find a tenuous suggestion to force the volume group on-line. This I did with vgchange -a y vgname and after that all the filesystems could be mounted.
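In other words, the manual recovery each time was just something like (vgname standing in for the real volume group name):

vgchange -a y vgname    # force the VG and all its LVs on-line
mount -a                # then mount everything listed in /etc/fstab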

PHEW!

The next boot did the same thing, so again I forced the VGs on-line & mounted everything. I then wrote to Red Hat. They suggested I alter /etc/fstab to add _netdev as a mount option for each filesystem. It worked, which is great, but I just don't get it :confused:
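For anyone following along, an entry with the option added looks roughly like this (device and mount point are made-up examples):

/dev/vg_data/lv_oradata   /oradata   ext3   defaults,_netdev   1 2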

Their explanation is:

The last line says it's doing the manual tasks I performed, so I understand what the option does, but I'm confused about a few things that RH didn't answer (issue solved, ticket closed!):-

  • Why does migrating between SANs have this effect (same fibre cards & switches)?
  • How does the option in fstab affect whether the VG is on-line or not? (see the sketch after this list)
  • What can one do to prepare for this? I don't really want to set _netdev as a standard.
  • What is the cost/penalty/limit of doing this?
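On the second question, as I understand it (my own summary, not Red Hat's exact wording) it comes down to ordering: early in boot, rc.sysinit activates whatever VGs it can see and mounts local filesystems before the multipath and network services have run, and _netdev tells that first mount pass to skip the entry and leave it for the netfs init script, which runs much later and re-scans and re-activates volume groups before mounting. Very roughly:

# 1. rc.sysinit, early in boot: activate whatever VGs are visible and
#    mount local filesystems, skipping anything marked _netdev
vgchange -a y
mount -a -O no_netdev

# 2. /etc/init.d/netfs, after the network/iscsi/multipath services are up:
#    rescan for VGs, activate them, then mount the entries skipped earlier
vgscan
vgchange -a y
mount -a -O _netdev

(The real scripts are more involved; treat this as a sketch of the order of events.)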

We don't auto-start the Oracle databases on this server, although we do on other servers. Might this workaround upset those or any other /etc/rc.d/..... scripts?
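One way to reassure yourself on that (my suggestion, not something from the Red Hat ticket) is to check where netfs sits in the start order relative to the scripts you care about; anything that starts after it will see the filesystems already mounted:

ls /etc/rc3.d/S*          # start order for runlevel 3; netfs is typically S25netfs
chkconfig --list netfs    # confirm it is enabled in the runlevels you use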

Just trying to understand :confused:

Robin

Is it SAN or NAS we are talking about? (I have no RH any more, so I can't check...) But since Linux LVM is the one HP gave to the community, I believe it should behave similarly. As I understand it, iSCSI uses a standard network interface, but a SAN uses HBAs, which are not the same and so don't need the network (the TCP stuff) to be up in order to work...
Addendum
Is there not an /etc/lvmrc file there? If so, what does it say?

Did the old SAN present the LUNs via iSCSI? I'd check for differences in the settings on the SANs.

We're sure this is not NAS. These are plain fibre cards with SCSI devices being detected. We probe down the fibre cards, which generates /dev/sdN devices. We then blend the four paths into multipath devices, which we then use as PVs to add to the volume groups.
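For completeness, that discovery usually boils down to something like this (host numbers are placeholders; repeat the rescan for each FC host):

echo "- - -" > /sys/class/scsi_host/host0/scan   # rescan an FC host for new LUNs
multipath -v2                                     # build/refresh the multipath maps
multipath -ll                                     # confirm each LUN shows its four paths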

Much of this is automatic first time, but we did a bit of the hokey-cokey on our first few attempts so after clearing old definitions away the auto-discovery stopped.

Everything seemed fine whilst we had the new LUNs in the VG and in use but had not yet removed the old ones. A boot would complete normally; it's only when we removed the old LUNs from the VGs that it went wrong. Perhaps the old LUNs (as part of the VGs) still held the VG metadata, so they would respond and the VG would come on-line even though there were LUNs still initialising.
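If you want to test that theory without another reboot, it may be worth checking how many LVM metadata areas each PV carries and watching a verbose scan (assuming your lvm2 build supports the pv_mda_count field):

pvs -o pv_name,vg_name,pv_mda_count   # metadata copies per PV
vgscan -vv                            # shows which devices get scanned and where the VG is found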

All very confusing though :confused::confused::confused::confused:

Robin

What was the naming policy for the new and old LUNs (generic mpathN names from multipath.conf, or did the administrator specify the names)?
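By administrator-specified names I mean aliases along these lines in /etc/multipath.conf (the WWID and alias here are made-up examples):

multipaths {
        multipath {
                wwid    3600xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                alias   oradata_new
        }
}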

What method was used during the migration (pvmove, LV mirroring, or something else)?

Can you post:

cat /etc/lvm/lvm.conf
multipath -ll
vgdisplay -v <volume group in question>
dmsetup info

Was /dev/mapper/<name> used for the LVM/VG operations?
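That is, were the PV/VG commands run against the multipath maps rather than one of the raw paths underneath them? For example (made-up names):

pvcreate /dev/mapper/mpath10            # good: operates on the multipath device
vgextend vg_data /dev/mapper/mpath10
pvcreate /dev/sdc                       # risky: operates on a single underlying path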