Interesting problem! Two VIOs, one is problematic: assigning disks and resources from the other only

Hi,

The scenario is like this:

1. We needed to assign two hdisks to an LPAR.
2. The SAN team gave us two LDEVs.
3. One of our VIOs hangs on the cfgmgr operation.
4. We ran cfgmgr on the healthy VIO, got the disks, and assigned them from there to the LPAR (bypassing the other VIO, i.e. we didn't run cfgmgr on it this time).
5. On the LPAR we ran cfgmgr, but this time the LPAR would also hang. :frowning:

My question is: is this the right way? One of our engineers felt that we need to have the disks on the second VIO as well before proceeding on the LPAR, because the LPAR sees resources from both VIOs, something like that. Is there a workaround?

The VIO that hangs on cfgmgr has errpt entries showing a failed disk operation on some old disk. Could that be the reason the VIO keeps hanging?

Regards :slight_smile:

First, make sure that you can access these disks from the VIOS without problems: for example, create a volume group on them, import it, and also create a filesystem on those disks.
Try

lspv, lspath

and post the output.
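
If you want to actually exercise the disks from the VIOS as well, a quick test along these lines should do it (run as root after oem_setup_env; hdisk5 and the testvg/testfs names are only placeholders, and the scratch volume group is destructive, so use it on the new, empty disks only):

lspv                                           # list the disks and their PVIDs / VG membership
lspath -l hdisk5                               # check the MPIO path states for the new disk
mkvg -y testvg hdisk5                          # scratch volume group on the new disk
crfs -v jfs2 -g testvg -m /testfs -a size=1G   # small test filesystem
mount /testfs                                  # if this mounts cleanly, the disk is usable
umount /testfs
rmfs -r /testfs                                # remove the test filesystem and its LV
reducevg testvg hdisk5                         # removing the last PV deletes the scratch VG again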

Hi,
Thanks for the reply :slight_smile:

I just checked that multipathing is configured on both VIOs.

I assigned these two disks from the "working VIO" to 5 LPARs.

Now, on running cfgmgr on some of the LPARs the disks are visible, and lspath shows the paths as "Enabled" too. But on the other 2 or 3 LPARs, lspath shows the paths to these two disks as "Defined" and cfgmgr hangs.

I guess this has something to do with MPIO, i.e. multipathing. I fail to understand why the second VIO is hanging. :frowning:

So it's 3 LPARs that are hanging at cfgmgr, and the paths to the disks are in the "Defined" state when lspath is executed.
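
For reference, these are the kinds of commands I would use to check and recover the paths on those LPARs once the second VIO is healthy again (hdisk2 and vscsi1 are just placeholders for the actual device names on my side):

lspath -l hdisk2                       # show all paths and their states for the disk
lsdev -l vscsi1                        # make sure the virtual SCSI adapter itself is Available
cfgmgr -l vscsi1                       # reconfigure just that adapter
mkpath -l hdisk2 -p vscsi1             # try to configure the Defined path
chpath -l hdisk2 -p vscsi1 -s enable   # re-enable the path if it is merely Disabled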

Regards :slight_smile:

First, it is best to use the padmin commands for all VIO configuration. If in fact you are using the native padmin commands such as cfgdev etc. but are showing the AIX equivalents for clarity, then please excuse my comment.

Did you create the disks on the VIO using a new vhost, or did you use an existing one and just add a new virtual disk?

Either way, if you run lsmap -all on the offending VIO, do you see the disks in the listing?

A quick remedy I can think of is to rmvdev the virtual target device and remove the physical disk, then get the storage team to delete the LUN definition against the WWN of the FC adapter used on the VIO. Then ask the storage team to reallocate the same LUN to that FC adapter. Run cfgdev on the VIO, which hopefully will not hang. Once cfgdev has finished, run lspv -free to see if the disks are there.
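
Roughly, from the padmin restricted shell, the sequence would look like this (vtscsi0, hdisk5 and vhost0 are placeholders for whatever lsmap -all actually shows on your system):

lsmap -all                             # note the VTD name and the backing hdisk
rmvdev -vtd vtscsi0                    # remove the virtual target device mapping
rmdev -dev hdisk5                      # remove the physical disk definition
# ... storage team deletes and reallocates the LUN against the FC adapter's WWN ...
cfgdev                                 # rediscover devices; hopefully no hang this time
lspv -free                             # the new disk should now show up as free
mkvdev -vdev hdisk5 -vadapter vhost0   # remap it to the client (the hdisk number may change after rediscovery)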

I hope this helps.

Thanks for the reply :slight_smile:
Well, I really didn't run the padmin commands, i.e. the shell with the $ prompt. I ran:

$oem_setup_env
#_

I used the existing vhost and added a virtual disk.
Yes, running the lsmap command does show the virtual disks.
(But please remember that this is being done on the healthy VIO; the second VIO hangs on commands like cfgmgr and lspath.)

By the way, if I want to delete the virtual disks, what command should I use?
rmvdev -vtd <virtual_target_device_name> -recursive?? But I need to delete them from the LPARs first, right? And when I'm done doing all this, I need to go back to the SAN team?

Regards :slight_smile:

Please do not delete them on the good VIO, as this is unnecessary! Do you see the disks at all on the VIO which hangs on cfgdev?

You really should get used to using the padmin commands, as life is far easier from the padmin restricted shell: the commands are far more compact and do more with a single line, as in rmtcpip.

If you do see the disks on the problem VIO with lsmap -all or lspv -free, then, as I said, try deleting them from that VIO (no need to delete them from the LPAR, as they are not there, I would assume). Get the storage team to deallocate the LUN from the WWN of the FC adapter on the problem VIO and reallocate it. Then run the command cfgdev (not cfgmgr) followed by lspv -free; you should see the new disks.

If you do not see the disks, then check the WWN of the FC adapter.
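
To check the WWN (the WWPN) of the FC adapter, something like this works; fcs0 is a placeholder for the actual adapter name:

lsdev -dev fcs0 -vpd | grep "Network Address"    # from the padmin shell
lscfg -vpl fcs0 | grep "Network Address"         # or from the root shell after oem_setup_env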

Am I right to assume there are working external disks already mapped on the problem VIO, and that these disks come from the same storage system?

Hi, Thanks for the reply :slight_smile:

Well, I didn't touch the configs yet. Tomorrow I will be trying something. I was thinking of shutting down the problem VIO and then temporarily continuing with the configuration of GPFS; that way MPIO wouldn't affect things that much. Well, let's see what happens.

Regards.

Hi,

Well, the issue has been resolved. :slight_smile: The SAN team allocated new LDEVs, and now things are back to normal. :slight_smile:

Regards.

This is exactly what I was talking about. Usually, if an external disk is causing problems on the VIO, then either the WWN was given incorrectly for the HBA or the LUN was created incorrectly. This is why I suggested deleting the hdisk from the VIO server and asking the storage team to delete and recreate the LUN.

Glad it is resolved, as these types of faults can be very frustrating when you only have control of a small part of the infrastructure.

Did you set the "reserve_policy" attribute on the disk?

vio1:/home/padmin:# lsattr -El hdisk5 | grep reser
reserve_policy  no_reserve                       Reserve Policy                   True
vio1:/home/padmin:#

If you don't, and one VIO server puts a VTD on the disk (assigns it to an LPAR), then that VIO server places a reserve on the disk and the other VIO server can't touch it. To change it to no_reserve, you have to remove the VTD, set the attribute, remove the disk (rmdev -l hdisk5), then bring the disk back (mkdev -l hdisk5), and the attribute should then be set properly. Then recreate the VTD and have the other VIO server try again.
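
A rough sketch of that sequence from the root shell (oem_setup_env), with vtscsi0, vhost0 and hdisk5 as placeholder names:

rmdev -dl vtscsi0                                 # remove the VTD mapping (or use rmvdev from padmin)
chdev -l hdisk5 -a reserve_policy=no_reserve -P   # set the attribute; -P defers it to the next configure
rmdev -l hdisk5                                   # put the disk into the Defined state
mkdev -l hdisk5                                   # bring it back; lsattr -El hdisk5 should now show no_reserve
# then recreate the VTD, e.g. from the padmin shell: mkvdev -vdev hdisk5 -vadapter vhost0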

------------------
Update: Now I see there is a second page to this post. Oops.

Well, I am posting after a long time. Sorry for the delay; I guess I forgot.

Well, the issue was resolved by the SAN team. They reallocated the LUNs, and the disks are now up on all LPARs. :slight_smile:

Regards,

Hi, can I have the VIOS version? There are some bugs in the 2.0 version.