MPIO reliability

Hi,

we have a few boxes using MPIO. They are connected to some storage virtualization software that manages several disk subsystems and offers volumes to the AIX boxes.
Sometimes when a cable is pulled for a test, or when a real problem occurs, lspath correctly shows that, for example, one path is Failed while the other is Enabled. But when the cable is plugged back in, or the problem has been fixed, that path still shows as Failed. Even after waiting for some time it does not recover. Nothing we tried changed that, except a reboot of the box. I do not remember exactly whether the path shown as Failed still carried traffic despite what lspath said (I think I ran fcstat and saw the byte counters going up, but I am not sure, it was too long ago).
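For reference, the stuck state looks roughly like this in lspath (device names here are just examples, not from the affected box):

> lspath -l hdisk2
Enabled hdisk2 fscsi0
Failed  hdisk2 fscsi1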

Has anybody had a similar experience with MPIO? We thought that, since MPIO has been on the market for some years now, an obvious problem like not updating the status of a path should have been fixed long ago. So we came to the conclusion that it might be some kind of incompatibility with our virtualization software.

I have never seen anything like it on a box using PowerPath.

Additionally, this problem does not happen every time, nor on all of the MPIO boxes.

Our boxes are running AIX 5.3 TL11 SP4.

Any hints are welcome.


Here is the config of a path from a box that has had no problems so far - the other boxes have the same parameters for health check etc.:

> lsattr -El hdisk2
PCM             PCM/friend/dcfcpother                              Path Control Module              False
algorithm       fail_over                                          Algorithm                        True
clr_q           no                                                 Device CLEARS its Queue on error True
dist_err_pcnt   0                                                  Distributed Error Percentage     True
dist_tw_width   50                                                 Distributed Error Sample Time    True
hcheck_cmd      inquiry                                            Health Check Command             True
hcheck_interval 60                                                 Health Check Interval            True
hcheck_mode     nonactive                                          Health Check Mode                True
location                                                           Location Label                   True
lun_id          0x1000000000000                                    Logical Unit Number ID           False
max_transfer    0x40000                                            Maximum TRANSFER Size            True
node_name       0x20070030d910849e                                 FC Node Name                     False
pvid            00c6c34f19954aed0000000000000000                   Physical volume identifier       False
q_err           yes                                                Use QERR bit                     True
q_type          simple                                             Queuing TYPE                     True
queue_depth     16                                                 Queue DEPTH                      True
reassign_to     120                                                REASSIGN time out value          True
reserve_policy  single_path                                        Reserve Policy                   True
rw_timeout      70                                                 READ/WRITE time out value        True
scsi_id         0x829980                                           SCSI ID                          False
start_timeout   60                                                 START unit time out value        True
unique_id       3214fi220001_somelunidentifier                     Unique device identifier         False
ww_name         0x210100e08ba2958f                                 FC World Wide Name               False

I wonder if you could post the adapter settings as well?

Hi, I know this problem; you then have to set the path back online manually
we use

smitty mpio -> mpio path management -> enable paths for a device
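the command line equivalent should be something like this (hdisk and fscsi names are just placeholders):

chpath -s enable -l hdiskX -p fscsiX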

but in my case the paths come from 2 VIO servers, which are connected to an IBM DS8300

directly on the VIO servers there are driver commands for setting the paths online again, for example after replacing a damaged adapter

with sddpcm it's

pcmpath set adapter x online

@funksen
Thanks so far for the info - I don't remember if we tried that one, but I will try it the next time I get a chance.

Neither cost nor effort spared:

> lsattr -El fcs0
bus_intr_lvl  65765      Bus interrupt level                                False
bus_io_addr   0xefc00    Bus I/O address                                    False
bus_mem_addr  0xf0040000 Bus memory address                                 False
init_link     pt2pt      INIT Link flags                                    True
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x800000   Long term DMA                                      True
max_xfer_size 0x100000   Maximum Transfer Size                              True
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   2          FC Class for Fabric                                True

The other adapter has the same settings.

Here is the fscsi device:

> lsattr -El fscsi0
attach       switch    How this adapter is CONNECTED         False
dyntrk       yes       Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id      0xa9f00   Adapter SCSI ID                       False
sw_fc_class  3         FC Class for Fabric                   True

The other device has the same settings.

Thanks.

Edit:
Just a note - I currently have no way to test/reproduce this, so don't put too much effort into it. Any hint is good though.

In my case it takes some time for MPIO to rebuild the paths (VIO + NPIV). We have a script that does roughly the following (sketch below):
- lsdev (look for disks in Defined state) & rmdev (if any)
- lspath (look for Missing paths) & rmpath
- cfgmgr
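Something along these lines should do it (a rough sketch, not the actual script; it assumes plain MPIO hdisks and generic names):

# remove disks stuck in Defined state
lsdev -Cc disk | awk '$2 == "Defined" {print $1}' | while read d; do
    rmdev -dl $d
done

# remove paths that are in Missing state
lspath -F "status name parent connection" | grep "^Missing" | while read st name parent conn; do
    rmpath -d -l $name -p $parent -w "$conn"
done

# rediscover devices and paths
cfgmgr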

Did you set different priorities for your paths? We had similar problems as long as all our paths had the same priority ...

Regards
zxmaus

No clue if that was the case back then. Currently I see mixed settings: paths having the same priority on one box, and paths on another box with different priorities according to which virtualized storage they primarily talk to (while having algorithm=fail_over).
I also asked a coworker about it a moment ago; he told me he has been given the task of checking and setting all paths to different priorities.
I will keep checking the path priorities in mind, just in case we see those strange effects again.
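For my own notes, checking and changing the priority per path should go roughly like this (hdisk/fscsi names are placeholders; add -w <connection> if there is more than one path per parent). As far as I understand it, with fail_over the enabled path with the lowest priority value is the one used:

lspath -AHE -l hdiskX -p fscsi0            # show the path attributes, including priority
chpath -l hdiskX -p fscsi1 -a priority=2   # make the path via fscsi1 the secondary one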

My company has encountered similar problems, and we have found that a few settings need to be set.

Here are the settings which have to be implemented.

Each child fibre device (fscsiX) has to have the following two attributes set:

chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
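If the adapter is in use, chdev will refuse the change with a busy/method error; in that case you can stage it with -P so it takes effect at the next reboot:

chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail -P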

Additionally, every hdisk device needs to be changed (which I didn't see mentioned in the post).

chdev -l hdiskX -a reserve_policy=no_reserve

Lastly, you may want to check that hcheck_interval is NOT set to 0, because then it won't check at all. The usual recommendation is to set it to 30 (but 10 should be sufficient).

chdev -l hdiskX -a hcheck_interval=10

UPDATE: Sorry: The hcheck_interval idea was already mentioned by smurphy. I should have moved on to page 2.

One other thing to check is your "hcheck_interval" which is set at the disk level. The hcheck_interval tells your system how often to check, or re-check, FAILED paths and inactive ENABLED paths (in the case of "algorithm" being set to "fail_over") to ensure they are still connected and functioning. I suggest setting your hcheck_interval to 3600 (once an hour). You'll have to set this on all your disks individually. If the hcheck_interval is set to "0", then this disables it and the disk will never automatically change out of a FAILED or MISSING state.
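To roll that out to every disk, a simple loop works (just a sketch; the -P flag stages the change in the ODM for any disk that is currently busy, so it takes effect after the next reboot):

for d in $(lsdev -Cc disk -F name); do
    chdev -l $d -a hcheck_interval=3600 -P
done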

Remember that MPIO is not like an EtherChannel, where all the links are automatically re-enabled as soon as the plug is back in. Something has to happen on the disk side to make it recheck the paths: either the hcheck_interval comes around again, or you unplug your secondary fiber card, which causes AIX to suddenly start sending checks for all your disks down all the paths, FAILED or MISSING, trying to find one that works; if it finds one, it sets that path back to ENABLED.

hostname:/:$ lsattr -El hdisk0 | egrep "hcheck_interval"
hcheck_interval 3600                             Health Check Interval      True
hostname:/:$

Also, you can re-enable a path manually by doing a chpath on it:

chpath -l hdisk0 -p vscsi0 -s enable

You can also see which path is being used by watching for numbers increasing in the output of "iostat -m":

hostname:/:$ iostat -m hdisk0

System configuration: lcpu=4 drives=7 ent=0.20 paths=10 vdisks=2

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
          0.0         10.6                0.9   0.5   98.3      0.3   0.0    1.6

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           0.3      46.3       3.7   180755051  55682968

Paths:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
Path1            0.0       0.0       0.0          0         0
Path0            0.3      46.3       3.7   180755051  55682968
hostname:/:$