Hi,
we have a few boxes using MPIO. They are connected to virtualization software that manages some disk subsystems and offers volumes to the AIX boxes.
Sometimes, when a cable has been unplugged for a test or when a real problem occurs, lspath correctly shows the state of the paths, for example one path failed and the other enabled. When the cable is plugged back in or the problem has been fixed, that path still shows as failed. Even after waiting some time, it does not recover. Nothing we tried changes that except a reboot of the box. I do not remember exactly whether the path shown as "failed" still worked even though lspath showed it as failed (I think I ran fcstat and the byte counters were increasing, but I am not sure, it was too long ago).
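For reference, the manual check could be sketched roughly as follows. This is only a sketch, assuming the default lspath output order (status, device name, parent adapter). The function name suggest_reenable is made up for illustration, and it only prints chpath commands rather than running them. Whether chpath actually brings back a path stuck in this state is not confirmed here.

```shell
# Hypothetical helper: reads default `lspath` output (status name parent)
# on stdin and prints, without running it, one chpath command per Failed path.
suggest_reenable() {
  awk '$1 == "Failed" { printf "chpath -l %s -p %s -s enable\n", $2, $3 }'
}
```

On the AIX box this would be used as: lspath | suggest_reenable. The printed commands can then be reviewed and run by hand.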
Has anybody had a similar experience with MPIO? We thought that since MPIO has been on the market for some years now, an obvious problem like not updating the status of a path should no longer exist. So we came to the conclusion that it might be some kind of incompatibility with our virtualization software.
I have never seen anything like it on a box using PowerPath.
Additionally, this problem does not happen every time and not on all of the MPIO boxes.
Our boxes are running AIX 5.3 TL11 SP4.
Any hints are welcome.
---------- Post updated at 09:08 AM ---------- Previous update was at 08:54 AM ----------
Here is the config of an hdisk from a box that has had no problem so far; the other boxes have the same parameters for health check etc.:
> lsattr -El hdisk2
PCM             PCM/friend/dcfcpother             Path Control Module               False
algorithm       fail_over                         Algorithm                         True
clr_q           no                                Device CLEARS its Queue on error  True
dist_err_pcnt   0                                 Distributed Error Percentage      True
dist_tw_width   50                                Distributed Error Sample Time     True
hcheck_cmd      inquiry                           Health Check Command              True
hcheck_interval 60                                Health Check Interval             True
hcheck_mode     nonactive                         Health Check Mode                 True
location                                          Location Label                    True
lun_id          0x1000000000000                   Logical Unit Number ID            False
max_transfer    0x40000                           Maximum TRANSFER Size             True
node_name       0x20070030d910849e                FC Node Name                      False
pvid            00c6c34f19954aed0000000000000000  Physical volume identifier        False
q_err           yes                               Use QERR bit                      True
q_type          simple                            Queuing TYPE                      True
queue_depth     16                                Queue DEPTH                       True
reassign_to     120                               REASSIGN time out value           True
reserve_policy  single_path                       Reserve Policy                    True
rw_timeout      70                                READ/WRITE time out value         True
scsi_id         0x829980                          SCSI ID                           False
start_timeout   60                                START unit time out value         True
unique_id       3214fi220001_somelunidentifier    Unique device identifier          False
ww_name         0x210100e08ba2958f                FC World Wide Name                False
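
To compare the health check parameters between the boxes without reading the full listing each time, something like the following might help. It is a minimal sketch, assuming lsattr -El output with the attribute name in the first column and its value in the second; the function name hcheck_settings is made up for illustration.

```shell
# Hypothetical helper: reads `lsattr -El <hdisk>` output on stdin and prints
# only the health check attributes as name=value pairs.
hcheck_settings() {
  awk '$1 ~ /^hcheck_/ { print $1 "=" $2 }'
}
```

Used as: lsattr -El hdisk2 | hcheck_settings. Running it on each box and diffing the results would show whether the health check settings really are identical.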