Hello,
We have a POWER8 system (S822) and an IBM Storwize V3700 SAN.
The OS is AIX 7.1.
From what I've read, this hardware requires special SDDPCM drivers, so I downloaded and installed them (SDDPCM version 2.6.6.0, fileset devices.sddpcm.71.rte).
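For reference, the install steps were roughly as follows. The host-attachment fileset name is from memory, so treat it as an assumption and check the package README for your SDDPCM release:

```shell
# Install the MPIO host attachment and SDDPCM filesets from the directory
# containing the downloaded packages (run as root).
cd /tmp/sddpcm
installp -acgXYd . devices.fcp.disk.ibm.mpio.rte   # host attachment (name assumed)
installp -acgXYd . devices.sddpcm.71.rte           # SDDPCM for AIX 7.1
lslpp -l "devices.sddpcm*"                          # verify the install
```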
I carved my volumes on the Storwize and presented them to my single host (the storage is directly attached to the host via FC). There is no virtualization of any kind; it's a single bare-metal AIX install.
The whole process was fairly easy: carve the RAID 10 LUNs, map them to the host, then on the host (after installing the SDDPCM drivers) run "cfgmgr" so the LUNs get detected. From there I created the VGs and JFS2 filesystems.
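The discovery and filesystem steps above, sketched as commands (names like usr1 match my setup; the size is just an example, adjust to taste):

```shell
cfgmgr                           # scan for the newly mapped hdisks
lsdev -Cc disk                   # confirm the 2145 MPIO disks appeared
mkvg -y usr1 -S hdisk4           # scalable VG on the new LUN
# crfs creates the logical volume (fslv00) automatically in the VG
crfs -v jfs2 -g usr1 -m /usr1 -a size=2300G
mount /usr1
```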
# lspv hdisk4
PHYSICAL VOLUME: hdisk4 VOLUME GROUP: usr1
PV IDENTIFIER: 00f9af942979fd5c VG IDENTIFIER 00f9af9400004c000000014b2979fdc3
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 512 megabyte(s) LOGICAL VOLUMES: 1
TOTAL PPs: 4607 (2358784 megabytes) VG DESCRIPTORS: 2
FREE PPs: 0 (0 megabytes) HOT SPARE: no
USED PPs: 4607 (2358784 megabytes) MAX REQUEST: 256 kilobytes
FREE DISTRIBUTION: 00..00..00..00..00
USED DISTRIBUTION: 922..921..921..921..922
MIRROR POOL: None
# lsvg usr1
VOLUME GROUP: usr1 VG IDENTIFIER: 00f9af9400004c000000014b2979fdc3
VG STATE: active PP SIZE: 512 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 4607 (2358784 megabytes)
MAX LVs: 256 FREE PPs: 0 (0 megabytes)
LVs: 1 USED PPs: 4607 (2358784 megabytes)
OPEN LVs: 1 QUORUM: 2 (Enabled)
TOTAL PVs: 1 VG DESCRIPTORS: 2
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 1 AUTO ON: yes
MAX PPs per VG: 30480
MAX PPs per PV: 5080 MAX PVs: 6
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512 CRITICAL VG: no
Now, looking at the attributes of my hdisk4 LUN:
# lsattr -El hdisk4
PCM PCM/friend/sddpcm PCM True
PR_key_value none Reserve Key True
algorithm load_balance Algorithm True
clr_q no Device CLEARS its Queue on error True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
flashcpy_tgtvol no Flashcopy Target Lun False
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
location Location Label True
lun_id 0x2000000000000 Logical Unit Number ID False
lun_reset_spt yes Support SCSI LUN reset True
max_coalesce 0x40000 Maximum COALESCE size True
max_transfer 0x40000 Maximum TRANSFER Size True
node_name 0x500507680302218c FC Node Name False
pvid 00f9af942979fd5c0000000000000000 Physical volume identifier False
q_err yes Use QERR bit True
q_type simple Queuing TYPE True
qfull_dly 2 delay in seconds for SCSI TASK SET FULL True
queue_depth 20 Queue DEPTH True
recoverDEDpath no Recover DED Failed Path True
reserve_policy no_reserve Reserve Policy True
retry_timeout 120 Retry Timeout True
rw_timeout 60 READ/WRITE time out value True
scbsy_dly 20 delay in seconds for SCSI BUSY True
scsi_id 0xab0100 SCSI ID False
start_timeout 180 START unit time out value True
svc_sb_ttl 0 IO Time to Live True
timeout_policy fail_path Timeout Policy True
unique_id 332136005076300810886380000000000000B04214503IBMfcp Device Unique Identification False
ww_name 0x500507680306218c FC World Wide Name False
# lspv -l hdisk4
hdisk4:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
fslv00 4607 4607 922..921..921..921..922 /usr1
Some pcmpath data:
# pcmpath query adapter
Total Dual Active and Active/Asymmetric Adapters : 2
Adpt# Name State Mode Select Errors Paths Active
0 fscsi2 NORMAL ACTIVE 429216571 0 6 6
1 fscsi3 NORMAL ACTIVE 70490560 0 6 6
# pcmpath query device
Total Dual Active and Active/Asymmetric Devices : 6
DEV#: 2 DEVICE NAME: hdisk2 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 60050763008108863800000000000006
==========================================================================
Path# Adapter/Path Name State Mode Select Errors
0 fscsi2/path0 OPEN NORMAL 71681 0
1* fscsi3/path1 OPEN NORMAL 98 0
DEV#: 3 DEVICE NAME: hdisk3 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 60050763008108863800000000000007
==========================================================================
Path# Adapter/Path Name State Mode Select Errors
0* fscsi2/path0 OPEN NORMAL 56 0
1 fscsi3/path1 OPEN NORMAL 10228 0
DEV#: 4 DEVICE NAME: hdisk4 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 6005076300810886380000000000000B
==========================================================================
Path# Adapter/Path Name State Mode Select Errors
0* fscsi2/path0 OPEN NORMAL 78 0
1 fscsi3/path1 OPEN NORMAL 49789138 0
DEV#: 5 DEVICE NAME: hdisk5 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 6005076300810886380000000000000C
==========================================================================
Path# Adapter/Path Name State Mode Select Errors
0 fscsi2/path0 OPEN NORMAL 1302771 0
1* fscsi3/path1 OPEN NORMAL 70 0
DEV#: 6 DEVICE NAME: hdisk6 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 6005076300810886380000000000000D
==========================================================================
Path# Adapter/Path Name State Mode Select Errors
0* fscsi2/path0 OPEN NORMAL 63 0
1 fscsi3/path1 OPEN NORMAL 20690963 0
DEV#: 7 DEVICE NAME: hdisk7 TYPE: 2145 ALGORITHM: Load Balance
SERIAL: 6005076300810886380000000000000E
==========================================================================
Path# Adapter/Path Name State Mode Select Errors
0 fscsi2/path0 OPEN NORMAL 427841922 0
1* fscsi3/path1 OPEN NORMAL 63 0
My question is about the "algorithm" attribute, which is currently set (by default) to load_balance.
The host (S822) has dual FC paths, one to each canister on the V3700. The additional enclosures on the V3700 are also dual-pathed via SAS to each other enclosure's canister. So the idea is that we can lose one path and continue to operate without downtime.
Now, if algorithm = load_balance and a single path fails (let's say I unplug one of the FC cables), what will happen? Will it fail over automatically, or must I set the algorithm to fail_over from the start?
If the algorithm is fail_over, is only one path active with the other on standby?
My goal is to have full redundancy.
Any experts able to give some feedback?
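For what it's worth, the algorithm can be checked and changed without rebuilding the device. A sketch (the device number 4 is the DEV# for hdisk4 from the pcmpath output above; the fo/lb shorthand is per the SDDPCM manual):

```shell
# Show the current algorithm on the hdisk
lsattr -El hdisk4 -a algorithm

# Change it dynamically with pcmpath (fo = fail_over, lb = load_balance)
pcmpath set device 4 algorithm fo

# Or persistently via the ODM attribute (-P defers to the next reconfigure)
chdev -l hdisk4 -a algorithm=fail_over -P
```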
---------- Post updated at 06:31 PM ---------- Previous update was at 06:20 PM ----------
Oh, almost forgot to mention: vg_root is not on the SAN; the host has two internal disks that I've set up for vg_root.
---------- Post updated 03-04-15 at 12:50 PM ---------- Previous update was 03-03-15 at 06:31 PM ----------
It seems that, according to the article "Guide to selecting a multipathing path control module for AIX or VIOS", I can leave this as load_balance and lose a path: the PCM auto-detects dead paths and recovers them when they come back up, without user interaction. Time to test.
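My test plan, sketched as commands. The expectation (based on that article, not yet verified on this box): the affected paths go failed/closed, I/O continues on the surviving adapter, and after replugging the health checker (hcheck_interval=60 above) brings the paths back on its own:

```shell
# Before and after pulling a cable, watch path states and error counts
pcmpath query device 4          # per-device path states for hdisk4
pcmpath query adapter           # per-adapter totals and errors
errpt | head                    # check the AIX error log for path events

# If a path stays failed after reconnecting, it can be recovered manually
pcmpath set device 4 path 0 online
```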