We've got two datacenters, with two VIO servers in each.
The VIO server manages the I/O of the LPARs. So: is it possible to reboot a VIO server without shutting down an LPAR?
Yes, it is possible, but what you get afterwards depends on your LPAR configuration.
Hi agent.kgb
Thanks for your answer!
What do you mean by that?
How can I check the configuration of the LPAR to determine whether a reboot would work without any trouble?
Let's say you have a managed system with two VIOs (vio1 and vio2) and a lot of fully-virtualized LPARs (lpar1, lpar2, ...).
If an LPAR has a VSCSI connection, it must be connected to both VIOs.
$ lsdev -l vscsi*
vscsi2 Available Virtual SCSI Client Adapter
vscsi3 Available Virtual SCSI Client Adapter
$ lsattr -El vscsi2
rw_timeout 0 Virtual SCSI Read/Write Command Timeout True
vscsi_err_recov fast_fail N/A True
vscsi_path_to 30 Virtual SCSI Path Timeout True
$ lsattr -El vscsi3
rw_timeout 0 Virtual SCSI Read/Write Command Timeout True
vscsi_err_recov fast_fail N/A True
vscsi_path_to 30 Virtual SCSI Path Timeout True
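If the adapter attributes on your system differ from the values shown above, they can be changed with chdev. A minimal sketch, assuming the adapter names from the listing; -P defers the change until the next reboot, which is usually necessary because the adapters are busy:

```shell
# Set fast failover behaviour on both VSCSI client adapters (AIX only).
# -P records the change in the ODM; it takes effect at the next reboot.
chdev -l vscsi2 -a vscsi_path_to=30 -a vscsi_err_recov=fast_fail -P
chdev -l vscsi3 -a vscsi_path_to=30 -a vscsi_err_recov=fast_fail -P
```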
VIO mappings:
$ echo "cvai" | kdb
NAME STATE CMDS_ACTIVE ACTIVE_QUEUE HOST
vscsi2 0x000007 0x0000000000 0x0 vio1->vhost1
vscsi3 0x000007 0x0000000000 0x0 vio2->vhost1
If you have VSCSI disks:
$ lsdev -Cc disk -s vscsi
hdisk6 Available Virtual SCSI Disk Drive
check their healthcheck parameters:
$ lsattr -El hdisk6
PCM PCM/friend/vscsi Path Control Module False
algorithm fail_over Algorithm True
hcheck_cmd test_unit_rdy Health Check Command True+
hcheck_interval 60 Health Check Interval True+
hcheck_mode nonactive <-- VERY BAD! Health Check Mode True+
max_transfer 0x40000 Maximum TRANSFER Size True
pvid 00000000000000000000000000000000 Physical volume identifier False
queue_depth 32 Queue DEPTH True
reserve_policy no_reserve Reserve Policy True+
and that the disks are available through both vscsi adapters:
$ lspath -l hdisk6
Enabled hdisk6 vscsi2
Enabled hdisk6 vscsi3
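With many disks, this per-disk check is easier to script. A minimal sketch that counts the Enabled paths per hdisk and flags any disk not reachable through two adapters; the sample variable stands in for real `lspath` output, which you would pipe in on a live system:

```shell
# Flag disks with fewer than two enabled paths.
# Sample data shown here; on AIX, feed the awk script from `lspath` instead.
lspath_output='Enabled hdisk6 vscsi2
Enabled hdisk6 vscsi3
Enabled hdisk7 vscsi2'

printf '%s\n' "$lspath_output" | awk '
    $1 == "Enabled" { n[$2]++ }
    END { for (d in n) if (n[d] < 2) print d " has only " n[d] " enabled path(s)" }'
# -> hdisk7 has only 1 enabled path(s)
```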
Check priorities of the paths:
$ lspath -AHE -l hdisk6 -p vscsi2
attribute value description user_settable
priority 2 Priority True
$ lspath -AHE -l hdisk6 -p vscsi3
attribute value description user_settable
priority 1 Priority True
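The priority decides which path the fail_over algorithm prefers (the lower value wins), so it also decides which VIO carries the I/O; to spread load, you would typically give half of your disks priority 1 on vio1's adapter and the other half priority 1 on vio2's adapter. A sketch with the names used above:

```shell
# Prefer the path through vscsi2 (served by vio1) for hdisk6 (AIX only).
chpath -l hdisk6 -p vscsi2 -a priority=1
chpath -l hdisk6 -p vscsi3 -a priority=2
```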
If you use VFC (NPIV), check that you have at least two VFC adapters, one from each VIO:
$ lsdev -Cc adapter -t IBM,vfc-client
fcs2 Available C4-T1 Virtual Fibre Channel Client Adapter
fcs3 Available C5-T1 Virtual Fibre Channel Client Adapter
Check their mappings:
$ echo vfcs | kdb
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs2 0xF1000A00001E8000 0x0008 vio1 vfchost14 0x01 0x0000
fcs3 0xF1000A00001EA000 0x0008 vio2 vfchost14 0x01 0x0000
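As with the VSCSI paths, the kdb output can be sanity-checked with a short script: the HOST column should show your fcs client adapters served by at least two different VIO servers. A sketch using the sample output above as stand-in data; on a live system you would pipe in `echo vfcs | kdb` instead:

```shell
# Count the distinct VIOs in the HOST column of `echo vfcs | kdb` output.
vfcs_output='NAME  ADDRESS            STATE  HOST  HOST_ADAP  OPENED  NUM_ACTIVE
fcs2  0xF1000A00001E8000 0x0008 vio1  vfchost14  0x01  0x0000
fcs3  0xF1000A00001EA000 0x0008 vio2  vfchost14  0x01  0x0000'

printf '%s\n' "$vfcs_output" | awk '
    NR > 1 { hosts[$4] }
    END {
        n = 0; for (h in hosts) n++
        if (n >= 2) print "OK: VFC adapters served by " n " VIOs"
        else        print "WARNING: all VFC adapters served by one VIO"
    }'
# -> OK: VFC adapters served by 2 VIOs
```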
Check that dyntrk and fc_err_recov on the fscsi devices are set to yes and fast_fail, respectively:
$ lsattr -El fscsi2
attach switch How this adapter is CONNECTED False
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x340a01 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
$ lsattr -El fscsi3
attach switch How this adapter is CONNECTED False
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x330b01 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
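If either attribute is wrong, it can be corrected with chdev; since the device is normally busy, -P plus a reboot is the usual way. A sketch with the adapter names from the listings above:

```shell
# Enable dynamic tracking and fast failover on the FC protocol devices (AIX only).
# -P defers the change to the next reboot, as the devices are usually in use.
chdev -l fscsi2 -a dyntrk=yes -a fc_err_recov=fast_fail -P
chdev -l fscsi3 -a dyntrk=yes -a fc_err_recov=fast_fail -P
```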
Then check that every disk device is available through both of the adapters. This check depends on your drivers: if you use AIX MPIO, you can get this information with the lspath command; for EMC, use powermt; for Hitachi, dlnkmgr.
Then, when you are sure that all of your disks are available through both VIOs, you must check the network connection.
$ lsdev -Cc adapter -t IBM,l-lan -s vdevice
ent3 Available Virtual I/O Ethernet Adapter (l-lan)
You have to go to your VIO servers and check there whether you have an SEA (Shared Ethernet Adapter) or some other type of configuration. With an SEA failover configuration, you have to check which VIO server is active:
vio1$ lsdev -type sea
name status description
ent20 Available Shared Ethernet Adapter
ent21 Available Shared Ethernet Adapter
ent22 Available Shared Ethernet Adapter
ent23 Available Shared Ethernet Adapter
vio1$ entstat -all entSEA | grep Active
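On a VIOS in SEA failover mode, `entstat -all` on the SEA reports a priority and an active flag; the side currently bridging the traffic shows it as active. A sketch, with ent20 standing in for your actual SEA device (the exact output format may vary by VIOS level):

```shell
# Run on each VIOS; the SEA that reports "Active: True" is the one
# currently bridging traffic (entSEA above is a placeholder for ent20..ent23).
entstat -all ent20 | grep -Ei 'priority|active'
```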
After you've checked everything, you can rely on there being no known bugs in your AIX and VIOS versions that could prevent the failover from one VIO to the other. Alternatively, you can switch everything over manually: change the priorities for every disk, or switch off the devices that use vio1's resources (put them into the "Defined" state), and then reboot vio1. After it comes up, check everything again. If you changed priorities or removed some devices, restore the configuration before rebooting vio2.
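The manual switchover can also be done per disk with chpath: disable the paths that go through vio1, reboot vio1, and re-enable them afterwards. A sketch for a single disk, using the hypothetical names from the earlier examples:

```shell
# Force hdisk6's I/O away from vio1 before rebooting it (AIX only).
chpath -l hdisk6 -p vscsi2 -s disable    # vscsi2 is the path served by vio1
lspath -l hdisk6                         # verify I/O now runs over vscsi3

# ... reboot vio1 and wait until its mappings are back ...

chpath -l hdisk6 -p vscsi2 -s enable     # restore before touching vio2
```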
I hope I didn't forget something very critical.
A very concise description by agent.kgb, thank you for that!
@thread-o/p: note that what agent.kgb described is in fact "best practice"; even if you don't need to reboot the VIOS right now, you should set things up as agent.kgb described anyway!
If you find out that this is not the case, it would be the perfect occasion to get some downtime URGENTLY and correct it! Otherwise these problems will come back to haunt you and will most probably raise their ugly heads at the most inconvenient moment possible.
I hope this helps.
bakunin