RAID 10 Failed Drive Swap

I am new to the AIX operating system and am seeking some advice. We recently had a drive go bad on our AIX server; it is part of a RAID 10 array, and we have a replacement on the way. I was wondering what the correct steps are to swap out this drive. Does the server need to be powered off, or can I hot swap?

I found some instructions for the diag command where I can perform a hot-plug task and remove the drive from the array.

Attached is a screenshot from the "diag" command showing the Disk Array Configuration. pdisk6 is the affected disk.

First off, welcome to the AIX board.

Having said this, it might help to describe your hardware in a bit more detail. The more detail you give, the better the offered solutions will be.

In general (but this will depend on your hardware, so take this cum grano salis) it will not be necessary to power off or even unmount the filesystems involved. AIX's LVM and IBM's RAID driver can handle practically all the necessary tasks while the storage is in use. I wouldn't start the biggest database import available while recovering from a disk failure, but that's about it.
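As a first sanity check, the AIX error log will usually confirm which disk actually failed before anything gets pulled. A minimal sketch (run as root; pdisk6 is just the device name from the original post):

```shell
# Check the AIX error log for disk errors before touching hardware.
errpt | head -20            # recent entries, newest first; look for disk errors
errpt -a -N pdisk6 | more   # detailed report for errors logged against pdisk6
```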

I have not used RAID arrays for probably 10 years now, so I can only draw on some remote memory, but IBM's arrays always included hot-standby disks. A failed disk is immediately swapped with the standby; you take the former out and bring in a new standby disk when recovering the array.

I hope this helps.

bakunin


Thank you for the welcome, and yes, your post does shed light on what I was thinking with regard to the hot swap.

I am out of the office and forgot to grab the exact model number, but the system is an older IBM TotalStorage unit. Attached is a picture I found that looks somewhat like the unit we have in place, minus the model number.

I do appreciate the help and response, and I am sorry for the limited information I have; I am new to the field, but that is no excuse.

Also attached is a low-quality photo from my phone that I took a while ago. The arrow is pointing to the affected SCSI drive. The drives are all IBM Ultra 320 36GB at 10K RPM.

Just going to follow up with what I did, in the hope it helps someone else who runs into this issue.

Use the diag command to check the array and find the failed disk(s).

# diag
---> Task Selection
---> RAID Array Manager
---> PCI-X SCSI Disk Array Manager
---> List PCI-X SCSI Disk Array Configuration
---> sisioa1 Available 06-08 PCI-X Dual Channel U320 SCSI RAID
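For reference, the same information can be cross-checked non-interactively. A sketch using standard AIX listing commands (run as root):

```shell
# List the physical disks behind the RAID adapter; a failed pdisk
# typically shows a state other than Available.
lsdev -Cc pdisk
# List the arrays themselves, which AIX presents as hdisks.
lsdev -Cc disk
```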

Activate the LED indicator of the physical disk to locate it on the rack.

# diag
---> Task Selection
---> Hot Plug Task
---> SCSI and SCSI RAID Hot Plug Manager
---> Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure
---> select failed disk here(pdisk#)

A message will appear regarding an LED and the Remove state. Find the physical drive whose LED is now flashing amber and remove it from the array. After you remove the failed drive, replace it with the new unit.
Hit Enter on that message screen to take that slot out of the Remove state.

# diag
---> Task Selection
---> Hot Plug Task
---> SCSI and SCSI RAID Hot Plug Manager
---> Configure Added/Replaced Devices
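If you prefer the command line, running the standard AIX device-configuration command directly (as root) should also pick up the replacement disk, a sketch:

```shell
# Rescan for new/replaced devices (same effect, in my experience, as the
# Configure Added/Replaced Devices menu step).
cfgmgr
# The replacement pdisk should now appear as Available.
lsdev -Cc pdisk
```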

# diag
---> Task Selection
---> Log Repair Action (Select affected disk)

Rebuild the array.

# diag
---> Task Selection
---> RAID Array Manager
---> PCI-X SCSI Disk Array Manager
---> Reconstruct a PCI-X SCSI Disk Array
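Once the reconstruct finishes, it is worth verifying that the array is healthy again. A sketch (run as root):

```shell
# All pdisks should be back in the Available state.
lsdev -Cc pdisk
# Confirm no new disk errors have been logged since the repair action.
errpt | head
```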


Thanks for your contribution, much appreciated.


Thank you for sharing the solution. This is the spirit!

bakunin
