How do I replace a "good" RAID 1+0 disk?

Twirlip · February 28, 2013, 11:10am

Hi,

I have a Solaris Volume Manager (aka DiskSuite) RAID 1+0 device consisting of 12 devices. One of these is failing (it has logged several mechanical positioning errors), and I have a replacement disk.

Normally, when a disk fails, volume manager marks it as failed, and replacing it is fairly easy. I would just unconfigure the disk (cfgadm -c unconfigure), replace it, reconfigure the disk, run devfsadm, partition the disk, and then use metareplace to replace it in volume manager.
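For reference, the "normal" procedure for a disk that SVM has already marked as failed might be scripted roughly as below. This is a minimal sketch, not a tested procedure. It assumes that `c4::dsk/c4t8d0` is the disk's cfgadm attachment point (check with `cfgadm -al`) and that the replacement has the same geometry as `c5t8d0`, its partner in the other submirror, so the VTOC can be copied with `prtvtoc | fmthard`. The `run` and `replace_disk` helpers are hypothetical; by default (DRY_RUN unset or 1) they only print the commands.

```sh
#!/bin/sh
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != 1 ]; then
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

# Hypothetical wrapper around the usual swap of a failed SVM component.
# Arguments: mirror, component, cfgadm attachment point, template disk
# (ASSUMED to have the same geometry as the replacement disk).
replace_disk() {
    mirror=$1 comp=$2 ap_id=$3 template=$4
    disk=${comp%s*}                 # c4t8d0s0 -> c4t8d0
    run cfgadm -c unconfigure "$ap_id"
    if [ "${DRY_RUN:-1}" != 1 ]; then
        printf 'Swap the physical disk, then press Enter: '
        read -r _
    fi
    run cfgadm -c configure "$ap_id"
    run devfsadm
    # Copy the partition table (VTOC) from the template disk.
    run sh -c "prtvtoc /dev/rdsk/${template}s2 | fmthard -s - /dev/rdsk/${disk}s2"
    # Re-enable the component; SVM then resyncs it from the other submirror.
    run metareplace -e "$mirror" "$comp"
}
```

Calling `replace_disk d90 c4t8d0s0 c4::dsk/c4t8d0 c5t8d0` only prints the commands; `DRY_RUN=0 replace_disk ...` would run them and pause for the physical swap. The dry-run default is deliberate, since a mistyped device name here can destroy data.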

However, in this case the disk has not actually failed and is still being written to. How do I tell volume manager to stop using the disk? The only commands I know (metadetach and metaoffline) will disable the whole d91 submirror (AFAIK), not just this device.

Here is the metadevice in question:

# metastat d90
d90: Mirror
    Submirror 0: d91
      State: Okay
    Submirror 1: d92
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 426673521 blocks (203 GB)

d91: Submirror of d90
    State: Okay
    Size: 426673521 blocks (203 GB)
    Stripe 0: (interlace: 32 blocks)
        Device      Start Block  Dbase        State Reloc Hot Spare
        c4t8d0s0           0     No            Okay   Yes <==== This disk is failing
        c4t9d0s0        2889     No            Okay   Yes
        c4t10d0s0       2889     No            Okay   Yes
        c4t11d0s0       2889     No            Okay   Yes
        c4t12d0s0       2889     No            Okay   Yes
        c4t13d0s0       2889     No            Okay   Yes

d92: Submirror of d90
    State: Okay
    Size: 426673521 blocks (203 GB)
    Stripe 0: (interlace: 32 blocks)
        Device      Start Block  Dbase        State Reloc Hot Spare
        c5t8d0s0           0     No            Okay   Yes
        c5t9d0s0        2889     No            Okay   Yes
        c5t10d0s0       2889     No            Okay   Yes
        c5t11d0s0       2889     No            Okay   Yes
        c5t12d0s0       2889     No            Okay   Yes
        c5t13d0s0       2889     No            Okay   Yes

So, how do I replace c4t8d0s0 while continuing to use all the other disks during the replacement? I want to use the same slot, so I have to pull the old disk out first.

(Note: Solaris Volume Manager makes this look like RAID 0+1, but my understanding is that it is really RAID 1+0, as explained here.)

MadeInGermany · March 1, 2013, 5:24am

Good point: the metadevice driver has to know that the disk has failed before you can replace it.
The safest method is to simply pull out the physical disk
and read from that metadevice until metastat shows the device as failed (state Maintenance).
Then insert the new physical disk, partition it like the old one, and run

metareplace -e mirror device

In your case:

metareplace -e d90 c4t8d0s0
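The "pull the disk and keep reading until metastat reports the failure" step could be automated with a loop like the one below. It is a sketch under assumptions: the raw mirror device is /dev/md/rdsk/d90, and component lines in metastat output keep the column layout shown in the thread (Device, Start Block, Dbase, State, ...), so State is the fourth field. Because the read option is roundrobin, reads alternate between d91 and d92, so several passes may be needed before a read lands on the missing disk. `component_state` and `wait_for_failure` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helper: print the State column (4th field) for component $1
# from metastat output read on stdin; prints nothing if it is not listed.
component_state() {
    awk -v dev="$1" '$1 == dev { print $4; exit }'
}

# Hypothetical helper: generate read I/O on mirror $1 until component $2
# is reported as Maintenance.
wait_for_failure() {
    mirror=$1 comp=$2
    while :; do
        # Read 1 GB from the raw metadevice and discard it.
        dd if="/dev/md/rdsk/$mirror" of=/dev/null bs=1024k count=1024 2>/dev/null || true
        state=$(metastat "$mirror" | component_state "$comp")
        echo "$comp: $state"
        if [ "$state" = Maintenance ]; then
            return 0
        fi
        sleep 10
    done
}
```

After pulling the disk, `wait_for_failure d90 c4t8d0s0` would return once the component is marked failed, at which point the new disk can go in.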

---------- Post updated at 05:24 AM ---------- Previous update was at 05:22 AM ----------

Please first check with

metadb

If the failing disk holds a state database replica (listed by metadb), the replica must be recreated before running metareplace.
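If metadb does list replicas on the failing disk, the bookkeeping could be sketched as follows. The Dbase column in the metastat output only covers the s0 slices used by d91 and d92, so replicas on another slice of c4t8d0 would only show up in metadb. The helpers `replica_slices` and `replica_plan` are hypothetical and only print commands; they assume metadb lists one replica per line with the slice as the last field (e.g. /dev/dsk/c4t8d0s7), and the `metadb -d` / `metadb -a -c` forms should be checked against the metadb man page on your release.

```sh
#!/bin/sh
# Hypothetical helper: read `metadb` output on stdin and print
# "<count> <slice>" for every slice of disk $1 that holds replicas.
# ASSUMES the slice is the last field, e.g. /dev/dsk/c4t8d0s7.
replica_slices() {
    awk -v disk="$1" '{
        f = $NF
        sub("^/dev/dsk/", "", f)
        if (index(f, disk "s") == 1) n[f]++
    } END { for (s in n) print n[s], s }' | sort -k2
}

# Hypothetical helper: print the commands to delete the replicas before
# pulling the disk and to recreate them (same count per slice) once the
# new disk is partitioned, before running metareplace.
replica_plan() {
    replica_slices "$1" | while read -r count slice; do
        echo "before swap: metadb -d $slice"
        echo "after swap:  metadb -a -c $count $slice"
    done
}
```

Usage: `metadb | replica_plan c4t8d0`. No output means metadb lists no replicas on that disk, and metareplace can be run directly.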