Bad disk, how to replace it?

Hello,

I see hard and transport errors on all disks in the treso pool, and it looks like there is some data corruption too. I want to take a backup before I reboot and replace the disk. Right now there are no free slots on the server, so one option is to break the mirror and remove the second disk (I need two disks because the data is 400GB). I have two spare disks that I would insert into those slots, mount, and copy the data to.
Can somebody help me understand whether the setup below shows that I can detach disks and mount the spares without disturbing the data?

pool: treso
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 0h42m with 0 errors on Thu Mar 24 12:11:13 2016
config:

        NAME        STATE     READ WRITE CKSUM
        zones2      DEGRADED    17     0     0
          raidz1    DEGRADED    17     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  DEGRADED    35     0     0  too many errors
            c1t6d0  ONLINE       0     0     0
            c1t8d0  FAULTED      2     0     0  too many errors

errors: 4 data errors, use '-v' for a list
#

Thanks

In the current configuration there is little you can do.
Your configuration (RAIDZ1) tolerates the failure of one disk, and that has already happened (c1t8d0 is FAULTED), so you cannot pull another disk out of this pool.

The other bad disk (c1t5d0) has almost failed, but the pool is still accessible.
When the degraded disk fails (which will probably happen soon), you will lose all the data in the pool.
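
To see at a glance which disks are no longer healthy, a small filter over the status output can help. This is only a sketch: the function name unhealthy_devices is made up for illustration, and it assumes device names of the cXtYdZ form shown in this thread. The sample input is the config section posted above.

```shell
# Print "device state" for every disk whose STATE column is not ONLINE.
# Reads zpool status text on stdin; only matches cXtYdZ style names.
unhealthy_devices() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+/ && $2 != "ONLINE" { print $1, $2 }'
}

# Sample input copied from the status output in this thread.
unhealthy_devices <<'EOF'
        NAME        STATE     READ WRITE CKSUM
        zones2      DEGRADED    17     0     0
          raidz1    DEGRADED    17     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  DEGRADED    35     0     0  too many errors
            c1t6d0  ONLINE       0     0     0
            c1t8d0  FAULTED      2     0     0  too many errors
EOF
```

On the live system the same filter would be fed from the real command, for example: zpool status treso | unhealthy_devices. With two of four RAIDZ1 members listed, the pool has no redundancy left.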

The course of action should be:

  1. Take a backup using zfs send / receive, or copy the data.
  2. zpool offline the FAULTED disk from the pool.
  3. Unconfigure the offlined disk using cfgadm.
  4. Insert a new working drive in the same slot, and configure it using cfgadm.
  5. Issue a zpool online / replace against the replaced disk.
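
As a rough sketch, the steps above could look like this on the command line. The script only prints the commands (dry run) so nothing is touched by accident. The backup pool name backup, the snapshot name prereplace and the attachment point sata1/8 are assumptions for illustration, not values from this thread; look up the real attachment point with cfgadm | grep c1t8d0, and use your own pool name.

```shell
#!/bin/sh
# Dry-run sketch of the replacement steps: prints each command.
# Set DRY_RUN=0 only after checking every name against your own system.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

POOL=zones2          # pool name from the config section; substitute your own
BAD_DISK=c1t8d0      # the FAULTED disk from the status output
AP_ID=sata1/8        # hypothetical attachment point, see cfgadm output
BACKUP_POOL=backup   # hypothetical pool created on the spare disks

replace_plan() {
    # 1. Backup: recursive snapshot, then send the whole pool.
    run zfs snapshot -r "$POOL@prereplace"
    run sh -c "zfs send -R $POOL@prereplace | zfs receive -d -F $BACKUP_POOL"
    # 2. Take the faulted disk offline.
    run zpool offline "$POOL" "$BAD_DISK"
    # 3. Unconfigure it before pulling it (cfgadm asks for confirmation).
    run cfgadm -c unconfigure "$AP_ID"
    # 4. After the physical swap, configure the new drive in the same slot.
    run cfgadm -c configure "$AP_ID"
    # 5. Tell ZFS the disk in that slot was replaced, then check progress.
    run zpool replace "$POOL" "$BAD_DISK"
    run zpool status "$POOL"
}

replace_plan
```

Because the pool already reports data errors, zfs send may abort on a damaged file; in that case a plain file-level copy of whatever is still readable is the fallback.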

https://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html

Regards
Peasant.

I took the backup, destroyed the pool, replaced the disks and created a new pool, zones3.
Now, instead of using raidz1, I just want to mirror zones3. With the configuration below, if one disk fails the data will be lost. I have two new disks, c1t4d0 and c1t6d0.

# zpool status zones3
  pool: zones3
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zones3      ONLINE       0     0     0
          c1t9d0    ONLINE       0     0     0
          c1t10d0   ONLINE       0     0     0

errors: No known data errors
#

Is this the correct command to run?

zpool zones3 mirror c1t4d0 c1t6d0

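That command is not valid as written, since it has no zpool subcommand. To turn each existing disk into a mirror, use zpool attach.
Take the following example, where I'm using files, but it works the same with real devices.
The resulting pool (two 2-way mirrors) will tolerate one device failure, or two if they hit different mirrors.

If two devices fail in the same top-level vdev (mirror-N), you will lose data.

I would strongly suggest using an odd number of disks and keeping one as a hot spare in the pool.
In your configuration, get one more disk if you really love your data.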

[root@gimmick ~]# ls -dl /zones/test/disk*
-rw------T   1 root     root     104857600 Jun 16 02:48 /zones/test/disk0
-rw------T   1 root     root     104857600 Jun 16 02:48 /zones/test/disk1
-rw------T   1 root     root     104857600 Jun 16 02:48 /zones/test/disk2
-rw------T   1 root     root     104857600 Jun 16 02:48 /zones/test/disk3
[root@gimmick ~]# 

[root@gimmick ~]# zpool status testpool

  pool: testpool
 state: ONLINE
  scan: none requested
config:

	NAME                 STATE     READ WRITE CKSUM
	testpool             ONLINE       0     0     0
	  /zones/test/disk1  ONLINE       0     0     0
	  /zones/test/disk0  ONLINE       0     0     0

errors: No known data errors
[root@gimmick ~]# zpool attach testpool /zones/test/disk0 /zones/test/disk2
[root@gimmick ~]# zpool attach testpool /zones/test/disk1 /zones/test/disk3
[root@gimmick ~]# zpool status testpool
  pool: testpool
 state: ONLINE
  scan: resilvered 49K in 0h0m with 0 errors on Sat Jun 16 02:48:41 2018
config:

	NAME                   STATE     READ WRITE CKSUM
	testpool               ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    /zones/test/disk1  ONLINE       0     0     0
	    /zones/test/disk3  ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    /zones/test/disk0  ONLINE       0     0     0
	    /zones/test/disk2  ONLINE       0     0     0

errors: No known data errors

[root@gimmick  ~]# 
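
To make the failure tolerance of this layout concrete, here is a small sketch that walks through every possible pair of failed disks in the testpool above. It does not touch ZFS at all; the function names mirror_of and failure_pairs are made up for illustration, and the disk-to-mirror mapping is copied from the status output.

```shell
# Which top-level vdev each disk belongs to (from the testpool layout).
mirror_of() {
    case "$1" in
        disk1|disk3) echo mirror-0 ;;
        disk0|disk2) echo mirror-1 ;;
    esac
}

# Visit each unordered pair of disks once and classify the double failure.
failure_pairs() {
    set -- disk0 disk1 disk2 disk3
    while [ $# -gt 1 ]; do
        a=$1
        shift
        for b in "$@"; do
            if [ "$(mirror_of "$a")" = "$(mirror_of "$b")" ]; then
                echo "$a + $b: pool lost (both halves of $(mirror_of "$a"))"
            else
                echo "$a + $b: pool survives"
            fi
        done
    done
}

failure_pairs
```

Of the six possible pairs, only the two that take out both halves of the same mirror lose the pool; any single failure is always survivable.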

Hope that helps
Regards
Peasant.

Going through your example, can I run the commands below online, without interruption?

zpool attach zones3 c1t9d0 c1t4d0
zpool attach zones3 c1t10d0 c1t6d0

Yes.

The only thing you should notice is increased read / write activity until resilvering is done.
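
If you want to wait for the resilver from a script, something like the following sketch could work. It assumes zpool status prints the text "resilver in progress" while the resilver runs; check that against your own output before relying on it. The function names are made up for illustration.

```shell
# Succeeds (exit 0) while the status text on stdin reports a running resilver.
resilver_running() {
    grep 'resilver in progress' >/dev/null
}

# Poll the given pool until no resilver is reported any more.
# $1 = pool name, $2 = polling interval in seconds (default 60).
wait_for_resilver() {
    while zpool status "$1" | resilver_running; do
        sleep "${2:-60}"
    done
}

# Usage on the live system:
#   wait_for_resilver zones3
#   zpool status zones3
```

While it runs, zpool iostat zones3 5 shows the extra I/O on the pool in five-second intervals.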

Regards
Peasant.
