mdadm unable to fail a resyncing drive?

Hi All

I have a 4-drive RAID 5 array that has suffered a partial failure on one of the drives.

Rebooting shows the faulty drive as background rebuilding, and mdadm /dev/ARRAYID shows three drives in sync with the fourth drive as spare rebuilding.

However, the array won't come online, instead reporting itself as "active, degraded, not started".

I want to take the failing drive out of the array, as smartctl shows it's just about to fail with a growing reallocated-sector count.

I've executed a stop on the array, but I can't fail the faulty drive: mdadm just reports "No such device". Yet mdadm -E /dev/thatdrive reports it as belonging to the array, and mdadm -D /dev/thatarray shows the drive in the array, but as "spare rebuilding".

I've read assorted material on this for the last 7-8 hours and, short of physically pulling the drive, can't see a way to get it out of the array. I've even rebooted to see if that helped, with no effect.

What am I missing?

Thanks in advance.

---------- Post updated at 09:13 PM ---------- Previous update was at 03:25 AM ----------

No one has any ideas?

Wow, it's scary how little depth of knowledge there is out there for mdadm.

The man page seems to say you can pull them all offline, but is otherwise terse, so maybe the assumption is that if you want to limp along without the bad drive, you just pull it? Maybe the facility you want is lower down, in the system device layer rather than the md virtual-device layer.


According to what I've read elsewhere you should be able to.....

mdadm --stop /dev/thatarray

mdadm --fail /dev/thatdrive

and then the drive should no longer show as in the array.....well, that's the theory, but it ain't working in practice.
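For what it's worth, mdadm(8) wants the array named before the member when failing a disk, and the array has to be assembled at the time; once it's been stopped there is no array for mdadm to act on, which would fit the "No such device" message. A sketch of the usual sequence, with made-up device names, and with echo prefixed so it stays a dry run until you're sure:

```shell
# Made-up names: adjust ARRAY and BAD to your system before use.
ARRAY=/dev/md0
BAD=/dev/sdd1

# Fail, then remove, the member while the array is still assembled.
# The echo prefix means nothing destructive actually runs.
echo mdadm --manage "$ARRAY" --fail "$BAD"
echo mdadm --manage "$ARRAY" --remove "$BAD"
```

Only stop the array after the bad member has been failed and removed.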

Try --force ?

I'd already tried with the --force option....no go, I'm afraid.

I should get the replacement disks today so will be physically pulling the drive and crossing my fingers.

You did not give us very much information, so it is hard to give precise answers.

What flavor and version of Linux are you on?

What is the output of:

cat /proc/mdstat
mdadm --detail /dev/mdX
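For illustration, a degraded four-disk RAID5 with a rebuilding spare typically shows up in /proc/mdstat something like this (made-up device names and sizes):

```
md0 : active raid5 sdd1[4](S) sdc1[2] sdb1[1] sda1[0]
      5860150272 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
```

The [4/3] and [UUU_] show four members expected with only three up, and the (S) marks the spare.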

Though I always found md to be stable, the Linux world seems to have fixated on the much more flexible (and thoroughly documented) Logical Volume Management (LVM) tools. Should you recover, consider a rebuild with LVM. There are more steps and a learning curve involved, but these are outweighed by the ability to get support.

Hi.

I thought that I had read some time ago, that LVM on top of MD was a good solution. In fact, that is what I had done the last time I installed Linux on a standalone machine (not a VM) recently.

One reference provides some of the background: RAID versus LVM - Stack Overflow

Some other sources, mainly for the procedures of installing LVM on top of MD: Setup Software Raid 1 with LVM on Linux , https://wiki.archlinux.org/index.php/Installing_with_Software_RAID_or_LVM

Some performance numbers are available at Linux Raid Wiki

I don't recall seeing advice to shift over to RAID via LVM as opposed to MD. One reason (for me) is that MD has the ability to do RAID10, and (so far) LVM does only RAID0 and RAID1. Also, grub has not traditionally understood LVM, so the boot partition cannot be on LVM.

However, I have used rsync to back up LVM partitions, making use of snapshots to do the work -- a very nice feature, since you don't need to take down the machine to do a backup.
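The snapshot-then-rsync dance looks roughly like this -- all names and sizes here are made up, and the echo prefix keeps it a dry run:

```shell
# Made-up VG/LV names (vg0/data); size and mount point are placeholders.
VG=vg0
LV=data
SNAP=${LV}_snap

# Create a temporary snapshot, rsync from it, then drop it.
# Remove the echo prefix to actually execute.
echo lvcreate --snapshot --size 1G --name "$SNAP" "/dev/$VG/$LV"
echo mount -o ro "/dev/$VG/$SNAP" /mnt/snap
echo rsync -a /mnt/snap/ /backup/data/
echo umount /mnt/snap
echo lvremove -f "/dev/$VG/$SNAP"
```

The snapshot only needs enough space to absorb writes that happen during the backup, not a full copy.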

cheers, drl

Except that LVM is more about volume management and less about resilience; AFAIK LVM does not have the tools to set up a RAID5 or RAID6 environment. Instead you'd have to build LVM on top of RAID, which in my opinion adds a new level of complexity if you need to do a recovery.
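To be concrete about what "LVM on top of RAID" means in practice, the layering looks something like this -- device names, VG name, and filesystem choice are all made up, and the echo prefix keeps it a dry run:

```shell
# Made-up device names throughout; drop the echo prefix to execute.
# 1) Build the RAID5 array from four partitions:
echo mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# 2) Layer LVM on top of the md device:
echo pvcreate /dev/md0
echo vgcreate vg0 /dev/md0
echo lvcreate --extents 100%FREE --name data vg0
# 3) Filesystem goes on the logical volume, not the md device:
echo mkfs.ext4 /dev/vg0/data
```

So a recovery has to reason about two layers: md for the redundancy, LVM for the volumes on top.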

LVM is good for environments where you might have lots of data spread over lots of disks; however, where you need your data to be as safe as practicable, with reasonably efficient use of drives, RAID5 or RAID6 is probably the way to go.

Anyway, I managed to get the RAID array back up, copied the data off it, got another drive to allow me to make the array RAID6, and re-created it as a new RAID6 array. I'm currently copying 9TB back onto the array.
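For anyone following along, the re-create step was along these lines -- device names are made up for illustration, and the echo prefix keeps it a dry run:

```shell
# Made-up device names; drop the echo prefix to execute.
# Five members at level 6 gives two-disk redundancy,
# at the cost of two drives' worth of capacity.
echo mdadm --create /dev/md0 --level=6 --raid-devices=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
echo mkfs.ext4 /dev/md0
```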

Disks were so much faster when they were 28" in diameter and held 100M ! :smiley: