Software RAID

Hello,

My company has inherited a CentOS-based machine that has 7 hard drives and a software-based RAID system. Supposedly one of the drives has failed, and I need to replace it.

How can I go about telling which hard drive needs replacing? I have looked in the logs, and while there is clearly a problem, I cannot tell which physical drive is the one that failed. Advice?

Linux software RAID is usually managed through the mdadm tool. To see the status of the drives, enter the following (replace /dev/md0 with the path to your actual RAID device):

mdadm --detail /dev/md0
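
If you just want a quick health summary of every active array, the kernel also exposes one through /proc/mdstat; a failed member shows up there marked with (F):

cat /proc/mdstat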

How do I find out the path to the actual RAID?

All I see is this:

[root@drill proc]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      16790968   2805884  13118372  18% /
/dev/hda1               101086     17720     78147  19% /boot
tmpfs                   777728         0    777728   0% /dev/shm
/dev/md0             475751440 389860552  61724084  87% /ha0

When I execute your command I get:

[root@drill proc]# mdadm --detail /dev/ha0
mdadm: cannot open /dev/ha0: No such file or directory
[root@drill proc]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Thu Mar 26 19:20:00 2009
     Raid Level : raid1
     Array Size : 483336128 (460.95 GiB 494.94 GB)
  Used Dev Size : 483336128 (460.95 GiB 494.94 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Dec  6 04:07:03 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 4a9f08c9:339514a7:b9ef28ff:12bbcfdb
         Events : 0.6

    Number   Major   Minor   RaidDevice State
       0     253        2        0      active sync   /dev/VolGroup_a/Logical_a
       1     253        3        1      active sync   /dev/VolGroup_b/Logical_b

In your case, /dev/md0 is the software RAID device. It's a RAID1 with two member devices, neither of which is degraded or showing errors. However, neither member is a physical device; both are logical volumes inside LVM volume groups (VolGroup_a and VolGroup_b). You can see which physical devices belong to those volume groups by checking the output of vgdisplay.
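
For example, pvs (part of the standard lvm2 toolset) prints one line per physical volume together with the volume group it belongs to, which maps those logical volumes back to physical disks:

pvs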

From my point of view, it's been set up exactly the wrong way around. If a device fails, you'll have to rescue the volume group and its logical volumes before you can rescue the RAID. Also, any data not in the RAID will probably be lost, or at least will have to be restored from backup.

Usually you create a RAID (in hardware or software) first, and then create logical volumes on top of it. That way, if a drive fails it is easy to replace, and LVM won't even notice that part of the array went missing for a while.
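
As a rough sketch of that layering (the device names /dev/sda1 and /dev/sdb1 and the volume group name here are made up for illustration):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1   # mirror the two partitions first
pvcreate /dev/md0                                                        # then put LVM on top of the mirror
vgcreate VolGroup_data /dev/md0
lvcreate -l 100%FREE -n Logical_data VolGroup_data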

I do not see which physical volumes they belong to after I run vgdisplay:

  --- Volume group ---
  VG Name               VolGroup00
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               18.53 GB
  PE Size               32.00 MB
  Total PE              593
  Alloc PE / Size       593 / 18.53 GB
  Free  PE / Size       0 / 0   
  VG UUID               w7Ta2Z-Dx90-jg8v-28ZS-XT2x-zal6-Z7q3r1
   
  --- Volume group ---
  VG Name               VolGroup_a
  System ID             
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  6
  VG Access             read/write
  VG Status             resizable
  MAX LV                256
  Cur LV                1
  Open LV               1
  Max PV                256
  Cur PV                3
  Act PV                3
  VG Size               460.95 GB
  PE Size               4.00 MB
  Total PE              118004
  Alloc PE / Size       118004 / 460.95 GB
  Free  PE / Size       0 / 0   
  VG UUID               ql9Zv1-wqXJ-4lPb-mmsM-lyaf-Bj10-rJGU1s
   
  --- Volume group ---
  VG Name               VolGroup_b
  System ID             
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  6
  VG Access             read/write
  VG Status             resizable
  MAX LV                256
  Cur LV                1
  Open LV               1
  Max PV                256
  Cur PV                3
  Act PV                3
  VG Size               460.95 GB
  PE Size               4.00 MB
  Total PE              118002
  Alloc PE / Size       118002 / 460.95 GB
  Free  PE / Size       0 / 0   
  VG UUID               RcIQpm-krvC-i3LD-eDQs-00Zy-mHoj-KChH9D

The plain vgdisplay summary doesn't list the underlying devices. Add the verbose flag, and look for the "PV Name" lines in the output; each one is a physical device (or partition) belonging to that volume group:

vgdisplay -v VolGroup_a
vgdisplay -v VolGroup_b
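
Once you know which physical device holds the failing volume, you still have to match it to a drive in the chassis. Assuming smartmontools is installed, reading the drive's serial number and comparing it with the label printed on the drive itself is a reliable way to do that:

smartctl -i /dev/sda    # prints the model and serial number; repeat for each disk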