Solaris 8 disk/mirroring issue

Hello!

I recently inherited system administration duties for a Sun V880 box. The system has 6 physical hard disks. While doing some basic maintenance, I found they're configured for mirroring. I ran the metastat and metadb commands, and many of the mirrors are showing that they need maintenance (see the attachments).

Question is: how do I get all the mirrors back online and functional? I was considering running the metareplace utility to enable the metadevices, but it seems one of the hard disks is no longer being recognized by the system (c1t3d0, see disks.txt). I've checked the /var/adm/messages files but I don't see anything that would indicate the disk has failed, and the front status panel of the server itself (including the error condition lights for each drive) is showing no problems. I was thinking of running touch /reconfigure, seeing whether the disk is redetected and, if it is, replacing the failed metadevices using metareplace and rebooting. Do you guys feel this will fix the problem?

Any advice would be greatly appreciated!

M p unknown unknown /dev/dsk/c1t1d0s4 << bad disk
M p unknown unknown /dev/dsk/c1t3d0s4 << bad disk

Note that in your metadb output it looks like someone added replicas to the metadb but never rebooted. That's usually why you get "unknown" in the status, although it's possible you are also getting it because the disks are bad. The good thing is that there are so many copies of the metadb that you can replace the bad disks without worrying about losing anything.
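To pick the problem replicas out of a longer metadb listing, something like the following could help. It is only a sketch: it assumes a POSIX-style shell and awk (on Solaris 8 that probably means ksh with nawk or /usr/xpg4/bin/awk), it assumes the flag letters come before the first-block column as in the two lines quoted above, and the function name flagged_replicas is made up for illustration.

```shell
# Print the slice of every state database replica that carries an error flag.
# Reads `metadb` output on stdin.
# Per metadb(1M), error flags are the uppercase letters W, M, D, F, S and R.
flagged_replicas() {
    awk '
    $NF ~ /^\/dev\/dsk\// {
        bad = 0
        # Flag letters come first; stop at the first-block column,
        # which is either a number or the word "unknown".
        for (i = 1; i < NF; i++) {
            if ($i ~ /^[0-9]+$/ || $i == "unknown") break
            if ($i ~ /[WMDFSR]/) bad = 1
        }
        if (bad) print $NF
    }'
}
```

Run it as metadb | flagged_replicas. For each slice it prints, the usual follow-up is metadb -d on that slice before pulling the disk and metadb -a on the same slice once the new disk is partitioned.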

As for the bad disks, get the replacement disk drives and you can use the metareplace command. Or you can use metadetach and metaclear to remove them from the metadevice configuration, replace the drives, format them, and re-add them. The drives are hot-swappable, so you should not need any downtime.

Also, ensure that the boot device is c1t0d0s0 and not c1t1d0s0. You don't want to hot swap or remove the drive you are booted off of (the initial boot will be off one drive and then the mirrors come into play). Just check the output of eeprom and look for boot-device. Hopefully it isn't simply "disk:a" but gives you output from which you can tell which drive it really is. If not, then you can play it safe and reboot into single user to do the drive replacements. More info can be found in this thread.
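The boot-device check can be scripted roughly like this. Treat it as a sketch: it assumes a POSIX-style shell and awk, it only looks at the first entry of the setting, and the function name boot_device_is_full_path is hypothetical.

```shell
# Succeeds if the first boot-device entry is a full device path
# (starts with "/"); fails if it is an alias such as "disk" or "disk:a".
# Reads the output of `eeprom boot-device` on stdin.
boot_device_is_full_path() {
    sed -n 's/^boot-device=//p' | awk '
    BEGIN { rc = 1 }
    NR == 1 { if ($1 ~ /^\//) rc = 0 }
    END { exit rc }'
}
```

Feed it with eeprom boot-device | boot_device_is_full_path. If it succeeds, the path can be compared by eye with the symlink target shown by ls -l /dev/dsk/c1t0d0s0, which points at the physical device path for that slice.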

The boot disk looks like it refers to the c1t0d0 disk according to eeprom. It doesn't indicate the slice.

I think that error occurred because of an incorrect powerdown, but I'm not sure because I just got hold of this box. As I said, the drive light indicators on the outside of the box aren't indicating any faults. On the negative side, format does not list c1t3d0, but it does list c1t1d0 as well as all the other drives.

Is there any way to tell if the disk is definitely bad? The box was recently moved, so I am concerned the drives may have been rattled around a bit, maybe even come unseated. I was thinking of powering down, reseating all the drives, then booting and running touch /reconfigure. If the drive was previously detected in the system, would I even need to do that for it to come back online?

Thanks!

Re-seating the 'bad' drives can be done without powering down - they are hot-swappable so it would be like removing the old and putting in the new (just that it's the same drive). So try that first with c1t3 - if it spins up the system may see it. If not, you haven't lost anything.

You can determine the slice by looking at /etc/vfstab and finding the md device for the / partition. Then look at your metastat output for that device.
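As a rough sketch of that lookup, assuming the standard vfstab column order (device to mount, device to fsck, mount point, and so on) and a POSIX-style shell and awk. The function name root_device is made up for illustration.

```shell
# Print the device mounted on / according to a vfstab-format file.
# Defaults to /etc/vfstab when no file is given; comment lines are skipped.
root_device() {
    awk '$1 !~ /^#/ && $3 == "/" { print $1 }' "${1:-/etc/vfstab}"
}
```

If root_device prints something like /dev/md/dsk/d0 (d0 is only an example name), running metastat d0 should list the submirrors of that mirror and the slices behind them.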

And the only way to tell if the drive is bad is to hit it - ls -Rla from top of the partition should create some errors at some point - or go into format and run an analyze (read, refresh, or test - the ones that do not harm data).
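Another read-only way to hit the drive, different from the ls and format suggestions above, is to read the raw device end to end with dd. With a mirrored filesystem a read through the mount point may be served by either side, so going at the raw slice of the suspect disk is more direct. This is a sketch only, and the function name read_check is hypothetical.

```shell
# Read a device (or file) from start to end and discard the data.
# Returns the exit status of dd: non-zero means a read failed part-way
# or the device could not be opened. Nothing is written to the device.
read_check() {
    dd if="$1" of=/dev/null bs=1024k
}
```

For example, read_check /dev/rdsk/c1t1d0s2 would read the whole disk, assuming slice 2 covers the entire drive as it conventionally does. Errors should then also show up in /var/adm/messages.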

Double check that syslogd is running and configured to put warning messages into your /var/adm/messages file (or wherever you have it logging).

I tried removing and reseating the drive, but no go. The box still isn't seeing it... I ran an iostat -En and found c1t1 has a ton of hard errors and c1t3 is not listed... So I've requested two replacement drives from Sun.

Once I get them, I can hot swap the old drives for the new ones, right? Is there a command I need to run before yanking out the drive and putting in the new one? c1t1 seems to be a mirror of the slices of the system disk. If so, I can then just run these commands to set up the slices automatically, right?

# prtvtoc /dev/rdsk/c1t0d0s2 > /tmp/format.out
# fmthard -s /tmp/format.out /dev/rdsk/c1t1d0s2

# prtvtoc /dev/rdsk/c1t2d0s2 > /tmp/format.out
# fmthard -s /tmp/format.out /dev/rdsk/c1t3d0s2

Then once that's done, I enable the metadevices on the new disks as metastat indicates:

# /usr/opt/SUNWmd/metareplace -e d60 c1t3d0s6
# /usr/opt/SUNWmd/metareplace -e d60 c1t1d0s0
# /usr/opt/SUNWmd/metareplace -e d60 c1t1d0s1
# /usr/opt/SUNWmd/metareplace -e d60 c1t1d0s3
# /usr/opt/SUNWmd/metareplace -e d60 c1t1d0s6

The mirrors will start resyncing once metareplace is invoked, right?

Thanks!

That should work as far as I see.
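To keep an eye on the resync afterwards, a small filter over the metastat output might be handy. This is a sketch under assumptions: it expects metastat to print a "Resync in progress" line under each mirror that is still syncing, it needs a POSIX-style shell and awk, and the function name resync_status is made up.

```shell
# Print one line per mirror that is still resyncing, prefixed by the
# metadevice name. Reads `metastat` output on stdin and prints nothing
# when no mirror reports a resync.
resync_status() {
    awk '
    /^d[0-9]+:/ { dev = $1 }          # remember the current metadevice
    /Resync in progress/ {
        $1 = $1                       # collapse the leading indentation
        print dev, $0
    }'
}
```

Run metastat | resync_status every so often. Once it prints nothing, check that metastat shows the submirrors as Okay.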