Disk needs replacing?

xramm · February 20, 2008, 2:44am

On a two-disk mirrored RAID system (probably RAID 1; Sun Solaris on a Netra 240), the following error occurred when I looked at it with metastat:

d101: Submirror of d100
State: Needs maintenance
Invoke: metareplace d100 c1t0d0s5 <new device>

Is there any way to tell whether the disk is really broken?

DukeNuke2 · February 20, 2008, 2:48am

Have a look at "iostat -eE" and check /var/adm/messages for error reports.
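As a sketch of the iostat half of that check, the shell function below (a hypothetical helper named disks_with_errors, not a Solaris tool) reads iostat -eE output and prints only the devices whose summary line shows nonzero hard or transport errors. It assumes the summary line format that appears later in this thread, e.g. "sd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0"; /var/adm/messages still has to be read by hand.

```shell
#!/bin/sh
# Hypothetical helper: print devices whose "iostat -eE" summary line shows
# nonzero hard or transport errors. Assumes summary lines look like
#   sd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
disks_with_errors() {
  awk '
    /Soft Errors:/ {
      hard = 0; trn = 0
      # Walk the fields and pick up the number after each label.
      for (i = 1; i < NF - 1; i++) {
        if ($i == "Hard" && $(i + 1) == "Errors:") hard = $(i + 2) + 0
        if ($i == "Transport" && $(i + 1) == "Errors:") trn = $(i + 2) + 0
      }
      if (hard > 0 || trn > 0)
        printf "%s hard=%d transport=%d\n", $1, hard, trn
    }'
}

# Usage on a live Solaris system:
#   iostat -eE | disks_with_errors
```

Soft errors are deliberately ignored here; only the hard and transport counts are printed.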

sparcguy · February 20, 2008, 3:09am

Just try doing a metareplace on the old disk device again. This can happen because of loose connectivity or because the server was physically moved. If, a few days after the metareplace, it again says it needs maintenance, then it's time to call it in.

One way to check at a glance is iostat -en.

If you see many hard errors and transport errors (meaning 10 or 20 or more), forget it: collect Explorer output and call Sun straight away.
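The "at a glance" rule can be sketched as a small filter over iostat -en output. The helper name flag_error_devices is hypothetical, and adding the h/w and trn columns together before comparing against the threshold (default 10, after the "10 or 20 or more" rule of thumb) is an assumption, not an official Sun criterion.

```shell
#!/bin/sh
# Hypothetical helper: flag devices in "iostat -en" output whose hard (h/w)
# plus transport (trn) error counts reach a threshold (default 10).
# Summing the two columns is an assumption based on the rule of thumb above.
flag_error_devices() {
  awk -v t="${1:-10}" '
    # Data lines have five fields: s/w h/w trn tot device.
    # Header lines ("---- errors ---", "s/w h/w trn tot") are skipped.
    NF == 5 && $1 ~ /^[0-9]+$/ {
      if ($2 + $3 >= t + 0)
        printf "%s h/w=%d trn=%d\n", $5, $2, $3
    }'
}

# Usage on a live Solaris system:
#   iostat -en | flag_error_devices       # default threshold of 10
#   iostat -en | flag_error_devices 20    # higher threshold
```

Against the iostat -en output posted further down this thread, nothing would be flagged at the default threshold: the only nonzero counts (2 hard errors) belong to c0t0d0, which appears to be the CD/DVD drive (sd30 in the -eE listing), not one of the mirror disks c1t0d0 and c1t1d0.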

xramm · February 20, 2008, 4:39am

Hello DukeNuke2, sparcguy. I could not find any error reports in /var/adm/messages or in the iostat output you suggested.
Here is the iostat output:
can you advise anything from it?

# iostat -eE
---- errors ---
device s/w h/w trn tot
md0 0 0 0 0
md1 0 0 0 0
md2 0 0 0 0
md3 0 0 0 0
md4 0 0 0 0
md5 0 0 0 0
md6 0 0 0 0
md7 0 0 0 0
md8 0 0 0 0
md9 0 0 0 0
md10 0 0 0 0
md11 0 0 0 0
md20 0 0 0 0
md21 0 0 0 0
md22 0 0 0 0
md23 0 0 0 0
md24 0 0 0 0
md25 0 0 0 0
md26 0 0 0 0
md27 0 0 0 0
md28 0 0 0 0
md29 0 0 0 0
md30 0 0 0 0
md100 0 0 0 0
md101 0 0 0 0
md102 0 0 0 0
sd0 0 0 0 0
sd1 0 0 0 0
sd30 0 2 0 2
nfs1 0 0 0 0
sd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU Product: MAX3073NCSUN72G Revision: 1503 Serial No: 000631F03D89
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU Product: MAX3073NCSUN72G Revision: 1503 Serial No: 000631F03D8S
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd30 Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Vendor: TSSTcorp Product: CD/DVDW TS-L532U Revision: SI02 Serial No: 08/03/05
Size: 18446744073.71GB <-1 bytes>
Media Error: 0 Device Not Ready: 2 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
#
# iostat -en
---- errors ---
s/w h/w trn tot
0 0 0 0 d0
0 0 0 0 d1
0 0 0 0 d2
0 0 0 0 d3
0 0 0 0 d4
0 0 0 0 d5
0 0 0 0 d6
0 0 0 0 d7
0 0 0 0 d8
0 0 0 0 d9
0 0 0 0 d10
0 0 0 0 d11
0 0 0 0 d20
0 0 0 0 d21
0 0 0 0 d22
0 0 0 0 d23
0 0 0 0 d24
0 0 0 0 d25
0 0 0 0 d26
0 0 0 0 d27
0 0 0 0 d28
0 0 0 0 d29
0 0 0 0 d30
0 0 0 0 d100
0 0 0 0 d101
0 0 0 0 d102
0 0 0 0 c1t0d0
0 0 0 0 c1t1d0
0 2 0 2 c0t0d0
0 0 0 0 sspfs_svr:vold(pid551)
#

DukeNuke2 · February 20, 2008, 4:43am

Do as already suggested: replace the disk with the same disk.

xramm · February 20, 2008, 7:06am

I couldn't see any errors.
How/where did you see errors? Could you please explain?
Thanks in advance.

DukeNuke2 · February 20, 2008, 7:29am

I can't see errors either, but "metastat" seems to report errors. If so, do a "dummy" replace, which means:
replace the disk that "metastat" reports errors on with itself!

xramm · February 20, 2008, 8:08am

Do you mean:

# metareplace d100 c1t0d0s5 c1t0d0s5

In that case the system gave this error:
metareplace: sspfs_svr: c1t0d0s5: has appeared more than once in the specification of d101

xramm · February 20, 2008, 8:12am

I think I've got it now:

# metareplace -e d100 c1t0d0s5
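When several slices are in maintenance, a dry-run sketch like the one below (hypothetical helper suggest_reenable) can turn each metastat Invoke hint into the matching metareplace -e command. It assumes the Invoke line layout shown at the top of this thread and only prints the commands; as later posts point out, a disk that keeps dropping back into maintenance may still need physical replacement, so check iostat before running anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper: turn each metastat hint of the form
#   Invoke: metareplace <mirror> <component> <new device>
# into "metareplace -e <mirror> <component>" and print it (never runs it).
suggest_reenable() {
  awk '
    $1 == "Invoke:" && $2 == "metareplace" && NF >= 4 && $3 !~ /^-/ {
      cmd = "metareplace -e " $3 " " $4
      # Print each suggested command only once.
      if (!(cmd in seen)) { seen[cmd] = 1; print cmd }
    }'
}

# Usage on a live Solaris system (review the output before running anything):
#   metastat | suggest_reenable
```

Fed the metastat lines from the first post, it would print metareplace -e d100 c1t0d0s5, the same command arrived at above.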

Hitesh_Shah · March 11, 2008, 1:03am

Can metareplace -e be done online? How much load does it put on server resources?

About a month ago we had the device /d45 go into 'Needs maintenance' state. Today I see the same disk in maintenance again. Does this mean I need to replace the disk?

BrewDudeBob · March 11, 2008, 2:15am

Did that fix your issue?

You will usually know immediately if it fails. Sometimes it will sync up almost all the way before crapping out.

I always do a "metastat|grep %" to monitor the sync progress.
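That check can be wrapped in a polling loop so it repeats until no percentage is shown any more. The function name wait_for_resync, the STATUS_CMD override and the 60-second default interval are hypothetical additions for illustration; note that if metastat itself fails, grep matches nothing and the loop ends, so this is a convenience, not a guarantee that the mirror is healthy.

```shell
#!/bin/sh
# Hypothetical helper: repeat "metastat | grep %" until no line contains a
# percentage any more, i.e. no resync progress is being reported.
# STATUS_CMD (default: metastat) and the interval in seconds (default: 60)
# are illustrative knobs, not part of Solaris.
wait_for_resync() {
  status_cmd=${STATUS_CMD:-metastat}
  interval=${1:-60}
  # grep prints the progress lines and succeeds while at least one matches.
  while $status_cmd | grep '%'; do
    sleep "$interval"
  done
  echo "no resync in progress"
}

# Usage on a live Solaris system:
#   wait_for_resync 30
```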

BrewDudeBob · March 11, 2008, 2:17am

Another thing I do is run "metastat|grep Invoke" to see how many slices are hosed.
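A slightly more verbose version of that grep, sketched below as a hypothetical helper count_invoke, lists each Invoke hint without its label and then prints the total. It assumes one Invoke line per affected component, as in the metastat output at the top of this thread.

```shell
#!/bin/sh
# Hypothetical helper for the "metastat | grep Invoke" check: list every
# Invoke hint without its label, then print how many were found.
count_invoke() {
  awk '
    $1 == "Invoke:" {
      n++
      sub(/^[ \t]*Invoke:[ \t]*/, "")
      print "  " $0
    }
    END { printf "%d Invoke line(s) found\n", n }'
}

# Usage on a live Solaris system:
#   metastat | count_invoke
```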

xramm · March 11, 2008, 3:23am

metareplace -e <mirror> <device> (the metareplace "enable" option) worked in the cases where the disk really only needed resynchronisation. As the Sun Solstice DiskSuite documentation puts it, this is useful when a component fails due to human error (for example, accidentally turning off a disk).

But some of the disks needed physical replacement even though the metareplace -e command cleared the errors at first.

Also check the output of iostat -eE or iostat -en for hardware errors.

b.janardhanguru · April 14, 2008, 10:56pm

After we replace a disk using the metareplace -e option,

do we need to resync? How will the new disk be updated with the old data?

Please advise, thanks.

xramm · April 15, 2008, 3:07am

In your case metareplace will already do a resync, copying the data from the mate disk to the newly replaced disk.
The only thing you have to be careful about is that the replacement disk must be partitioned to match the disk it is replacing before you run the metareplace command.

BrewDudeBob · April 15, 2008, 9:11am

If your disk is part of a two-disk mirror, you can lay out the partitions on the new disk with a one-line command:

prtvtoc -s /dev/rdsk/good_disk_s2 | fmthard -s - /dev/rdsk/new_disk_s2

Here is an example:

# prtvtoc -s /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c3t0d0s2
fmthard: New volume table of contents now in place.

Of course, it is a good idea to run prtvtoc -s on each drive before you pipe the prtvtoc output into fmthard, so you can see the layout of each disk before you copy it.
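Following that advice, the one-liner can be wrapped so both layouts are always printed first and the copy only happens on request. The function copy_vtoc and the PRTVTOC, FMTHARD and CONFIRM variables are hypothetical names for this sketch, not Solaris settings; the arguments are the raw slice-2 device paths, as in the example above.

```shell
#!/bin/sh
# Hypothetical wrapper around the prtvtoc | fmthard one-liner: show both
# layouts first and only copy when CONFIRM=yes. PRTVTOC, FMTHARD and
# CONFIRM are illustrative overrides, not Solaris settings.
# Arguments are raw slice-2 paths, e.g. /dev/rdsk/c0t1d0s2.
copy_vtoc() {
  good=$1
  new=$2
  prtvtoc_cmd=${PRTVTOC:-prtvtoc}
  fmthard_cmd=${FMTHARD:-fmthard}
  # Refuse missing arguments or copying a disk onto itself.
  if [ -z "$good" ] || [ -z "$new" ] || [ "$good" = "$new" ]; then
    echo "usage: copy_vtoc <good disk s2> <new disk s2> (two different disks)" >&2
    return 1
  fi
  echo "== current layout of $good"
  $prtvtoc_cmd -s "$good"
  echo "== current layout of $new"
  $prtvtoc_cmd -s "$new"
  if [ "${CONFIRM:-no}" != yes ]; then
    echo "dry run: set CONFIRM=yes to copy the layout of $good to $new"
    return 0
  fi
  # Same pipeline as the one-liner: the good disk's VTOC becomes the new disk's.
  $prtvtoc_cmd -s "$good" | $fmthard_cmd -s - "$new"
}

# Usage on a live Solaris system:
#   copy_vtoc /dev/rdsk/c0t1d0s2 /dev/rdsk/c3t0d0s2               # look first
#   CONFIRM=yes copy_vtoc /dev/rdsk/c0t1d0s2 /dev/rdsk/c3t0d0s2   # then copy
```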