Maint, resyncing and last-erred notifications

Hi fellow members!

I have an Oracle Solaris server with two internal disks that acts as an authentication server only. For now the server seems to be doing its job, but when I run metastat -c I get some notifications.
I am not familiar with SVM, so I wonder if someone can help me with this:

metastat -c
d50              m  8.0GB d51 d52
    d51          s  8.0GB c0t0d0s6
    d52          s  8.0GB c0t1d0s6
d30              m   89GB d31 (maint) d32 (maint)
    d31          s   89GB c0t0d0s5 (resyncing)
    d32          s   89GB c0t1d0s5 (last-erred)
d20              m   16GB d21 d22
    d21          s   16GB c0t0d0s1
    d22          s   16GB c0t1d0s1
d10              m   15GB d11
    d11          s   15GB c0t0d0s0
d60              m  8.0GB d61 d62
    d61          s  8.0GB c0t0d0s7
    d62          s  8.0GB c0t1d0s7
d12              s   15GB c0t1d0s0

and the output of iostat -E shows me:

 iostat -E
sd2       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MBD2147RC        Revision: 3702 Serial No:
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd3       Soft Errors: 26 Hard Errors: 28 Transport Errors: 8
Vendor: FUJITSU  Product: MBD2147RC        Revision: 3702 Serial No:
Size: 146.81GB <146810536448 bytes>
Media Error: 24 Device Not Ready: 0 No Device: 4 Recoverable: 26
Illegal Request: 0 Predictive Failure Analysis: 4
sd4       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: TEAC     Product: DV-W28S-V        Revision: J.0B Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0

The logs show media errors:

Feb  9 16:23:39 maphmi  Error for Command: read(10)                Error Level: Retryable
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.notice]    Requested Block: 72021888                  Error Block: 72021888
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.notice]    Vendor: FUJITSU                            Serial Number: D0C5PA800SFH
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.notice]    Sense Key: Unit_Attention
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.notice]    ASC: 0x29 (scsi bus reset occurred), ASCQ: 0x2, FRU: 0x0
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@0/scsi@0/sd@1,0 (sd3):
Feb  9 16:23:39 maphmi  Error for Command: write(10)               Error Level: Informational
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.notice]    Requested Block: 65024710                  Error Block: 65024710
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.notice]    Vendor: FUJITSU                            Serial Number: D0C5PA800SFH
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.notice]    Sense Key: Soft_Error
Feb  9 16:23:39 maphmi scsi: [ID 107833 kern.notice]    ASC: 0x5d (firmware impending failure too many block reassigns), ASCQ: 0x64, FRU: 0x0
Feb  9 16:23:42 maphmi scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@0/scsi@0/sd@1,0 (sd3):
Feb  9 16:23:42 maphmi  Error for Command: read(10)                Error Level: Retryable
Feb  9 16:23:42 maphmi scsi: [ID 107833 kern.notice]    Requested Block: 72021888                  Error Block: 72022006
Feb  9 16:23:42 maphmi scsi: [ID 107833 kern.notice]    Vendor: FUJITSU                            Serial Number: D0C5PA800SFH
Feb  9 16:23:42 maphmi scsi: [ID 107833 kern.notice]    Sense Key: Media_Error
Feb  9 16:23:42 maphmi scsi: [ID 107833 kern.notice]    ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0
Feb  9 16:23:46 maphmi scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@0/scsi@0/sd@1,0 (sd3):
Feb  9 16:23:46 maphmi  Error for Command: read(10)                Error Level: Retryable
Feb  9 16:23:46 maphmi scsi: [ID 107833 kern.notice]    Requested Block: 72021888                  Error Block: 72022006
Feb  9 16:23:46 maphmi scsi: [ID 107833 kern.notice]    Vendor: FUJITSU                            Serial Number: D0C5PA800SFH
Feb  9 16:23:46 maphmi scsi: [ID 107833 kern.notice]    Sense Key: Media_Error
Feb  9 16:23:46 maphmi scsi: [ID 107833 kern.notice]    ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0
Feb  9 16:23:53 maphmi scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@0/scsi@0/sd@1,0 (sd3):
Feb  9 16:23:53 maphmi  Error for Command: read(10)                Error Level: Fatal
Feb  9 16:23:53 maphmi scsi: [ID 107833 kern.notice]    Requested Block: 72021888                  Error Block: 72022010
Feb  9 16:23:53 maphmi scsi: [ID 107833 kern.notice]    Vendor: FUJITSU                            Serial Number: D0C5PA800SFH
Feb  9 16:23:53 maphmi scsi: [ID 107833 kern.notice]    Sense Key: Media_Error
Feb  9 16:23:53 maphmi scsi: [ID 107833 kern.notice]    ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0
Feb  9 16:23:53 maphmi md_stripe: [ID 641072 kern.warning] WARNING: md: d32: read error on /dev/dsk/c0t1d0s5
Feb  9 20:10:37 maphmi dtlogin[23097]: [ID 293258 user.error] libsldap: Status: 49  Mesg: openConnection: simple bind failed - Invalid credentials
Feb  9 22:34:24 maphmi dtlogin[10415]: [ID 293258 user.error] libsldap: Status: 49  Mesg: openConnection: simple bind failed - Invalid credentials
Feb 10 00:04:10 maphmi dtlogin[15976]: [ID 293258 user.error] libsldap: Status: 49  Mesg: openConnection: simple bind failed - Invalid credentials
Feb 10 00:04:26 maphmi last message repeated 1 time
Feb 10 03:34:01 maphmi syslogd: going down on signal 15
Feb 10 12:55:57 maphmi dtlogin[2513]: [ID 293258 user.error] libsldap: Status: 49  Mesg: openConnection: simple bind failed - Invalid credentials
Feb 10 15:34:01 maphmi syslogd: going down on signal 15
Feb 10 17:08:18 maphmi dtlogin[19265]: [ID 293258 user.error] libsldap: Status: 49  Mesg: openConnection: simple bind failed - Invalid credentials
Feb 11 03:34:01 maphmi syslogd: going down on signal 15
Feb 11 07:24:50 maphmi dtlogin[27628]: [ID 293258 user.error] libsldap: Status: 49  Mesg: openConnection: simple bind failed - Invalid credentials
Feb 11 15:34:01 maphmi syslogd: going down on signal 15
Feb 12 03:34:01 maphmi syslogd: going down on signal 15
Feb 12 07:42:35 maphmi dtlogin[23240]: [ID 293258 user.error] libsldap: Status: 49  Mesg: openConnection: simple bind failed - Invalid credentia

Can you help?

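The disk c0t1d0 (sd3 in the iostat output and the logs) is experiencing read errors.

So far SVM has only seen them on slice 5 (submirror d32).
However, iostat is showing errors for the entire device, and the log also shows the drive itself reporting an impending failure (ASC 0x5d).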
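
If you want to keep an eye on the counters while waiting for a replacement, a small script over saved iostat -E output can flag the suspect device. This is only a sketch in Python: the function names and the rule for what counts as "suspect" (any hard error, media error, or predictive failure count above zero) are my own assumptions, not something iostat defines.

```python
import re

# Sample copied from the iostat -E output in the question
# (sd4, the DVD drive, is left out to keep it short).
IOSTAT_SAMPLE = """\
sd2       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MBD2147RC        Revision: 3702 Serial No:
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd3       Soft Errors: 26 Hard Errors: 28 Transport Errors: 8
Vendor: FUJITSU  Product: MBD2147RC        Revision: 3702 Serial No:
Size: 146.81GB <146810536448 bytes>
Media Error: 24 Device Not Ready: 0 No Device: 4 Recoverable: 26
Illegal Request: 0 Predictive Failure Analysis: 4
"""

# A device block starts with "<name>   Soft Errors: ...".
HEADER = re.compile(r"^(\S+)\s+Soft Errors:")
# Counters look like "Hard Errors: 28" or "Device Not Ready: 0".
COUNTER = re.compile(r"([A-Z][A-Za-z ]*?):\s*(\d+)")


def parse_iostat_e(text):
    """Return {device: {counter name: value}} from iostat -E style text."""
    devices = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        # Vendor and Size lines carry no error counters, so skip them.
        if not line or line.startswith(("Vendor:", "Size:")):
            continue
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            devices[current] = {}
        if current is None:
            continue
        for name, value in COUNTER.findall(line):
            devices[current][name] = int(value)
    return devices


def suspect_devices(devices):
    """Assumed rule: flag a device if any of these counters is non-zero."""
    watched = ("Hard Errors", "Media Error", "Predictive Failure Analysis")
    return sorted(
        dev for dev, counters in devices.items()
        if any(counters.get(name, 0) > 0 for name in watched)
    )


if __name__ == "__main__":
    print(suspect_devices(parse_iostat_e(IOSTAT_SAMPLE)))
```

With the sample above only sd3 is flagged. Run it against fresh output every so often: counters that keep climbing mean the disk is getting worse.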

Have a replacement disk ready.

Detach all slices on the failing disk (c0t1d0) from the metadevices that use them.
You will then need to identify the disk physically and unconfigure it using cfgadm.

The blue LED should then indicate that the disk is safe to remove.

Replace the faulty disk in the server.

After the new working disk is inserted and visible to the operating system, follow the SVM documentation on replacing a failed disk (or root disk) and re-enabling its slices.
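
For the detach step it helps to have a list of everything that sits on c0t1d0 before touching anything. The sketch below reads metastat -c style text and lists each stripe on a given disk together with the mirror it belongs to. It is written in Python against the output posted in the question; the function name, the tuple layout, and the "ok" label for components with no state flag are my own choices for illustration.

```python
# Sample copied from the metastat -c output in the question.
METASTAT_SAMPLE = """\
d50              m  8.0GB d51 d52
    d51          s  8.0GB c0t0d0s6
    d52          s  8.0GB c0t1d0s6
d30              m   89GB d31 (maint) d32 (maint)
    d31          s   89GB c0t0d0s5 (resyncing)
    d32          s   89GB c0t1d0s5 (last-erred)
d20              m   16GB d21 d22
    d21          s   16GB c0t0d0s1
    d22          s   16GB c0t1d0s1
d10              m   15GB d11
    d11          s   15GB c0t0d0s0
d60              m  8.0GB d61 d62
    d61          s  8.0GB c0t0d0s7
    d62          s  8.0GB c0t1d0s7
d12              s   15GB c0t1d0s0
"""


def components_on_disk(text, disk):
    """Return (stripe, parent mirror or None, slice, state) for each
    stripe in metastat -c style text that uses a slice of `disk`."""
    hits = []
    mirror = None
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) < 4:
            continue  # blank lines, the command line itself, etc.
        name, kind = fields[0], fields[1]
        top_level = not raw[0].isspace()
        if top_level:
            # Remember the current mirror; a top-level stripe has no parent.
            mirror = name if kind == "m" else None
        if kind != "s":
            continue
        parent = None if top_level else mirror
        rest = fields[3:]  # slices, each optionally followed by "(state)"
        for i, token in enumerate(rest):
            # Match "c0t1d0s5" for disk "c0t1d0" but not e.g. "c0t1d01s5".
            if not token.startswith(disk + "s"):
                continue
            has_flag = i + 1 < len(rest) and rest[i + 1].startswith("(")
            state = rest[i + 1].strip("()") if has_flag else "ok"
            hits.append((name, parent, token, state))
    return hits


if __name__ == "__main__":
    for stripe, parent, slice_, state in components_on_disk(METASTAT_SAMPLE, "c0t1d0"):
        print(stripe, parent or "(no mirror)", slice_, state)
```

On the posted output this lists d52, d32, d22, d62 and d12. Note that d12 comes back with no parent mirror, so whatever is on it has no second copy. It is the same size as d11, so it may be a submirror that was never attached to d10, but the output alone does not confirm that.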

You might also want to consider making a backup as soon as possible.

This is no easy task, and should be performed with care.
Verify each step taken and follow the documentation for the exact version of SVM and operating system you are using.

Hope that helps.
Regards
Peasant.

Thanks for your help. I can only proceed once I have a spare/replacement disk, so for the moment I should back up all the data on the disks, right? Would a backup be reliable when there are media or read errors?

Since the other device in the mirror is OK, the answer is yes: the backup should be reliable.

Be wary though: the d12 metadevice is not mirrored, and its slice (c0t1d0s0) belongs to the faulty disk.
Is that swap?

Regards
Peasant.