Hello, I'm trying to get to the bottom of SAN disk errors we've been seeing.
The server is a Sun Fire X4270 M2 running Solaris 10 8/11 (Update 10) x86 since April 2012. The SAN HBAs are SG-PCIE2FC-QF8-Z (Sun-branded QLogic), and the storage array is a Hitachi VSP. We have 32 LUNs in use and another 8 LUNs that have not yet been brought into Symantec Storage Foundation.
We started seeing hardware and transport errors on the LUNs on July 2, which led to corruption of three Veritas filesystems. I got that resolved on July 3, but we had to restore three filesystems from tape. The SAN team found no SAN switch errors, and Hitachi's analysis showed no disk errors.
We originally had Solaris MPxIO enabled (the default) for multipathing, alongside Veritas DMP. Symantec said the two multipathing layers could coexist, but the errors returned, so I disabled MPxIO and rebooted on July 17. I didn't see any more errors until yesterday (July 22) at 11:10 am. Is this a problem with the SAN HBAs? What do these errors mean? Any help would be appreciated.
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340c@5/pci1077,171@0,1/fp@0,0/disk@w50060e8006fe93bb,17 (sd75):
Jul 22 11:10:26 cscgbwndc004 SCSI transport failed: reason 'tran_err': retrying command
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340c@5/pci1077,171@0,1/fp@0,0/disk@w50060e8006fe93bb,16 (sd77):
Jul 22 11:10:26 cscgbwndc004 SCSI transport failed: reason 'tran_err': retrying command
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340c@5/pci1077,171@0,1/fp@0,0/disk@w50060e8006fe93bb,17 (sd75):
Jul 22 11:10:26 cscgbwndc004 Error for Command: write(10) Error Level: Retryable
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Requested Block: 141200 Error Block: 141200
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 50 0FE931367
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340c@5/pci1077,171@0,1/fp@0,0/disk@w50060e8006fe93bb,16 (sd77):
Jul 22 11:10:26 cscgbwndc004 Error for Command: write(10) Error Level: Retryable
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Requested Block: 14223776 Error Block: 14223776
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 50 0FE931366
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340c@5/pci1077,171@0,1/fp@0,0/disk@w50060e8006fe93bb,10 (sd90):
Jul 22 11:10:26 cscgbwndc004 Error for Command: write(10) Error Level: Retryable
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Requested Block: 12622176 Error Block: 12622176
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 50 0FE931360
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jul 22 11:10:26 cscgbwndc004 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340c@5/pci1077,171@0,1/fp@0,0/disk@w50060e8006fe93bb,1c (sd70):
Jul 22 11:10:29 cscgbwndc004 Error for Command: read(10) Error Level: Retryable
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.notice] Requested Block: 13265440 Error Block: 13265440
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 50 0FE93136C
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340c@5/pci1077,171@0,1/fp@0,0/disk@w50060e8006fe93bb,18 (sd74):
Jul 22 11:10:29 cscgbwndc004 Error for Command: read(10) Error Level: Retryable
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.notice] Requested Block: 13264016 Error Block: 13264016
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 50 0FE931368
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jul 22 11:10:29 cscgbwndc004 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
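Each WARNING line names the LUN twice: by sd instance (for example sd75) and by the unit address at the end of the physical path (disk@w50060e8006fe93bb,17). A minimal sketch, assuming the part after the comma is the LUN number in hex, pulls both out of the messages file so they can be matched against the dN suffix of the cXtXdX names. For the lines above that gives 0x17 = d23, 0x16 = d22, 0x10 = d16, 0x1c = d28 and 0x18 = d24, which agrees with the sd list further down (sd75 = ...d23, sd77 = ...d22, sd90 = ...d16, sd70 = ...d28, sd74 = ...d24), and every one of them is on target 50060e8006fe93bb. The function name warn_to_lun is hypothetical.

```shell
# Sketch: read syslog text on stdin, keep the "WARNING: .../disk@w<WWN>,<LUN> (sdN):"
# lines, and print the sd instance, target WWN and LUN (hex and decimal).
# Assumption: the unit address after the comma is the LUN in hexadecimal.
warn_to_lun() {
  sed -n 's/.*disk@w\([0-9a-fA-F][0-9a-fA-F]*\),\([0-9a-fA-F][0-9a-fA-F]*\) (\(sd[0-9]*\)).*/\3 \1 \2/p' |
  while read -r inst wwn lunhex; do
    # printf evaluates "0x.." as a hex constant, so %d prints the decimal LUN.
    printf '%s target=%s LUN=0x%s d%d\n' "$inst" "$wwn" "$lunhex" "0x$lunhex"
  done | sort -u   # the same device is often warned about more than once
}
```

Usage: warn_to_lun < /var/adm/messages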
iostat -en | egrep "device|errors|c3"
---- errors ---
s/w h/w trn tot device
0 1 0 1 c3t50060E8006FE93BBd39
0 1 0 1 c3t50060E8006FE93BBd38
0 1 0 1 c3t50060E8006FE93BBd37
0 1 0 1 c3t50060E8006FE93BBd36
0 1 0 1 c3t50060E8006FE93BBd35
0 1 0 1 c3t50060E8006FE93BBd34
0 1 0 1 c3t50060E8006FE93BBd33
0 1 0 1 c3t50060E8006FE93BBd32
0 1 0 1 c3t50060E8006FE93BBd31
0 1 0 1 c3t50060E8006FE93BBd30
0 1 0 1 c3t50060E8006FE93BBd29
0 1 0 1 c3t50060E8006FE93BBd28
0 1 0 1 c3t50060E8006FE93BBd27
0 1 0 1 c3t50060E8006FE93BBd26
0 1 0 1 c3t50060E8006FE93BBd25
0 1 0 1 c3t50060E8006FE93BBd24
0 2 1 3 c3t50060E8006FE93BBd23
0 2 1 3 c3t50060E8006FE93BBd22
0 1 0 1 c3t50060E8006FE93BBd21
0 1 0 1 c3t50060E8006FE93BBd20
0 1 0 1 c3t50060E8006FE93BBd19
0 1 0 1 c3t50060E8006FE93BBd18
0 1 0 1 c3t50060E8006FE93BBd17
0 1 0 1 c3t50060E8006FE93BBd16
0 1 0 1 c3t50060E8006FE93BBd15
0 0 0 0 c3t50060E8006FE93BBd14
0 0 0 0 c3t50060E8006FE93BBd13
0 0 0 0 c3t50060E8006FE93BBd12
0 0 0 0 c3t50060E8006FE93BBd11
0 0 0 0 c3t50060E8006FE93BBd10
0 0 0 0 c3t50060E8006FE93BBd9
0 0 0 0 c3t50060E8006FE93BBd8
0 0 0 0 c3t50060E8006FE93BBd7
0 0 0 0 c3t50060E8006FE93BBd6
0 0 0 0 c3t50060E8006FE93BBd5
0 0 0 0 c3t50060E8006FE93BBd4
0 0 0 0 c3t50060E8006FE93BBd3
0 0 0 0 c3t50060E8006FE93BBd2
0 0 0 0 c3t50060E8006FE93BBd1
0 0 0 0 c3t50060E8006FE93BBd0
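In iostat -en output the columns are cumulative soft errors (s/w), hard errors (h/w), transport errors (trn) and their total (tot) per device. Above, d15 through d39 on the c3 path each show one hard error, d22 and d23 also show a transport error, and d0 through d14 show none. A rough sketch that keeps only the rows with a non-zero total and sums the counters per controller, assuming the two-line header layout shown above (nonzero_errors is a hypothetical helper name):

```shell
# Sketch: filter "iostat -en" output (read on stdin) down to devices with a
# non-zero "tot" column, then print per-controller sums of the s/w, h/w and
# trn columns. Assumes the column order s/w h/w trn tot device.
nonzero_errors() {
  awk '
    # Data rows start with a number; the two header lines do not.
    $1 ~ /^[0-9]+$/ && $4 > 0 {
      print
      ctrl = $5
      # Reduce c3t50060E8006FE93BBd23 to its controller part, c3.
      if (match(ctrl, /^c[0-9]+/)) ctrl = substr(ctrl, 1, RLENGTH)
      sw[ctrl] += $1; hw[ctrl] += $2; trn[ctrl] += $3; n[ctrl]++
    }
    END {
      for (c in n)
        printf "%s: %d devices, s/w=%d h/w=%d trn=%d\n", c, n[c], sw[c], hw[c], trn[c]
    }'
}
```

Usage: iostat -en | nonzero_errors. Run without the egrep filter, the same summary would also show whether the c4 path (target 50060E8006FE93AB) has recorded any errors.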
Disk instance (sd) names mapped to device names (cXtXdX), excluding md|st|nfs and including only c3|c4, the SAN controllers/paths:
sd43=/dev/dsk/c4t50060E8006FE93ABd39
sd44=/dev/dsk/c4t50060E8006FE93ABd38
sd45=/dev/dsk/c4t50060E8006FE93ABd37
sd46=/dev/dsk/c4t50060E8006FE93ABd36
sd47=/dev/dsk/c4t50060E8006FE93ABd35
sd48=/dev/dsk/c4t50060E8006FE93ABd34
sd49=/dev/dsk/c4t50060E8006FE93ABd33
sd50=/dev/dsk/c4t50060E8006FE93ABd32
sd51=/dev/dsk/c4t50060E8006FE93ABd31
sd52=/dev/dsk/c3t50060E8006FE93BBd39
sd53=/dev/dsk/c3t50060E8006FE93BBd38
sd54=/dev/dsk/c4t50060E8006FE93ABd30
sd55=/dev/dsk/c3t50060E8006FE93BBd37
sd56=/dev/dsk/c3t50060E8006FE93BBd36
sd57=/dev/dsk/c4t50060E8006FE93ABd29
sd58=/dev/dsk/c3t50060E8006FE93BBd35
sd59=/dev/dsk/c4t50060E8006FE93ABd28
sd60=/dev/dsk/c3t50060E8006FE93BBd34
sd61=/dev/dsk/c3t50060E8006FE93BBd33
sd62=/dev/dsk/c4t50060E8006FE93ABd27
sd63=/dev/dsk/c3t50060E8006FE93BBd32
sd64=/dev/dsk/c3t50060E8006FE93BBd31
sd65=/dev/dsk/c4t50060E8006FE93ABd26
sd66=/dev/dsk/c3t50060E8006FE93BBd30
sd67=/dev/dsk/c4t50060E8006FE93ABd25
sd68=/dev/dsk/c3t50060E8006FE93BBd29
sd69=/dev/dsk/c4t50060E8006FE93ABd24
sd70=/dev/dsk/c3t50060E8006FE93BBd28
sd71=/dev/dsk/c3t50060E8006FE93BBd27
sd72=/dev/dsk/c3t50060E8006FE93BBd26
sd73=/dev/dsk/c3t50060E8006FE93BBd25
sd74=/dev/dsk/c3t50060E8006FE93BBd24
sd75=/dev/dsk/c3t50060E8006FE93BBd23
sd76=/dev/dsk/c4t50060E8006FE93ABd23
sd77=/dev/dsk/c3t50060E8006FE93BBd22
sd78=/dev/dsk/c4t50060E8006FE93ABd22
sd79=/dev/dsk/c4t50060E8006FE93ABd21
sd80=/dev/dsk/c3t50060E8006FE93BBd21
sd81=/dev/dsk/c4t50060E8006FE93ABd20
sd82=/dev/dsk/c3t50060E8006FE93BBd20
sd83=/dev/dsk/c4t50060E8006FE93ABd19
sd84=/dev/dsk/c3t50060E8006FE93BBd19
sd85=/dev/dsk/c3t50060E8006FE93BBd18
sd86=/dev/dsk/c4t50060E8006FE93ABd18
sd87=/dev/dsk/c4t50060E8006FE93ABd17
sd88=/dev/dsk/c3t50060E8006FE93BBd17
sd89=/dev/dsk/c4t50060E8006FE93ABd16
sd90=/dev/dsk/c3t50060E8006FE93BBd16
sd91=/dev/dsk/c3t50060E8006FE93BBd15
sd92=/dev/dsk/c4t50060E8006FE93ABd15
sd93=/dev/dsk/c3t50060E8006FE93BBd14
sd94=/dev/dsk/c4t50060E8006FE93ABd14
sd95=/dev/dsk/c3t50060E8006FE93BBd13
sd96=/dev/dsk/c4t50060E8006FE93ABd13
sd97=/dev/dsk/c4t50060E8006FE93ABd12
sd98=/dev/dsk/c3t50060E8006FE93BBd12
sd99=/dev/dsk/c4t50060E8006FE93ABd11
sd100=/dev/dsk/c3t50060E8006FE93BBd11
sd101=/dev/dsk/c3t50060E8006FE93BBd10
sd102=/dev/dsk/c4t50060E8006FE93ABd10
sd103=/dev/dsk/c3t50060E8006FE93BBd9
sd104=/dev/dsk/c4t50060E8006FE93ABd9
sd105=/dev/dsk/c3t50060E8006FE93BBd8
sd106=/dev/dsk/c4t50060E8006FE93ABd8
sd107=/dev/dsk/c3t50060E8006FE93BBd7
sd108=/dev/dsk/c4t50060E8006FE93ABd7
sd109=/dev/dsk/c3t50060E8006FE93BBd6
sd110=/dev/dsk/c4t50060E8006FE93ABd6
sd111=/dev/dsk/c3t50060E8006FE93BBd5
sd112=/dev/dsk/c4t50060E8006FE93ABd5
sd113=/dev/dsk/c3t50060E8006FE93BBd4
sd114=/dev/dsk/c4t50060E8006FE93ABd4
sd115=/dev/dsk/c4t50060E8006FE93ABd3
sd116=/dev/dsk/c3t50060E8006FE93BBd3
sd117=/dev/dsk/c4t50060E8006FE93ABd2
sd118=/dev/dsk/c3t50060E8006FE93BBd2
sd119=/dev/dsk/c4t50060E8006FE93ABd1
sd120=/dev/dsk/c3t50060E8006FE93BBd1
sd121=/dev/dsk/c3t50060E8006FE93BBd0
sd122=/dev/dsk/c4t50060E8006FE93ABd0
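To go from the sd instance names in the messages file to device names, the list above can be saved to a file and searched. A small sketch, assuming the list is stored one sdN=/dev/dsk/... pair per line (the file name sd_map.txt and the sd_lookup function are hypothetical):

```shell
# Sketch: resolve sd instance names (as printed in /var/adm/messages) to
# device names using a saved copy of the "sdN=/dev/dsk/..." list.
# Arguments: the map file, then one or more instance names.
sd_lookup() {
  map=$1
  shift
  for inst in "$@"; do
    # Exact match on the part before "=", so sd7 does not match sd75.
    dev=$(awk -F= -v k="$inst" '$1 == k { print $2 }' "$map")
    printf '%s -> %s\n' "$inst" "${dev:-not found}"
  done
}
```

Usage: sd_lookup sd_map.txt sd70 sd74 sd75 sd77 sd90. With the list above, all five instances from the July 22 messages resolve to c3t50060E8006FE93BB devices (d28, d24, d23, d22 and d16).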