How to identify virtual/physical disk path on X4100?

Hello,

We currently have an Oracle X4100 (Solaris 10) server with a disk failure. Our hardware team originally thought the disk could simply be hot-swapped, but when they stood in front of the server, none of the disk failure LEDs were on, so now we have no idea which disk is the bad one to replace.

As far as we remember, although the OS originally reported c2t2d0 as the bad disk, that doesn't necessarily mean it is the physical disk that is failing.

Therefore, we need some advice on how to identify the disk that has really failed, so we can replace it.

(Note: originally only c2t2d0 showed hardware errors in iostat; now c2t3d0 shows them too. "format" hangs completely, and "raidctl -l" prints 0.2.0 and then hangs without continuing.)

>iostat -Een
 ---- errors ---
  s/w h/w trn tot device
  2   0   0   2 c1t0d0
  0  58  67 125 c2t2d0
  0  15 576 591 c2t3d0
  0   0   0   0 stmkx007:vold(pid589)
  0   0   0   0 atlantic:/home/perfman
  0   0   0   0 atlantic:/u5/sentinel
c1t0d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: TSSTcorp Product: CD/DVDW TS-T632A Revision: SR03 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c2t2d0           Soft Errors: 0 Hard Errors: 58 Transport Errors: 67
Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Serial No: 0726928LR9
Size: 146.81GB <146810535936 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 58 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t3d0           Soft Errors: 0 Hard Errors: 15 Transport Errors: 576
Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Serial No: 072892K1LC
Size: 146.81GB <146810535936 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 15 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

>cfgadm -avl
Ap_Id                          Receptacle   Occupant     Condition  Information
When         Type         Busy     Phys_Id
c2                             connected    configured   unknown
unavailable  scsi-bus     n        /devices/pci@7b,0/pci1022,7458@11/pci1000,3060@2:scsi
c2::dsk/c2t2d0                 connected    configured   unknown    SEAGATE ST914602SSUN146G
unavailable  disk         n        /devices/pci@7b,0/pci1022,7458@11/pci1000,3060@2:scsi::dsk/c2t2d0
c2::dsk/c2t3d0                 connected    configured   unknown    SEAGATE ST914602SSUN146G
unavailable  disk         n        /devices/pci@7b,0/pci1022,7458@11/pci1000,3060@2:scsi::dsk/c2t3d0

 
>raidctl -l -g 0.2.0 2
Disk    Vendor  Product         Firmware        Capacity        Status  HSP
----------------------------------------------------------------------------
0.2.0   SEAGATE ST914602SSUN146 0603            136.7G          GOOD    N/A
GUID:5000c500061569d3

 root /
>raidctl -l -g 0.3.0 2
^C^C^C^C

Since "raidctl -l -g 0.3.0 2" hangs here, does it mean we actually have a bad disk at c2t3d0?

Is there any way to confirm which disk is the bad one?

Thank you very much,

SC

As you're experiencing errors across both disks now, are you sure it isn't a controller issue? Presuming the server is still under support you should get Oracle to look at it ASAP.

Run prtdiag too - this usually lets you see the status of LEDs without having to physically inspect the server.

The fact that raidctl is hanging leads me to believe this could be controller related.
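
One way to tie the failing device names to physical drives is to pull the serial numbers out of the "iostat -En" output posted above and compare them with the labels on the drive carriers. The following is only a rough sketch: the function name failing_disks is made up for illustration, and it assumes the "Soft Errors: ... Hard Errors: ... Transport Errors: ..." and "Serial No:" line layout shown in the question. On Solaris 10 the default /usr/bin/awk is the old awk, so nawk or /usr/xpg4/bin/awk is probably the safer choice there.

```shell
# Sketch: list devices that report hard or transport errors, with their
# serial numbers. Reads `iostat -En`-style text on stdin.
# failing_disks is a hypothetical helper name, not a Solaris command.
failing_disks() {
  # On Solaris 10, run with AWK=nawk (or AWK=/usr/xpg4/bin/awk).
  ${AWK:-awk} '
    # Per-device error line, e.g.
    # "c2t2d0  Soft Errors: 0 Hard Errors: 58 Transport Errors: 67"
    /Soft Errors:/ {
      dev = $1
      for (i = 1; i <= NF; i++) {
        if ($i == "Hard" && $(i+1) == "Errors:") hard = $(i+2)
        if ($i == "Transport" && $(i+1) == "Errors:") trn = $(i+2)
      }
    }
    # Vendor/product line that follows it and carries the serial number
    /Serial No:/ {
      serial = ""
      for (i = 1; i < NF; i++)
        if ($i == "Serial" && $(i+1) == "No:") serial = $(i+2)
      if (serial == "") serial = "unknown"
      if (hard + trn > 0)
        print dev, "hard=" hard, "trn=" trn, "serial=" serial
    }
  '
}
```

Typical use would be "iostat -En | failing_disks". This only narrows down which device names and serial numbers are reporting errors; it does not tell you whether the disks or the controller are at fault.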


Hello,

The problem has just been resolved, though not in the way we expected.

We found a spare server with a similar configuration. Our original idea was to shut down the spare, boot into the BIOS, check its disk mirroring settings, pull one disk out, and boot up to see the error messages, which might help us determine which disk was really the bad one.

When the local HW team told me they saw only 2 disks in the X4100, I started to suspect that no HW mirroring had been set up. So we ran format -> analyze -> read on the spare server, and I intentionally picked c2t3d0 (the one that was bad on the prod server). After the HW team confirmed they saw the light blinking during the analyze run, we decided to replace the disk in the same slot on the problem server.
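
The same activity-LED check can probably be done without the interactive format menu by generating read traffic with dd. This is a minimal sketch, not what we actually ran: blink_disk is a made-up helper name, the device path in the comment is just the one from this thread, and on a disk that is already failing the read may hang the same way "format" did, so it is better tried on a healthy spare first.

```shell
# Sketch: read from a raw device so that its activity LED blinks.
# Read-only: the data goes to /dev/null, nothing is written to the disk.
# blink_disk is a hypothetical helper name.
blink_disk() {
  dev=$1            # raw device to read, e.g. /dev/rdsk/c2t3d0s2
  count=${2:-2000}  # number of 1 MiB blocks to read (default is about 2 GB)
  dd if="$dev" of=/dev/null bs=1024k count="$count"
}
```

For example, "blink_disk /dev/rdsk/c2t3d0s2" while someone watches the front panel. Increase the count if the light does not stay on long enough to spot.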

After the disk was hot-swapped, "format", "prtvtoc /dev/rdsk/c2t3d0s2" and "raidctl -l" all worked. That raised my suspicion: why was I seeing 2 disks in format? With HW mirroring I should see only 1 disk in "format" (since the mirror is presented to the OS as a single virtual disk).

Conclusion: the X4100 has only 2 disks and was never configured with HW mirroring.