Fibre channel link down on booting Solaris server

Hi

I had a power issue that affected a server, after which I powered on the server, a SPARC T3-1B running Solaris 10.
After power-on the system stops at the ok prompt; then I issued the following commands:

{0} ok setenv auto-boot? false
auto-boot? =            false
{0} ok reset-all


SPARC T3-1B, No Keyboard
Copyright (c) 1998, 2010, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.32.1, 65024 MB memory available, Serial #98354310.
Ethernet address 0:21:28:dc:c4:86, Host ID: 85dcc486.



{0} ok probe-scsi-all
/pci@400/pci@2/pci@0/pci@4/pci@0/pci@2/SUNW,qlc@0,1
QLogic QEM3572  Host Adapter FCode(SPARC): 3.06  08/19/09
ISP Firmware version 4.06.02
Fibre Channel Link down
Possible causes: No cable, incorrect connection mode or data rate
SFP state: 8Gb Present

/pci@400/pci@2/pci@0/pci@4/pci@0/pci@2/SUNW,qlc@0
QLogic QEM3572  Host Adapter FCode(SPARC): 3.06  08/19/09
ISP Firmware version 4.06.02
Fibre Channel Link down
Possible causes: No cable, incorrect connection mode or data rate
SFP state: 8Gb Present

/pci@400/pci@1/pci@0/pci@7/pci@0/usb@0,2/hub@3/storage@2
  Unit 0   Removable Read Only device    AMI     Virtual CDROM   1.00

/pci@400/pci@1/pci@0/pci@4/pci@0/pci@2/SUNW,qlc@0,1
QLogic QEM3572  Host Adapter FCode(SPARC): 3.06  08/19/09
ISP Firmware version 4.06.02
Fibre Channel Link down
Possible causes: No cable, incorrect connection mode or data rate
SFP state: 8Gb Present

/pci@400/pci@1/pci@0/pci@4/pci@0/pci@2/SUNW,qlc@0
QLogic QEM3572  Host Adapter FCode(SPARC): 3.06  08/19/09
ISP Firmware version 4.06.02
Fibre Channel Link down
Possible causes: No cable, incorrect connection mode or data rate
SFP state: 8Gb Present

/pci@400/pci@1/pci@0/pci@2/LSI,sas@0

FCode Version 1.00.54, MPT Version 2.00, Firmware Version 5.00.17.00

Target 9
  Unit 0   Disk   SEAGATE  ST930003SSUN300G 0B70
  SASDeviceName 5000c50039c1548b  SASAddress 5000c50039c15489  PhyNum 0
Target a
  Unit 0   Disk   SEAGATE  ST930003SSUN300G 0B70
  SASDeviceName 5000c50039bdd4cb  SASAddress 5000c50039bdd4c9  PhyNum 1

{0} ok

So I am not able to boot.
I find these errors very strange because all that happened was a power failure to the blade, and I managed to power on the other blade in the same chassis successfully.

Power failures can do damage; for example, the mains power could spike.

However, the first thing to try is to re-seat the FC cable. Unplug and replug both ends.

What I find strange is that the FC cables at the back of the chassis are connected only to the NetApp storage system, so why won't the system boot? A blade next to this one does boot normally.

Yes, so I repeat what I said in post #2.

Why are you not able to boot? What is the boot device? Which error message do you get when you boot?

Hi

My boot device was set to "disk net":

ok printenv boot-device
boot-device =           disk net
{0} ok

Then I used the following procedure to change the boot device:

{0} ok setenv auto-boot? false
auto-boot? =            false
{0} ok reset-all


SPARC T3-1B, No Keyboard
Copyright (c) 1998, 2010, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.32.1, 65024 MB memory available, Serial #98354310.
Ethernet address 0:21:28:dc:c4:86, Host ID: 85dcc486.



{0} ok devalias
screen                   /pci@400/pci@2/pci@0/pci@7/pci@0/display@0
mouse                    /pci@400/pci@1/pci@0/pci@7/pci@0/usb@0,1/device@2/mouse@1
rcdrom                   /pci@400/pci@1/pci@0/pci@7/pci@0/usb@0,2/hub@3/storage@2/disk@0
rkeyboard                /pci@400/pci@1/pci@0/pci@7/pci@0/usb@0,1/device@2/keyboard@0
rscreen                  /pci@400/pci@2/pci@0/pci@7/pci@0/display@0:r1280x1024x60
net3                     /niu@480/network@1
net2                     /niu@480/network@0
net1                     /pci@400/pci@2/pci@0/pci@2/network@0,1
net0                     /pci@400/pci@2/pci@0/pci@2/network@0
net                      /pci@400/pci@2/pci@0/pci@2/network@0
disk3                    /pci@400/pci@1/pci@0/pci@2/@0/disk@p3
disk2                    /pci@400/pci@1/pci@0/pci@2/@0/disk@p2
disk1                    /pci@400/pci@1/pci@0/pci@2/@0/disk@p1
disk0                    /pci@400/pci@1/pci@0/pci@2/@0/disk@p0
disk                     /pci@400/pci@1/pci@0/pci@2/@0/disk@p0
rem                      /pci@400/pci@1/pci@0/pci@2/@0
virtual-console          /virtual-devices@100/console@1
name                     aliases
{0} ok printenv boot-device
boot-device =           disk0
{0} ok setenv boot-device /pci@400/pci@1/pci@0/pci@2/@0/disk@p0
boot-device =           /pci@400/pci@1/pci@0/pci@2/@0/disk@p0
{0} ok printenv boot-device
boot-device =           /pci@400/pci@1/pci@0/pci@2/@0/disk@p0
{0} ok boot
Boot device: /pci@400/pci@1/pci@0/pci@2/@0/disk@p0  File and args:


But I am not able to boot. I've tried other selections from the devalias output, but I am still not able to boot.

------ Post updated at 12:32 PM ------

Hi

I have now run the following:

{0} ok probe-scsi-all
/pci@400/pci@2/pci@0/pci@4/pci@0/pci@2/SUNW,qlc@0,1
QLogic QEM3572  Host Adapter FCode(SPARC): 3.06  08/19/09
ISP Firmware version 4.06.02
Fibre Channel Link down
Possible causes: No cable, incorrect connection mode or data rate
SFP state: 8Gb Present

/pci@400/pci@2/pci@0/pci@4/pci@0/pci@2/SUNW,qlc@0
QLogic QEM3572  Host Adapter FCode(SPARC): 3.06  08/19/09
ISP Firmware version 4.06.02
Fibre Channel Link down
Possible causes: No cable, incorrect connection mode or data rate
SFP state: 8Gb Present

/pci@400/pci@1/pci@0/pci@7/pci@0/usb@0,2/hub@3/storage@2
  Unit 0   Removable Read Only device    AMI     Virtual CDROM   1.00

/pci@400/pci@1/pci@0/pci@4/pci@0/pci@2/SUNW,qlc@0,1
QLogic QEM3572  Host Adapter FCode(SPARC): 3.06  08/19/09
ISP Firmware version 4.06.02
Fibre Channel Link down
Possible causes: No cable, incorrect connection mode or data rate
SFP state: 8Gb Present

/pci@400/pci@1/pci@0/pci@4/pci@0/pci@2/SUNW,qlc@0
QLogic QEM3572  Host Adapter FCode(SPARC): 3.06  08/19/09
ISP Firmware version 4.06.02
Fibre Channel Link down
Possible causes: No cable, incorrect connection mode or data rate
SFP state: 8Gb Present

/pci@400/pci@1/pci@0/pci@2/LSI,sas@0

FCode Version 1.00.54, MPT Version 2.00, Firmware Version 5.00.17.00

Target 9
  Unit 0   Disk   SEAGATE  ST930003SSUN300G 0B70
  SASDeviceName 5000c50039c1548b  SASAddress 5000c50039c15489  PhyNum 0
Target a
  Unit 0   Disk   SEAGATE  ST930003SSUN300G 0B70
  SASDeviceName 5000c50039bdd4cb  SASAddress 5000c50039bdd4c9  PhyNum 1

{0} ok



I believe the next step is to boot from one of those two disks. But how do I identify them in the devalias output?

Before you try to set the boot device, can you boot from any of the disks?

You can try:

ok> boot disk0

ok> boot disk1

ok> boot disk2

ok> boot disk3

For each of the disks you have successfully listed.

Will the system boot from any of them?
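
On matching the probe-scsi-all targets to the devalias entries: as far as I know, on this kind of SAS controller the N in disk@pN is the PhyNum that probe-scsi-all reports, so PhyNum 0 would be disk0 and PhyNum 1 would be disk1. Please treat that mapping as an assumption and check it against the Oracle documentation for your blade. A minimal Python sketch of the idea, where the function name disks_by_phy is my own and the sample text is copied from the output posted above:

```python
import re

# Excerpt of the probe-scsi-all output posted earlier in this thread.
SAMPLE = """\
/pci@400/pci@1/pci@0/pci@2/LSI,sas@0

Target 9
  Unit 0   Disk   SEAGATE  ST930003SSUN300G 0B70
  SASDeviceName 5000c50039c1548b  SASAddress 5000c50039c15489  PhyNum 0
Target a
  Unit 0   Disk   SEAGATE  ST930003SSUN300G 0B70
  SASDeviceName 5000c50039bdd4cb  SASAddress 5000c50039bdd4c9  PhyNum 1
"""

TARGET_RE = re.compile(r"^Target\s+([0-9a-f]+)$")
PHY_RE = re.compile(r"SASAddress\s+([0-9a-f]+)\s+PhyNum\s+([0-9a-f]+)")


def disks_by_phy(probe_output):
    """Return one entry per SAS disk with its target, SAS address and
    the disk@pN unit name (ASSUMPTION: N is the reported PhyNum)."""
    disks = []
    target = None
    for raw in probe_output.splitlines():
        line = raw.strip()
        m = TARGET_RE.match(line)
        if m:
            # Remember which target the following lines belong to.
            target = m.group(1)
            continue
        m = PHY_RE.search(line)
        if m and target is not None:
            disks.append({
                "target": target,
                "sas_address": m.group(1),
                "unit": "disk@p" + m.group(2),
            })
    return disks


for d in disks_by_phy(SAMPLE):
    print("Target", d["target"], "->", d["unit"])
```

If the assumption holds, only disk@p0 and disk@p1 have a disk behind them on this blade, so disk2 and disk3 are not expected to be bootable.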

It does not boot from any of them.

For disk0, it just hangs as shown below:

{0} ok boot disk0
Boot device: /pci@400/pci@1/pci@0/pci@2/@0/disk@p0  File and args:

For disk1:

 ok boot disk1
Boot device: /pci@400/pci@1/pci@0/pci@2/@0/disk@p1  File and args:
ERROR: boot-read fail


Can't locate boot device

For disk2:

{0} ok boot disk2
Boot device: /pci@400/pci@1/pci@0/pci@2/@0/disk@p2  File and args:
ERROR: boot-read fail


Can't locate boot device

And for disk3:

{0} ok boot disk3
Boot device: /pci@400/pci@1/pci@0/pci@2/@0/disk@p3  File and args:
ERROR: boot-read fail


Can't locate boot device

{0} ok

So none of those disks are bootable.

Did the system boot via the FC link which is now down?

Do you know from which disk it should boot?

It is possible that the proper boot disk is local disk0 which has sustained filesystem damage during the power cut. You may need to boot from DVD and fsck this filesystem. If that doesn't work you may need to recover some files from backup. It is likely that the FC link is down as far as the SC is concerned because Solaris isn't running and the QLogic driver isn't therefore loaded.

Hi

I logged in to the ILOM management port of this blade and found a fault, but I don't know whether this fault is hardware or software:

I have started the fault management shell:

-> start /CMM/faultmgmt/shell
Are you sure you want to start /CMM/faultmgmt/shell (y/n)? y

faultmgmtsp> fmadm faulty -r
/CH/BL0                                                       degraded
/CH/BL3                                                       degraded
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2018-07-11/11:04:55 1a96f137-6015-c606-eaad-f1f971e1caa9 SPX86-8000-1D  Critical

Fault class : fault.chassis.device.fail

FRU         : /CH/BL0
              (Part Number: 541-4197-06)
              (Serial Number: 1005LCB-1126D4037D)

Description : A device necessary to support a configuration is missing.

Response    : The service required LED on the chassis will be illuminated.

Impact      : The chassis may be powered down.

Action      : The administrator should review the ILOM event log for
              additional information pertaining to this diagnosis.  Please
              refer to the Details section of the Knowledge Article for
              additional information.

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2018-07-12/08:17:34 6114a8ae-b70d-6648-be20-867e12d83674 SPX86-8000-1D  Critical

Fault class : fault.chassis.device.fail

FRU         : /CH/BL3
              (Part Number: 541-4197-06)
              (Serial Number: 1005LCB-1126D4037P)

Description : A device necessary to support a configuration is missing.

Response    : The service required LED on the chassis will be illuminated.

Impact      : The chassis may be powered down.

Action      : The administrator should review the ILOM event log for
              additional information pertaining to this diagnosis.  Please
              refer to the Details section of the Knowledge Article for
              additional information.

faultmgmtsp>

/CH/BL3 is the server with this issue.
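
To compare fault records between runs, it can help to reduce the fmadm faulty listing to one entry per fault. Below is a rough Python sketch that does this; the function name parse_faults and the regular expressions are my own assumptions based only on the output pasted above, not on any documented format.

```python
import re

# Two fault records trimmed from the fmadm faulty output above.
SAMPLE = """\
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2018-07-11/11:04:55 1a96f137-6015-c606-eaad-f1f971e1caa9 SPX86-8000-1D  Critical

Fault class : fault.chassis.device.fail

FRU         : /CH/BL0
              (Part Number: 541-4197-06)

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2018-07-12/08:17:34 6114a8ae-b70d-6648-be20-867e12d83674 SPX86-8000-1D  Critical

Fault class : fault.chassis.device.fail

FRU         : /CH/BL3
              (Part Number: 541-4197-06)
"""

# A record line: timestamp, 36-character UUID, message id, severity.
RECORD_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+([0-9a-f-]{36})\s+(\S+)\s+(\S+)$"
)
FRU_RE = re.compile(r"^FRU\s*:\s*(\S+)")


def parse_faults(text):
    """Return a list of dicts, one per fault record in the listing."""
    faults = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = RECORD_RE.match(line)
        if m:
            current = {
                "time": m.group(1),
                "uuid": m.group(2),
                "msgid": m.group(3),
                "severity": m.group(4),
                "fru": None,
            }
            faults.append(current)
            continue
        m = FRU_RE.match(line)
        if m and current is not None:
            # Attach the FRU to the most recent record line.
            current["fru"] = m.group(1)
    return faults


for fault in parse_faults(SAMPLE):
    print(fault["fru"], fault["time"], fault["uuid"])
```

If the UUID or timestamp for the same FRU differs between two listings, that suggests the fault was diagnosed again rather than simply left open.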

I have run a few commands:

-> show faulty
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/CMM/faultmgmt/0    | fru                    | /CH/BL0
/CMM/faultmgmt/0/   | class                  | fault.chassis.device.fail
 faults/0           |                        |
/CMM/faultmgmt/0/   | sunw-msg-id            | SPX86-8000-1D
 faults/0           |                        |
/CMM/faultmgmt/0/   | uuid                   | 1a96f137-6015-c606-eaad-f1f971e1
 faults/0           |                        | caa9
/CMM/faultmgmt/0/   | timestamp              | 2018-07-11/11:04:55
 faults/0           |                        |
/CMM/faultmgmt/0/   | detector               | /CH/BL0/ERR
 faults/0           |                        |
/CMM/faultmgmt/0/   | fru_part_number        | 541-4197-06
 faults/0           |                        |
/CMM/faultmgmt/0/   | fru_serial_number      | 1005LCB-1126D4037D
 faults/0           |                        |
/CMM/faultmgmt/0/   | chassis_serial_number  | 1126BD0E75
 faults/0           |                        |
/CMM/faultmgmt/1    | fru                    | /CH/BL3
/CMM/faultmgmt/1/   | class                  | fault.chassis.device.fail
 faults/0           |                        |
/CMM/faultmgmt/1/   | sunw-msg-id            | SPX86-8000-1D
 faults/0           |                        |
/CMM/faultmgmt/1/   | uuid                   | 87864a2c-e6f7-ecad-a1cb-c75349fa
 faults/0           |                        | 0735
/CMM/faultmgmt/1/   | timestamp              | 2018-07-23/15:44:44
 faults/0           |                        |
/CMM/faultmgmt/1/   | detector               | /CH/BL3/ERR
 faults/0           |                        |
/CMM/faultmgmt/1/   | fru_part_number        | 541-4197-06
 faults/0           |                        |
/CMM/faultmgmt/1/   | fru_serial_number      | 1005LCB-1126D4037P
 faults/0           |                        |
/CMM/faultmgmt/1/   | chassis_serial_number  | 1126BD0E75
 faults/0           |                        |

->

I also ran some repair commands:

faultmgmtsp> fmadm faulty -r
/CH/BL0                                                       degraded
/CH/BL3                                                       degraded
faultmgmtsp> fmadm repaired /CH/BL3
faultmgmtsp> fmadm faulty -r
/CH/BL0                                                       degraded
/CH/BL3                                                       degraded
faultmgmtsp> fmadm repaired /CH/BL3
faultmgmtsp> fmadm faulty -r
/CH/BL0                                                       degraded
/CH/BL3                                                       degraded
faultmgmtsp> fmadm repair /CH/BL3
faultmgmtsp> fmadm faulty -r
/CH/BL0                                                       degraded
faultmgmtsp> fmadm faulty -r
/CH/BL0                                                       degraded
/CH/BL3                                                       degraded
faultmgmtsp> fmadm repair /CH/BL3
faultmgmtsp> fmadm faulty -r
/CH/BL0                                                       degraded
faultmgmtsp>
faultmgmtsp> fmadm faulty -r
/CH/BL0                                                       degraded
/CH/BL3                                                       degraded
faultmgmtsp>

It is still in degraded mode.

From this I concluded that perhaps the SP (service processor) is faulty and needs to be replaced, but I am not sure.

I guess it's too late for this response, but I will leave this comment here in case it is useful to someone.

When Solaris boots from SAN, with any storage brand, additional steps are necessary at the OpenBoot prompt. These steps configure the FC HBA at a low level: they indicate the FC topology and select the boot disk at the HBA level. Once the boot disk ID is set at the HBA level, the FC link comes up.

The steps to configure the HBA at the OpenBoot prompt depend on the HBA brand and model.

Emulex and QLogic publish manuals with this information.

Here is a short example for Emulex.

NOTE: This is just a fragment of the manual; please review the complete procedure.

 
Specify the topology for the target Fibre Channel boot disk:

a.   At the OK> prompt, type the following and press ENTER
" <device path name>" select-dev
(Example: " /pci@1f,4000/lpfc@4" select-dev)
b.   At the OK> prompt, type the appropriate command and press ENTER:
FC-AL: set-fc-al
FC-SW: set-ptp
c.   At the OK> prompt, type unselect-dev and press ENTER

Follow these steps to set the boot device ID:

a.   At the OK> prompt, type the following and press ENTER
" /pci@1f,4000/lpfc@4" select-dev
b.   At the OK> prompt, type the appropriate command and press ENTER:
FC-AL: did  e4  0  2  set-boot-id
FC-SW: wwpn 5006.0482.bbff.4e0f 0 2  set-boot-id
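
The FC-SW example above writes the WWPN in dotted groups of four hex digits, while storage tools usually show it colon-separated. A small Python sketch for converting between the two follows; the helper name dotted_wwpn is my own, and the dotted form is taken only from the manual fragment above, so check the complete manual for your HBA before using it.

```python
def dotted_wwpn(wwpn):
    """Convert a WWPN such as 50:06:04:82:bb:ff:4e:0f to the dotted
    form used in the set-boot-id example (5006.0482.bbff.4e0f)."""
    # Strip the common separators and normalize to lowercase.
    digits = wwpn.replace(":", "").replace(".", "").replace("-", "").lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError("a WWPN must contain exactly 16 hex digits: %r" % wwpn)
    # Regroup as four blocks of four hex digits.
    return ".".join(digits[i:i + 4] for i in range(0, 16, 4))


print(dotted_wwpn("50:06:04:82:BB:FF:4E:0F"))
```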