Hardware faulty, but which hardware?

Hi folks,

I am getting this hardware fault message, but I don't know which hardware it refers to. Can you guide me?

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 08 10:18:49 60dfafdd29-5456-4c21-eeda-8afafd6f61b8  PCIEX-8000-G2  Major

Host        : hsbc78
Platform    : SUNW,Netra-T2000  Chassis_id  :

Fault class : fault.io.pciex.device-interr max 15%
              fault.io.pciex.device-noresp max 15%
              fault.io.pciex.bus-noresp max 8%
              fault.io.pciex.device-invreq 8%
Affects     : dev:////pci@7c0/pci@0/pci@2/network@0
              dev:////pci@7c0/pci@0
              dev:////pci@7c0/pci@0/pci@2
              dev:////pci@7c0
                  faulted but still in service
FRU         : "IOBD" (hc:///component=IOBD)
                  faulty

Description : A problem has been detected on one of the specified devices or on
              one of the specified connecting buses.
              Refer to http://sun.com/msg/PCIEX-8000-G2 for more information.

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated with
              this fault

Action      : Ensure that the latest drivers and patches are installed.
              Otherwise schedule a repair procedure to replace the affected
              device(s).  Use fmadm faulty to identify the devices or contact
              Sun for support.

Many thanks!

Hi,
Have you tried following the link provided in the message?
http://sun.com/msg/PCIEX-8000-G2

see ya
fra

I did, but I still don't know which card this is. If it needs to be replaced, which type or name of card should I give Oracle?

Try to clear the message with "fmadm repair" or "fmadm repaired" (depending on your Solaris version), then watch whether the message reappears. Also check the output of "fmstat" for growing event counts on the specific FMRI.
The path given (pci@7c0/pci@0/pci@2/network@0) points to the onboard network interface: "net2" on the back panel, or e1000g2 in Solaris. So if the hardware really is faulty, you will need a new motherboard!
Network problems (on the switch side or in the cable) can be the root cause of this message, so clearing it and checking whether it reappears would be my first shot.
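
The four "Affects" entries in the fault report are nested: each one is the parent of the next, so the longest one names the end device and the others are the bridges above it. Here is a minimal Python sketch of that reading. The helper names are made up for illustration; the paths are copied from the report above.

```python
# Device FMRIs copied from the "Affects" section of the fault report.
affected = [
    "dev:////pci@7c0/pci@0/pci@2/network@0",
    "dev:////pci@7c0/pci@0",
    "dev:////pci@7c0/pci@0/pci@2",
    "dev:////pci@7c0",
]

def strip_scheme(fmri):
    """Drop the leading 'dev:///' so only the device path remains."""
    prefix = "dev:///"
    return fmri[len(prefix):] if fmri.startswith(prefix) else fmri

def leaf_devices(fmris):
    """Return the paths that are not a parent of any other listed path."""
    paths = [strip_scheme(f) for f in fmris]
    return [
        p for p in paths
        if not any(q != p and q.startswith(p + "/") for q in paths)
    ]

print(leaf_devices(affected))
```

With the paths from this report, the only leaf is /pci@7c0/pci@0/pci@2/network@0, i.e. the network device; the other three entries are the bus segments leading to it.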

Thanks, DukeNuke2.
One question: how do you know it points to the net2 or e1000g2 interface?

By the way, I ran fmadm repaired and all I got was this:

fmadm: failed to record repair to 60dfafdd29-5456-4c21-eeda-8afafd6f61b8: specified resource is not known to be faulty

http://solaris-training.com/classp/SA345_HTML/path.htm
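
On Solaris the mapping from a physical device path to a driver instance is recorded in /etc/path_to_inst, one entry per line in the form "physical path" instance "driver". The sketch below shows how such entries could be turned into interface names. The two sample entries are illustrative assumptions chosen to match the e1000g2 answer above; they are not copied from the poster's machine, and parse_path_to_inst is a made-up helper name.

```python
import shlex

# Illustrative sample in /etc/path_to_inst format (assumed, not from the real host).
SAMPLE = '''\
#
#       Caution! This file contains critical kernel state
#
"/pci@7c0/pci@0/pci@2/network@0" 2 "e1000g"
"/pci@7c0/pci@0/pci@2/network@0,1" 3 "e1000g"
'''

def parse_path_to_inst(text):
    """Map each physical path to '<driver><instance>', e.g. 'e1000g2'."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        path, instance, driver = shlex.split(line)
        mapping[path] = f"{driver}{int(instance)}"
    return mapping

names = parse_path_to_inst(SAMPLE)
print(names.get("/pci@7c0/pci@0/pci@2/network@0"))
```

On a real system you would read the text from /etc/path_to_inst instead of SAMPLE and look up the path from the "Affects" line.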

What is the output of "fmstat"?

The output looks something like this:

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1/pciexbus=2/pciexdev=0/pciexfn=0/pciexbus=3/pciexdev=2/pciexfn=0
           Affects: dev:////pci@7c0/pci@0/pci@2
               FRU: hc:///component=IOBD
          Location: -

    10%  fault.io.pciex.device-noresp

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1
           Affects: dev:////pci@7c0
               FRU: hc:///component=IOBD
          Location: -

    10%  fault.io.pciex.device-invreq

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1
           Affects: dev:////pci@7c0
               FRU: hc:///component=IOBD
          Location: -

   20%  fault.io.pciex.device-interr

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1/pciexbus=2/pciexdev=0/pciexfn=0/pciexbus=3/pciexdev=2/pciexfn=0/pciexbus=7/pciexdev=0/pciexfn=0
           Affects: dev:////pci@7c0/pci@0/pci@2/network@0
               FRU: hc:///component=IOBD
          Location: -

   20%  fault.io.pciex.device-interr

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1/pciexbus=2/pciexdev=0/pciexfn=0/pciexbus=3/pciexdev=2/pciexfn=0
           Affects: dev:////pci@7c0/pci@0/pci@2
               FRU: hc:///component=IOBD
          Location: -

   10%  fault.io.pciex.device-interr

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1/pciexbus=2/pciexdev=0/pciexfn=0
           Affects: dev:////pci@7c0/pci@0
               FRU: hc:///component=IOBD
          Location: -

    10%  fault.io.pciex.device-interr

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1
           Affects: dev:////pci@7c0
               FRU: hc:///component=IOBD
          Location: -

    8%  fault.io.pciex.bus-noresp

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1/pciexbus=2/pciexdev=0/pciexfn=0/pciexbus=3/pciexdev=2/pciexfn=0/pciexbus=7/pciexdev=0/pciexfn=0
           Affects: dev:////pci@7c0/pci@0/pci@2/network@0
               FRU: hc:///component=IOBD
          Location: -

    10%  fault.io.pciex.bus-noresp

        Problem in: hc://:product-id=SUNW,Netra-T2000:server-id=miclawap2/ioboard=0/hostbridge=0/pciexrc=1/pciexbus=2/pciexdev=0/pciexfn=0
           Affects: dev:////pci@7c0/pci@0
               FRU: hc:///component=IOBD
          Location: -

Are you sure this is the fmstat output? It should look more like this:

module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis         0       0  0.0    0.0   0   0     0     0   3.0K      0
cpumem-retire            0       0  0.0    0.0   0   0     0     0      0      0
eft                      1       1  0.0 1191.8   0   0     1     1   3.3M    11K
fmd-self-diagnosis       0       0  0.0    0.0   0   0     0     0      0      0
io-retire                1       0  0.0   32.4   0   0     0     0    37b      0
syslog-msgs              1       0  0.0    0.5   0   0     0     0    32b      0

Oops, sorry, I ran the wrong command. Here is the output:

# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis         1       0  0.0    6.4   0   0     2     0   4.0K   744b
cpumem-retire            0       0  0.0    0.1   0   0     0     0    12b      0
disk-transport           0       0  0.0   12.1   0   0     0     0    40b      0
eft                     17      17  0.0   58.6   0   0     1     0   975K   282b
etm                      4       0  0.0    5.8   0   0     0     0   1.2K   144b
fabric-xlate             2       0  0.0    0.6   0   0     0     0      0      0
fmd-self-diagnosis     530       0  0.0    0.0   0   0     0     0      0      0
io-retire                4       0  0.0    0.1   0   0     0     0      0      0
snmp-trapgen             0       0  0.0    0.1   0   0     0     0    32b      0
sp-monitor               0       0  0.0    1.6   0   0     0     0    24b      0
sysevent-transport       0       0  0.0    5.5   0   0     0     0      0      0
syslog-msgs              0       0  0.0    0.0   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    1.6   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0
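
To read a table like this quickly, the "open" column (cases currently open per module) and "ev_recv" (events received) are the interesting ones. Below is a small Python sketch that picks out the modules with open cases. The table text is copied from the output above; parse_fmstat is a made-up helper name, and the parsing simply assumes whitespace-separated columns.

```python
# fmstat output copied from the post above.
FMSTAT = """\
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis         1       0  0.0    6.4   0   0     2     0   4.0K   744b
cpumem-retire            0       0  0.0    0.1   0   0     0     0    12b      0
disk-transport           0       0  0.0   12.1   0   0     0     0    40b      0
eft                     17      17  0.0   58.6   0   0     1     0   975K   282b
etm                      4       0  0.0    5.8   0   0     0     0   1.2K   144b
fabric-xlate             2       0  0.0    0.6   0   0     0     0      0      0
fmd-self-diagnosis     530       0  0.0    0.0   0   0     0     0      0      0
io-retire                4       0  0.0    0.1   0   0     0     0      0      0
snmp-trapgen             0       0  0.0    0.1   0   0     0     0    32b      0
sp-monitor               0       0  0.0    1.6   0   0     0     0    24b      0
sysevent-transport       0       0  0.0    5.5   0   0     0     0      0      0
syslog-msgs              0       0  0.0    0.0   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    1.6   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0
"""

def parse_fmstat(text):
    """Turn the table into a list of dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]

rows = parse_fmstat(FMSTAT)

# Modules that currently own at least one open case.
open_cases = {r["module"]: int(r["open"]) for r in rows if int(r["open"]) > 0}
print(open_cases)
```

Running the same check again later and comparing the ev_recv values would show whether events are still arriving after the fault has been cleared.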