hardware test fails

Hi,
I have a Sun Fire 280R, and when I boot it a hardware check runs and fails. Here is the full output of the test:

rsc> poweron
Are you sure you want to turn your system power on (Yes/No)?  yes
rsc> console

RSC Alert: Host System has Reset

@(#)OBP 4.5.10 2002/02/11 10:39 Sun Fire 280R
BBC AID Register 0000.0000.0000.0000
Power-On Reset
Executing Power On SelfTest

0>@(#) Excalibur POST 4.5.9   2002/02/05 21:25
       /export/work/staff/firmware_re/post/post-build-4.5.9_020205/excal/integrated  (firmware_re)
0>Jump from OBP->POST.
0>CPUs present in system: 0 1
0>diag-switch? configuration variable set TRUE.
0>Diag level set to MIN.
0>MFG scrpt mode set to NONE
0>I/O port set to RSC.
0>Done with First Init, reset system.
0>Current CPU frequency is 400 MHz.
0>      Resetting to 900
0>Clock Synth Reset.
0>
0>Start selftest...
0>Init CPU
0>      Cheetah_plus Version 2.2
0>DMMU Registers Access
0>DMMU TLB DATA RAM Access
0>DMMU TLB TAGS Access
0>IMMU Registers Access
0>IMMU TLB DATA RAM Access
0>IMMU TLB TAGS Access
0>Probe Ecache
0>      Size = 00000000.00800000...
0>Ecache Data Bitwalk
0>Ecache Address Bitwalk
0>Scrub and Setup Ecache
0>Setup and Enable DMMU
0>Setup DMMU Miss Handler
0>Test and Init Temp Mailbox
1>Init CPU
1>      Cheetah_plus Version 2.2
1>DMMU Registers Access
1>DMMU TLB DATA RAM Access
1>DMMU TLB TAGS Access
1>IMMU Registers Access
1>IMMU TLB DATA RAM Access
1>IMMU TLB TAGS Access
1>Probe Ecache
1>      Size = 00000000.00800000...
1>Ecache Data Bitwalk
1>Ecache Address Bitwalk
1>Scrub and Setup Ecache
1>Setup and Enable DMMU
1>Setup DMMU Miss Handler
1>Test and Init Temp Mailbox
0>Initializing Scan Database
0>BCC:          1483203b
0>SCSI:         15060045
0>ICHIP:        0d1e203b
0>RIO:          13e5d03b
0>SCHIZO:       1824c06d
0>CPMS0:        1142903b
0>CPMS1:        1142903b
0>CPMS2:        1142903b
0>CPMS3:        1142903b
0>CPMS4:        1142903b
0>CPMS5:        1142903b
0>FCAL:         1000a12f
0>Init I2C
0>Unquiesce Safari
0>Blast Fans
0>Set Trip Temp CPU 0 to 110C
0>Set Trip Temp CPU 1 to 110C
0>FRI SEP  11 9:00:05 GMT 9
0>Safari check
0>Probe and Setup Memory
0>INFO: 1024MB Bank 0
0>INFO: No memory detected in Bank 1
0>INFO: 1024MB Bank 2
0>INFO: No memory detected in Bank 3
0>
0>ERROR: TEST = Probe and Setup Memory
0>H/W under test = CPU0 Memory
0>MSG =


        ERROR:1 AFSR Error 00000002.00000040, AFAR 00000000.00000010.


0>END_ERROR

0>      CE bit: Correctable system data ECC error
0>ERROR: TEST = Probe and Setup Memory
0>H/W under test = CPU0 Bank 0 Dimm 0, J0100 side 1
0>MSG = DIMM failure Bank 0 DIMM 0 Pin 27
0>END_ERROR

0>WARNING: TEST = Probe and Setup Memory
0>H/W under test = CPU0 Bank 0 Dimm 0, J0100 side 1
0>MSG =                 AFSR error after running test Probe and Setup Memory.
0>END_WARNING

0>Data Bitwalk on Master
0>      Test Bank 0.
0>      Test Bank 2.
0>ERROR: TEST = Check Mem Banks
0>H/W under test = Menu Utility Device
0>MSG = Offline Bank 0.
0>END_ERROR

0>Address Bitwalk on Master
0>INFO: Addr walk mem test on CPU 0 Bank 2: 00000002.00000000 to 00000002.40000000.
0>Set Mailbox
0>Setup Final DMMU Entries
0>Post Image Region Scrub
0>Run POST from Memory
0>Verifying checksum on copied image.
0>The Memory's CHECKSUM value is 6ed4.
0>The Memory's Content Size value is 9d9c9.
0>Success...  Checksum on Memory Validated.
1>Probe and Setup Memory
1>INFO: No memory on cpu 1
1>Set Mailbox
1>Data Bitwalk on mem
1>      Test Bank 2.
1>Setup Final DMMU Entries
1>Map Slave POST to master memory
1>Print Mem Config
1>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
0>Print Mem Config
0>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
0>Memory in non-interleave config:
0>      Bank 2 1024MB : 00000002.00000000 -> 00000002.40000000.
0>Scrub Memory
0>Quick Block Mem Test
0>Quick Test 16777216 bytes at 00000002.00600000
0>40% Done...
1>Flush Caches
0>Flush Caches
0>Schizo unit 0 init      test
0>Schizo unit 0 reg       test
0>Schizo unit 0 mem       test
0>Schizo unit 0 PCI DMA A test
0>Schizo unit 0 PCI DMA B test
0>Schizo unit 0 PCI merg  test
0>Schizo unit 0 PCI iommu test
0>Schizo unit 0 PCI stc   test
0>Schizo unit 0 interrupt test
1>Schizo unit 0 init      test
1>Schizo unit 0 PCI merg  test
1>Schizo unit 0 interrupt test
0>
0>Turn Schizo 0 errors on
0>Turn error traps on
0>ERROR:
0>      POST toplevel status has the following failures:
0>              CPU0 Memory Bank 0
0>      POST failed the following devices on CPU 0:
0>              Mem Bank0 DIMM0
0>END_ERROR

0>POST: Return to OBP.
1>Return to OBP, FAIL
0>Return to OBP, FAIL

RSC Alert: Host System has Reset

@(#)OBP 4.5.10 2002/02/11 10:39 Sun Fire 280R
BBC AID Register 0000.0000.0000.0000
POST Results: Cpu 0
  %o0  ffff.ffff.ffff.ffff
  %o1  ffff.ffff.ffff.ffff
  %o2  ffff.ffff.ffff.ffff
POST Results: Cpu 1
  %o0  ffff.ffff.ffff.ffff
  %o1  ffff.ffff.ffff.ffff
  %o2  ffff.ffff.ffff.ffff
CPU seeprom format: 0000.0000.0000.0002
Membase: 0000.0000.0000.0000
MemSize: 0000.0000.0010.0000
Clearing TLBs Done
Init CPU arrays Done
Init E$ tags Done
Setup TLB Done
MMUs ON
Copy Done
PC = 0000.07ff.f008.5970
PC = 0000.0000.0000.59e8
Decompressing Done
Size = 0000.0000.0006.f8b0
ttya initialized
Start Reason: Initialize Machine
Configuring the machine:


@(#)OBP 4.5.10 2002/02/11 10:39 Sun Fire 280R
BBC AID Register 0000.0000.0000.0000
Loading Configuration
Membase: 0000.0000.0000.0000
MemSize: 0000.0000.8000.0000
Clearing TLBs Done
Init CPU arrays Done
Init E$ tags Done
Setup TLB Done
MMUs ON
Block Scrubbing Done
Copy Done
PC = 0000.07ff.f008.5970
PC = 0000.0000.0000.59e8
Decompressing Done
Size = 0000.0000.0006.f8b0
ttya initialized
Start Reason: First start after Power On
System Reset: (SPOR) (PLL)
Probing gptwo at 0,0 SUNW,UltraSPARC-III+ (900 MHz @ 6:1, 8 MB)
   memory-controller
Probing gptwo at 1,0 SUNW,UltraSPARC-III+ (900 MHz @ 6:1, 8 MB)
   memory-controller
Probing gptwo at 8,0 pci pci
Loading Support Packages: kbd-translator
Loading onboard drivers: ebus flashprom bbc power i2c dimm-fru dimm-fru
   dimm-fru dimm-fru nvram idprom i2c cpu-fru temperature cpu-fru
   temperature fan-control motherboard-fru ioexp ioexp ioexp
   fcal-backplane remote-system-console power-distribution-board
   power-supply power-supply rscrtc beep rtc gpio pmc parallel
   rsc-control rsc-console serial
CPU 0 set ambient power off temperature to 70 degrees C
CPU 0 set junction power off temperature to 110 degrees C
CPU 1 set ambient power off temperature to 70 degrees C
CPU 1 set junction power off temperature to 110 degrees C
Memory Configuration:
Segment @ Base:        0  Size:  2048 MB ( 2-Way)
Probing /pci@8,600000 Device 4  SUNW,qlc fp disk
Probing /pci@8,600000 Device 1  ethernet
Probing /pci@8,700000 Device 5  network usb
Probing /pci@8,700000 Device 6  scsi disk tape scsi disk tape
Probing /pci@8,700000 Device 1  Nothing there
Probing /pci@8,700000 Device 2  Nothing there
Probing /pci@8,700000 Device 3  pci
Probing /pci@8,700000/pci@3 Device 0  pci108e,1000 SUNW,qfe
Probing /pci@8,700000/pci@3 Device 1  pci108e,1000 SUNW,qfe
Probing /pci@8,700000/pci@3 Device 2  pci108e,1000 SUNW,qfe
Probing /pci@8,700000/pci@3 Device 3  pci108e,1000 SUNW,qfe
Probing /pci@8,700000/pci@3 Device 4  Nothing there
Probing /pci@8,700000/pci@3 Device 5  Nothing there
Probing /pci@8,700000/pci@3 Device 6  Nothing there
Probing /pci@8,700000/pci@3 Device 7  Nothing there
Probing /pci@8,700000/pci@3 Device 8  Nothing there
Probing /pci@8,700000/pci@3 Device 9  Nothing there
Probing /pci@8,700000/pci@3 Device a  Nothing there
Probing /pci@8,700000/pci@3 Device b  Nothing there
Probing /pci@8,700000/pci@3 Device c  Nothing there
Probing /pci@8,700000/pci@3 Device d  Nothing there
Probing /pci@8,700000/pci@3 Device e  Nothing there
Probing /pci@8,700000/pci@3 Device f  Nothing there
Probing /pci@8,700000 Device 4  Nothing there

Start Reason: First start after Power On
System Reset: (SPOR) (PLL)
Probing gptwo at 0,0 SUNW,UltraSPARC-III+ (900 MHz @ 6:1, 8 MB)
   memory-controller
Probing gptwo at 1,0 SUNW,UltraSPARC-III+ (900 MHz @ 6:1, 8 MB)
   memory-controller
Probing gptwo at 8,0 pci pci
Loading Support Packages: kbd-translator
Loading onboard drivers: ebus flashprom bbc power i2c dimm-fru dimm-fru
   dimm-fru dimm-fru nvram idprom i2c cpu-fru temperature cpu-fru
   temperature fan-control motherboard-fru ioexp ioexp ioexp
   fcal-backplane remote-system-console power-distribution-board
   power-supply power-supply rscrtc beep rtc gpio pmc parallel
   rsc-control rsc-console serial
CPU 0 set ambient power off temperature to 70 degrees C
CPU 0 set junction power off temperature to 110 degrees C
CPU 1 set ambient power off temperature to 70 degrees C
CPU 1 set junction power off temperature to 110 degrees C
Memory Configuration:
Segment @ Base:        0  Size:  2048 MB ( 2-Way)
Probing /pci@8,600000 Device 4  SUNW,qlc fp disk
Probing /pci@8,600000 Device 1  ethernet
Probing /pci@8,700000 Device 5  network usb
Probing /pci@8,700000 Device 6  scsi disk tape scsi disk tape
Probing /pci@8,700000 Device 1  Nothing there
Probing /pci@8,700000 Device 2  Nothing there
Probing /pci@8,700000 Device 3  pci
Probing /pci@8,700000/pci@3 Device 0  pci108e,1000 SUNW,qfe
Probing /pci@8,700000/pci@3 Device 1  pci108e,1000 SUNW,qfe
Probing /pci@8,700000/pci@3 Device 2  pci108e,1000 SUNW,qfe
Probing /pci@8,700000/pci@3 Device 3  pci108e,1000 SUNW,qfe
Probing /pci@8,700000/pci@3 Device 4  Nothing there
Probing /pci@8,700000/pci@3 Device 5  Nothing there
Probing /pci@8,700000/pci@3 Device 6  Nothing there
Probing /pci@8,700000/pci@3 Device 7  Nothing there
Probing /pci@8,700000/pci@3 Device 8  Nothing there
Probing /pci@8,700000/pci@3 Device 9  Nothing there
Probing /pci@8,700000/pci@3 Device a  Nothing there
Probing /pci@8,700000/pci@3 Device b  Nothing there
Probing /pci@8,700000/pci@3 Device c  Nothing there
Probing /pci@8,700000/pci@3 Device d  Nothing there
Probing /pci@8,700000/pci@3 Device e  Nothing there
Probing /pci@8,700000/pci@3 Device f  Nothing there
Probing /pci@8,700000 Device 4  Nothing there


Sun Fire 280R (2 X UltraSPARC-III+) , No Keyboard
Copyright 1998-2002 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.5, 2048 MB memory installed, Serial #51563391.
Ethernet address 0:3:ba:12:cb:7f, Host ID: 8312cb7f.



Power On Selftest Failed.
   CPU: 0 cause: OBMD
   CPU: 1 cause: 
Aborting auto-boot sequence.
{0} ok

Any ideas what could cause this problem? When I reset/reboot it several times, it eventually seems to boot.

Thanks,
Tex

Looks like a memory problem. POST reports CPU0 Bank 0 DIMM 0 (J0100) as failed, so replace that DIMM.
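
If it helps with reading long POST dumps like the one above, here is a rough Python sketch that pulls out the ERROR blocks and the hardware they name. The function name failed_parts and the parsing rules are my own assumptions based only on the layout of the output posted in this thread; other POST versions may format things differently.

```python
import re

def failed_parts(post_log):
    """Return a list of (test, hardware, message) for each ERROR block."""
    results = []
    test, hw, msg_lines = "", "", []
    in_block = False
    for raw in post_log.splitlines():
        # Drop the "0>" / "1>" CPU prefix that POST prints before each line.
        line = re.sub(r"^\d+>", "", raw).strip()
        if line.startswith("ERROR: TEST ="):
            in_block = True
            test = line.split("=", 1)[1].strip()
            hw, msg_lines = "", []
        elif in_block and line.startswith("H/W under test ="):
            hw = line.split("=", 1)[1].strip()
        elif in_block and line.startswith("MSG ="):
            first = line.split("=", 1)[1].strip()
            if first:
                msg_lines.append(first)
        elif in_block and line == "END_ERROR":
            results.append((test, hw, " ".join(msg_lines)))
            in_block = False
        elif in_block and line:
            # Continuation of a multi-line MSG (for example the AFSR line).
            msg_lines.append(line)
    return results

# Excerpt copied from the console output in the original post.
SAMPLE = """\
0>ERROR: TEST = Probe and Setup Memory
0>H/W under test = CPU0 Bank 0 Dimm 0, J0100 side 1
0>MSG = DIMM failure Bank 0 DIMM 0 Pin 27
0>END_ERROR
"""

for test, hw, msg in failed_parts(SAMPLE):
    print(hw, "|", msg)
```

Run against the full log, this should list the same failures that POST summarises at the end under "POST toplevel status".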

Hi,

Do you have a service contract?
If not, and if you are able to, you can swap memory modules as a test.
Pull the DIMM out of J0100 on CPU0 and put it in another slot, then put the DIMM from that other slot into J0100.
For example, J0100 should be the slot nearest to the CPUs; swap it with J0101. J0101 is the first DIMM in bank 1 for CPU1. There should be small lettering printed next to the memory slots.

If you power on and the failure now shows up on J0101, you only need a new memory module. If the failure stays on J0100, I guess you need a new mainboard.
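
The swap test boils down to a small decision rule, sketched here in Python. The function name interpret_swap and the "inconclusive" outcome are my own additions for illustration; the other two outcomes just restate the reasoning above.

```python
def interpret_swap(failing_slot_after, original_slot="J0100", swapped_slot="J0101"):
    """Interpret which slot POST blames after the two DIMMs were swapped."""
    if failing_slot_after == swapped_slot:
        # The fault moved together with the module.
        return "replace the DIMM"
    if failing_slot_after == original_slot:
        # The fault stayed with the slot even though the module changed.
        return "suspect the slot or mainboard"
    # No failure, or a failure somewhere else: the test does not decide it.
    return "inconclusive"

print(interpret_swap("J0101"))
print(interpret_swap("J0100"))
```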

If you have a contract with Sun or a supplier, then please hand it over to them and let them check the system.