HP-UX boot failure

When I log in to the server's live console, I get the following message:

Processor is booting from the first available device.

To discontinue, press any key within 10 seconds.

10 seconds expired.
Proceeding...

Trying Primary Boot Path
------------------------
Booting...
Boot IO Dependent Code (IODC) revision 0


HARD Booted.

ISL Revision A.00.44  Mar 12, 2003

ISL booting  hpux



Firmware Version  44.24

Duplex Console IO Dependent Code (IODC) revision 1

Firmware Version  44.24

Duplex Console IO Dependent Code (IODC) revision 1
------------------------------------------------------------------------------
   (c) Copyright 1995-2004, Hewlett-Packard Company, All rights reserved
------------------------------------------------------------------------------

  Processor   Speed            State           CoProcessor State  Cache Size
  Number                                       State              Inst    Data
  ---------  --------   ---------------------  -----------------  ------------
      0     1000  MHz   Active                 Functional         33554432 33554432
      1     1000  MHz   Idle                   Functional         33554432 33554432
      2     1000  MHz   Idle                   Functional         33554432 33554432
      3     1000  MHz   Idle                   Functional         33554432 33554432

  Central Bus Speed (in MHz)  :        200
  Available Memory            :    8388268  KB

 Good Memory Required        : Not initialized. Defaults to 32 MB.

   Primary boot path:    0/4/1/1.3
   Alternate boot path:  0/0/2/0.3
   Console path:         0/7/1/1.0
   Keyboard path:        0/0/4/0.0
 WARNING: The BMC System Event Log (SEL) is full.
          Use MP SL command to clear space so SEL events can be recorded.


 ERROR:   There was not enough error free memory to run the
          CPU late selftests.  Refer to the Page Deallocation
          Table in the service menu to review memory errors.

This message appears over and over again in the console logs.

The System Event Log also shows this entry frequently:

#  Location|Alert| Encoded Field    |  Data Field    |   Keyword / Timestamp
-------------------------------------------------------------------------------
0     SFW  0  *5  0xA080132000E00010 0000000000000000 BOOT_NOT_ENOUGH_ERROR_FREE_MEMORY
                                                      28 Nov 2013 13:20:21

Any idea what is happening here?

Thanks in advance.

What kind of box is this? What OS version?

The model details:

 Model: hp server . (model string 9000/800/rp3440)

It's HP-UX 11.31.

So if you haven't changed anything before the last reboot (compiled a new kernel?), then the message says it all:

 WARNING: The BMC System Event Log (SEL) is full.
          Use MP SL command to clear space so SEL events can be recorded.

I never had to configure an rp3440, so I don't remember whether it uses a GSP or an MP. The closest I have is this one:

ant:/home/vbe $ model
9000/800/L2000-54

which has a GSP...
Both (GSP or MP) should work the same way, but you may need to do some searching.
On the console I used to type Ctrl-B to get to the GSP, which would display something like "Service Processor Login:". Since I never configured them, I'd just press <Enter> and get the GSP prompt (GSP>); type h for help.
SL is for Show Log. I don't remember how you clear the log (it's been years since I was last in there); maybe just viewing the entries is enough. So look carefully at the menu and the help.
Then type CO to return to the OS console. It's my support dispatch week, so sorry, I can't spend more time on this; I'll try to follow up if I have a moment...
Good Luck!

I have already cleared the logs, but I still get the same error. After some time the log fills up again and I am back where I started.

What is in the logs?
You seem to have a memory issue...

The console log only contains the same message I posted earlier.

The System Event Log has a lot of entries. The latest ones are below:

Log Entry 75: 29 Nov 2013 12:16:12
Alert Level 2: Informational
Keyword: MC_BR_TO_OS_HPMC_FAILED
MC_BR_TO_OS_HPMC_FAILED
Logged by: System Firmware  2
Data: Implementation dependent data field
0x5680106402E008D0 FFFFFFF0F0438E70


Log Entry 74: 29 Nov 2013 12:16:12
Alert Level 2: Informational
Keyword: MC_OS_HPMC_MISSING
MC_OS_HPMC_MISSING
Logged by: System Firmware  2
Data: Implementation dependent data field
0x5680104A02E008B0 000000F0F0D09800

Log Entry 73: 29 Nov 2013 12:16:12
Alert Level 2: Informational
Keyword: MEM_PDT_DUP_ENTRY
PDT entry to be added to PDT already exists
Logged by: System Firmware  2
Data: Event detail
0x4E8000D502E00890 0000000000D60000


Log Entry 72: 29 Nov 2013 12:16:12
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK
Uncorrectable (multiple-bit) ECC error in DIMM
Logged by: System Firmware  2
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x3B, Extender 0
0x448000CC02E00870 FFFFFFFF003BFF74

Log Entry 71: 29 Nov 2013 12:16:11
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK
Uncorrectable (multiple-bit) ECC error in DIMM
Logged by: System Firmware  2
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x3A, Extender 0
0x448000CC02E00850 FFFFFFFF003AFF74


Log Entry 70: 29 Nov 2013 12:16:11
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK
Uncorrectable (multiple-bit) ECC error in DIMM
Logged by: System Firmware  2
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x2B, Extender 0
0x448000CC02E00830 FFFFFFFF002BFF74

Log Entry 61: 29 Nov 2013 12:16:10
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
An HPMC has been encountered.
Logged by: System Firmware  0
Data: Code address
0xE880035C00E00710 0000000000024344


Log Entry 60: 29 Nov 2013 12:16:10
Alert Level 7: Fatal
Keyword: MC_HPMC_MONARCH_SELECTED
MC_HPMC_MONARCH_SELECTED
Logged by: System Firmware  2
Data: Implementation dependent data field
0xF680105E02E006F0 FFFFFFF0F0C00000
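When the SEL is this noisy, it can help to tally the pasted SL output by alert level and keyword so the Critical/Fatal memory events stand out from the Informational ones. A minimal sketch, assuming the "Log Entry / Alert Level / Keyword" layout shown above; `tally_entries` and `most_severe_first` are hypothetical helpers, not HP tools, and the sample text is a trimmed copy of entries 72, 71 and 61.

```python
import re
from collections import Counter

# Hypothetical helper, not an HP tool. The pattern assumes the
# "Log Entry / Alert Level / Keyword" layout of the MP SL output above.
ENTRY_RE = re.compile(
    r"Log Entry (\d+):[^\n]*\n"          # entry number and timestamp line
    r"Alert Level (\d+): (\w+)[^\n]*\n"  # numeric level and severity word
    r"Keyword: (\S+)"                    # event keyword
)


def tally_entries(sl_text: str) -> Counter:
    """Count how often each (level, severity, keyword) triple appears."""
    counts: Counter = Counter()
    for m in ENTRY_RE.finditer(sl_text):
        counts[(int(m.group(2)), m.group(3), m.group(4))] += 1
    return counts


def most_severe_first(counts: Counter) -> list:
    """Sort tallied events by descending alert level, then by keyword."""
    return sorted(counts.items(), key=lambda kv: (-kv[0][0], kv[0][2]))


# Trimmed copy of entries 72, 71 and 61 from the log above.
SAMPLE = """\
Log Entry 72: 29 Nov 2013 12:16:12
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK

Log Entry 71: 29 Nov 2013 12:16:11
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK

Log Entry 61: 29 Nov 2013 12:16:10
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
"""

if __name__ == "__main__":
    for (level, severity, keyword), n in most_severe_first(tally_entries(SAMPLE)):
        print(f"{level} {severity:<13} {keyword:<28} x{n}")
```

Pasting the full SL dump into `SAMPLE` would show whether the Fatal HPMC entries are always accompanied by the Critical `MEM_MBE_IN_RANK` entries, which is what points at memory rather than at the full log itself.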

You have 2 bad DIMMs; if they are 4 GB each, that means you have no memory left...
In the MP, have you tried XD (diagnostics and reset)?
Sorry, off again...

I have run all the tests in XD, and all were successful except the modem self-test:

Diagnostics Menu:
Non destructive tests:
     P - Parameter checksum
     I - I2C access (get BMC Device ID record)
     L - LAN access (PING)
     M - Modem selftests
Destructive tests:
     R - Restart MP

Enter menu item or [Q] to Quit: M
M

   Confirm? (Y/[N]): Y
Y

   Please wait .................


   -> Test result: FAIL

<CR> to continue...

Is this causing the problem? I thought it was a memory issue (after a lot of googling and your input).

I think we are using eight 1 GB RAM modules:

 MEMORY STATUS TABLE (MB) (Current Boot Status)

Slot 0a  1024M   Active
Slot 0b  1024M   Active

Slot 1a  1024M   Active
Slot 1b  1024M   Active

Slot 2a  1024M   Active
Slot 2b  1024M   Active

Slot 3a  1024M   Active
Slot 3b  1024M   Active

Slot 4a  0
Slot 4b  0

Slot 5a  0
Slot 5b  0

Subtotal 8192M

   TOTAL =  8192 MB
           ---------

I think 4 of them are damaged:


Log Entry 352: 29 Nov 2013 12:54:13
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK
Uncorrectable (multiple-bit) ECC error in DIMM
Logged by: System Firmware  0
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x3B, Extender 0
0x448000CC00E02A30 FFFFFFFF003BFF74


Log Entry 351: 29 Nov 2013 12:54:13
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK
Uncorrectable (multiple-bit) ECC error in DIMM
Logged by: System Firmware  0
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x3A, Extender 0
0x448000CC00E02A10 FFFFFFFF003AFF74


Log Entry 350: 29 Nov 2013 12:54:13
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK
Uncorrectable (multiple-bit) ECC error in DIMM
Logged by: System Firmware  0
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x2B, Extender 0
0x448000CC00E029F0 FFFFFFFF002BFF74


Log Entry 349: 29 Nov 2013 12:54:13
Alert Level 5: Critical
Keyword: MEM_MBE_IN_RANK
Uncorrectable (multiple-bit) ECC error in DIMM
Logged by: System Firmware  0
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x2A, Extender 0
0x448000CC00E029D0 FFFFFFFF002AFF74


Log Entry 346: 29 Nov 2013 12:54:12
Alert Level 7: Fatal
Keyword: MC_HPMC_MONARCH_SELECTED
MC_HPMC_MONARCH_SELECTED
Logged by: System Firmware  0
Data: Implementation dependent data field
0xF680105E00E02970 FFFFFFF0F0C00000


Log Entry 340: 29 Nov 2013 12:54:11
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
An HPMC has been encountered.
Logged by: System Firmware  3
Data: Code address
0xE880035C03E028B0 000000F0F0D08068


Log Entry 339: 29 Nov 2013 12:54:11
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
An HPMC has been encountered.
Logged by: System Firmware  2
Data: Code address
0xE880035C02E02890 000000F0F0D08068


Log Entry 338: 29 Nov 2013 12:54:11
Alert Level 7: Fatal
Keyword: ERR_CHECK_HPMC
An HPMC has been encountered.
Logged by: System Firmware  0
Data: Code address
0xE880035C00E02870 0000000000024344


Log Entry 336: 29 Nov 2013 12:53:42
Alert Level 5: Critical
Keyword: BOOT_NOT_ENOUGH_ERROR_FREE_MEMORY
There was not enough error free memory in the system to run the late selftests
Logged by: System Firmware  0
Data: Data field unused
0xA080132000E02830 0000000000000000
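To check the "4 of them are damaged" count against the memory status table, the slot codes in the `MEM_MBE_IN_RANK` entries can be matched to the table rows. A minimal sketch under a loudly stated assumption: that "DIMM Slot 0x3B" in the SEL refers to "Slot 3b" in the MP memory table (first hex digit = slot pair, letter = a/b side). That mapping is inferred from the way the two listings line up, not confirmed by HP documentation. `failing_slots`, `active_dimms` and `usable_mb` are hypothetical helpers, and the sketch does not model the DIMM loading-order rules for this box.

```python
import re

# Assumption (not confirmed by HP documentation): "DIMM Slot 0x3B" in the SEL
# refers to "Slot 3b" in the MP memory status table, i.e. the first hex digit
# is the slot pair and the letter is the a/b side.
SEL_SLOT_RE = re.compile(r"DIMM Slot 0x(\d)([AB])\b", re.IGNORECASE)
TABLE_RE = re.compile(r"Slot (\d[ab])\s+(\d+)M\s+Active")


def failing_slots(sel_text: str) -> set:
    """Slot labels (e.g. '3b') named in uncorrectable-ECC SEL entries."""
    return {f"{pair}{side.lower()}" for pair, side in SEL_SLOT_RE.findall(sel_text)}


def active_dimms(table_text: str) -> dict:
    """Map slot label -> size in MB for every Active row of the status table."""
    return {slot: int(mb) for slot, mb in TABLE_RE.findall(table_text)}


def usable_mb(table_text: str, sel_text: str) -> int:
    """Installed memory minus the DIMMs the SEL reports as failing."""
    bad = failing_slots(sel_text)
    return sum(mb for slot, mb in active_dimms(table_text).items() if slot not in bad)


# Location lines from entries 349-352 above.
SEL_TEXT = """\
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x3B, Extender 0
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x3A, Extender 0
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x2B, Extender 0
Data: Location - Memory (SIMM or DIMM): DIMM Slot 0x2A, Extender 0
"""

# Rows from the MEMORY STATUS TABLE above (empty 5a/5b rows omitted).
TABLE_TEXT = """\
Slot 0a  1024M   Active
Slot 0b  1024M   Active
Slot 1a  1024M   Active
Slot 1b  1024M   Active
Slot 2a  1024M   Active
Slot 2b  1024M   Active
Slot 3a  1024M   Active
Slot 3b  1024M   Active
Slot 4a  0
Slot 4b  0
"""

if __name__ == "__main__":
    print(sorted(failing_slots(SEL_TEXT)))
    print(usable_mb(TABLE_TEXT, SEL_TEXT), "MB of",
          sum(active_dimms(TABLE_TEXT).values()), "MB")
```

With the sample data this reports slots 2a, 2b, 3a and 3b, and 4096 MB of 8192 MB left. That is only bookkeeping: whether the firmware can use the remaining DIMMs still depends on them being in the slots it expects.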

So do I have any option other than replacing the RAM?

Will skipping the self-tests help?

Well, since I don't know what memory layout the MP-based boxes use: one thing I do know (because I had to do it once) is that there is an order in which memory has to be installed. My idea is that the box can work with half the memory if you know which slots must be filled: remove the bad DIMMs and populate what is left as if you only had half. It should then recognize the memory correctly (cross your fingers...).
Your symptoms look as if the firmware deallocated the bad memory but what is left isn't installed in the right slots...

Latest firmware for the rp3440, which you can download here:

http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?spf_p.tpst=swdMain&spf_p.prp_swdMain=wsrp-navigationalState%3Didx%253D2%257CswItem%253Dpf_92188_3%257CswEnvOID%253D%257CitemLocale%253D%257CswLang%253D%257Cmode%253D4%257Caction%253DdriverDocument&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken

SUPERSEDES HISTORY:

Enhancements
PDC 46.34

Added BOOT support for the following PCI I/O cards:
    AD331A PCI-X, 1-port, GigE 1000Base-T Adapter
    AD332A PCI-X, 1-port, GigE 1000Base-SX Adapter 
Added BOOT, SWAP, and DUMP support for the following PCI I/O cards:
    AB378B PCI-X, 1-port, 4GB Fibre Channel Host Bus Adapter
    AB379B PCI-X, 2-port, 4GB Fibre Channel Host Bus Adapter 

PDC 45.11

Enabled support for PA-8900 processors.
Added support for a total of eight 4GB DIMMs, increasing the maximum system memory size from 24GB to 32GB.
Added 30 second delay after a "ser pdt clear" command at the BCH Main Menu to allow manual power down of the system and replacement of a bad DIMM without requiring rotation of all remaining DIMMs.
Added support for future 4GB Fiber Channel Host Bus Adapters.

So I would go for a firmware update; look at these fixes:
PDC 45.44

Fixed an issue where the system may HPMC during every other boot just after memory self-test with a MEM_UNEXPECTED_HPMC event when FastBoot was enabled.
In previous versions after a successful reboot following a system fault the System LED may not automatically change from flashing red to flashing yellow. 

PDC 45.11

In previous revisions with a four port lan card installed, a "ser scsi default" command at the BCH Main Menu may incorrectly display the following error: "ERROR: failed IODC write for path: 0x1000400".
In previous revisions a "deconfigured:stopped" processor may incorrectly be displayed as being in an "unknown" state when using the BCH Main Menu "in pr" command to view processor status.
In previous revisions when performing a "sea ipl" from the BCH Main Menu, any device that does not have a bootable lif may display a "bad lif magic" message while logging a "BOOT_BAD_LIF_MAGIC_OTHER" event.
Any updates to the system clock at the BCH Main Menu or the Operating System will now always be reflected in the iLO Management Processor without requiring a system reset.
Autoboot will no longer be halted if the System Event Log is full.
In previous versions multiple Single Bit Errors with MEM_CORR_ERR and MEM_MULTIPLE_ERRORS_DETECTED events may cause the system to hang during power on self test.

PDC 44.24

Resolved an issue where the system would fail to dump following a TOC.
ErrorHandler will now send chassis codes to indicate the type of error encountered.
In the case of a DMT entry not being found the system will halt and send out a DMT_ENTRY_NOT_FOUND chassis code.
Added two chassis codes to send out the entire part number of the memory extender.
Resolved an issue which caused HPUX to incorrectly report installed physical memory.
Low Priority Machine Checks are turned off until HPUX boot is complete to avoid improperly registering a Low Priority Machine Check.
Corrected a parity error chassis code from returning incorrect data and triggering attention LED on every BCH boot during a memory ECC test.
Resolved an issue which caused random memory rank deallocations due to a missing resistor on the memory extender.
Corrected an issue that resulted in a stopped processor after running an MP memory test.
Corrected an issue where AUTOSTART flag was incorrectly read from stable store and when set, prevented autoboot and autosearch.
Resolved an issue when booting add-on PCI LAN cards that allowed the first LAN server to respond to boot the machine.
In previous versions, a PDC_ALLOC request to allocate space would return SUCCESS even when there was insufficient storage to do so.
Resolved an issue that resulted in a HPMC when attempting to boot on single core systems.
Corrected an issue that resulted in no PIM for an L1 cache error.
Resolved an issue that caused a memory hang when multiple memory errors are detected.
Corrected an issue that cleared the PDT on hard reset resulting in changes to memory configuration to be lost between DC power cycles.
Added logging of inbound correctable and uncorrectable errors.
Initialized a variable that when uninitialized, caused a CC_MEM_EXTENDER_SPD_ERROR event indicating the memory extender SPD couldn't be read.
Corrected an issue that filled the SEL log with PDCE_CALL_TAKE_TOO_LONG events which would require clearing the log before using autoboot.
Resolved an issue where an ACC card (Z7340A) placed in a PCI slot could not be mapped due to insufficient memory failure.
Corrected an issue where the Tachlite FibreChannel IODC driver (A6795A) failed to come online with the B-Series and M-Series switches at port F set at 2Gbps fixed speed, resulting in an FibreChannel boot failure at the BCH prompt: "IODC ENTRY_INIT failed. Error Status: -4".