Sun T2000 server occasionally reboots

Hi, I am really new to Solaris, and to UNIX in general. I work as a web developer, but I have to be involved in Solaris support to some extent. It is hard for me to understand, and I recently ran into a problem:

/var file system (/dev/md/rdsk/d425) is being checked.
run fsck -F ufs /dev/md/rdsk/d425

Jun 7 03:16:54 svc:startd[7]: svc:/system/filesystem/minimal:default: Method "/lib/svc/method/fs-minimal" failed with exit status 95
Jun 7 03:16:55 svc:startd[7]: system/filesystem/minimal:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting system maintenance mode
(See /lib/svc/share/README for more information)
Console login service(s) cannot run

Every time, we have to run the fsck command. This has happened several times already. Does anyone know what's wrong?

We had this issue on two T2000 servers; they were rebooting occasionally (at least once a week), and no logs were left behind. Sun did not find any resolution, so they hooked up a permanent display console to capture any logs... nothing was found, and the server got bounced again, silently, leaving no logs. :) Like a suspense story.

Finally, the issue went away after the system board was replaced.

I experienced the same issue. Sun came out with a patch (sometime in April) to fix the problem shown below. It went well for a couple of weeks, but then the problem resurfaced.

  APR 22 23:02:04: 0004007c: "System poweron is disabled."
  APR 22 23:02:04: 00040083: "Chassis cover removed."
  APR 22 23:02:04: 0004000e: "SC Request to Power Off Host Immediately."
  APR 22 23:02:15: 00040029: "Host system has shut down."

No solid solution from Sun for my case yet :(

Looking at the logs you posted above: "Chassis cover removed." It seems the cover may be loose or something similar. If the chassis cover is opened, the system automatically shuts itself down.
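The sequence in that log excerpt (cover removed, then an immediate SC power-off request, then the host shutting down about 11 seconds later) is easy to check for across a longer log. Below is a minimal sketch in Python, assuming the SC event-log lines have been saved to text and look exactly like the excerpt above (`MON DD HH:MM:SS: <8 hex digits>: "<message>"`). The function names `parse_events` and `cover_shutdowns` and the 60-second window are hypothetical choices for illustration, not a Sun tool; events are matched on message text rather than on the hex codes.

```python
import re
from datetime import datetime

# Hypothetical format, taken from the excerpt above:
#   APR 22 23:02:04: 00040083: "Chassis cover removed."
LINE_RE = re.compile(
    r'^\s*(\w{3} \d{1,2} \d{2}:\d{2}:\d{2}): ([0-9a-fA-F]{8}): "(.*)"\s*$'
)


def parse_events(lines):
    """Return (timestamp, code, message) tuples for lines matching the format."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # ignore blank lines and anything in another format
        # The log has no year, so prepend a dummy leap year (2000) so that
        # "FEB 29" still parses. Only time differences are used below, so
        # logs that cross a year boundary are NOT handled by this sketch.
        ts = datetime.strptime("2000 " + m.group(1).title(), "%Y %b %d %H:%M:%S")
        events.append((ts, m.group(2).lower(), m.group(3)))
    return events


def cover_shutdowns(events, window_s=60):
    """Yield (cover_time, shutdown_time) pairs where "Chassis cover removed"
    is followed by "Host system has shut down" within window_s seconds."""
    pending = None  # time of the most recent unmatched cover-removed event
    for ts, _code, msg in events:
        if msg.startswith("Chassis cover removed"):
            pending = ts
        elif msg.startswith("Host system has shut down") and pending is not None:
            if (ts - pending).total_seconds() <= window_s:
                yield (pending, ts)
            pending = None


# Usage with the excerpt posted above.
SAMPLE = '''
  APR 22 23:02:04: 0004007c: "System poweron is disabled."
  APR 22 23:02:04: 00040083: "Chassis cover removed."
  APR 22 23:02:04: 0004000e: "SC Request to Power Off Host Immediately."
  APR 22 23:02:15: 00040029: "Host system has shut down."
'''.splitlines()

for cover, down in cover_shutdowns(parse_events(SAMPLE)):
    print(f"cover removed {cover:%b %d %H:%M:%S}, host down {down:%b %d %H:%M:%S}")
```

If the pattern shows up before most of the unexplained shutdowns, that points toward the cover or intrusion-switch path rather than the OS.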

I have tried re-attaching the chassis cover, but it is still behaving strangely. I would appreciate it if you could shed some light on this.

That's the patch that I installed to fix the bug.

I am sure you updated the firmware correctly, but can you post the 'showhost -version' output here?

Here you go. I retrieved it from the Explorer output.

sc> showhost
Sun-Fire-T2000 System Firmware 6.7.3  2009/04/01 11:21

Host flash versions:
   OBP 4.30.0 2008/12/11 12:15
   Hypervisor 1.7.0 2008/12/11 13:43
   POST 4.30.0 2008/12/11 12:41 

Thanks. It is unfortunate that the latest firmware did not fix the bug. The problem is with the hardware component itself (the LOM). Because the LOM is integrated on the system board, Sun will have to replace the system board itself. Of course, they will drag this out as long as they can, at the customer's expense, anyway.

We can try a few more things regarding how the LOM firmware reacts to these alerts. Can you make sure you have "auto-boot-on-error?" set to "false"? If not, you can follow the steps below:

------------------
ok setenv auto-boot-on-error? false
ok reset-all

Page 39 ==> Disabling Automatic System Recovery
http://dlc.sun.com/pdf/819-2549-12/819-2549-12.pdf
-----------------------------

We are just trying to change how OBP behaves on certain failures. In your case, though, it may not work, because "Chassis cover removed" is serious enough to make the system shut itself down immediately... but it's worth a try.

Ahh... just had a thought. If you have the time and luxury, please make sure nothing is stuck in or around the chassis cover interlock switch (intrusion switch). If this switch does not make proper contact after the top cover is closed, that might also lead to these "Chassis cover removed" scenarios. Good luck.

Hi,
auto-boot-on-error? had already been set to false.

The customer has already started questioning Sun's reliability, so hopefully the system board replacement will resolve this. As you said, hopefully they won't drag this out too long, or else... :)

I will keep looking for possible solutions. Thank you for your help; I appreciate it very much!