Solaris 10 server crashed two times

Hi,

I have two Solaris 10 servers. The first server crashed last week (Monday) and the second one crashed over the weekend. I have checked the logs, such as /var/adm/messages, syslog and dmesg. So far I have found nothing. My management wants to know why the servers crashed, and I need to come up with some kind of reason.

I also searched for a core file and didn't find any. Can someone guide me on what else I can do to figure out why the servers crashed?

What kind of hardware is it? Does it have ILOM? If it does, then you can check ILOM logs in /SP/logs/event/list (IIRC).

Both systems reboot OK? Did you look in the older /var/adm/messages log files and not just the current messages file? Is crash dump enabled? If not, you should enable it if possible.
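
For the crash dump question, here is a minimal sketch of how the current configuration could be checked. It assumes a stock Solaris 10 install with dumpadm in /usr/sbin; check_crash_dump is a hypothetical helper name, not a system command, and the savecore path in the comments is only an example.

```shell
# Sketch: report the current crash dump configuration on Solaris 10.
# check_crash_dump is a hypothetical helper name (an assumption).
check_crash_dump() {
    if [ ! -x /usr/sbin/dumpadm ]; then
        echo "dumpadm not found; is this a Solaris host?"
        return 1
    fi
    # With no arguments, dumpadm prints the dump device, the savecore
    # directory, and whether savecore is enabled on reboot.
    /usr/sbin/dumpadm
}

# To enable saving of crash dumps on reboot (run as root):
#   dumpadm -y
# To change the savecore directory (example path, adjust as needed):
#   dumpadm -s /var/crash/`hostname`
```

Run check_crash_dump as root. If savecore turns out to be disabled, that would explain why no dump file was found after the crash.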

I have a Sun Fire E6900, which has four domains, but I only have access to one. The other three are used by different groups; I think they have currently taken them offline.
The second one is a Sun Fire E2900.

On the E6900, I went to the console and got the hostname-sc:D prompt. I typed "help" and saw this:

 
history          -- show command history
password         -- set the domain password
poweroff         -- powers off components 
poweron          -- powers on components
reset            -- reset the domain
resume           -- return to domain console
setdate          -- set the date and time for the domain
setdefaults      -- set default configuration values
setkeyswitch     -- set the keyswitch position
setls            -- set FRU location status
setupdomain      -- configure the domain
showboards       -- show board information
showcodusage     -- show COD resource usage
showcomponent    -- show state of a component
showdate         -- show the current date and time for the domain
showdomain       -- show domain configuration and status
showenvironment  -- show environmental information
showkeyswitch    -- show the keyswitch position
showlogs         -- show the logs
showresetstate   -- show CPU registers after reset
testboard        -- test a CPU/Memory board
 

I then typed "showlogs":

Jan 17 10:28:47 dev-sc Domain-D.SC: [ID 384869 local0.error] Domain watchdog timer expired.
Jan 17 10:28:47 dev-sc Domain-D.SC: [ID 180029 local0.notice] Using default hang-policy (RESET).
Jan 17 10:28:47 dev-sc Domain-D.SC: [ID 838382 local0.error] Saving reset state data before XIR.
Jan 17 10:28:50 dev-sc Domain-D.SC: [ID 580408 local0.notice] Resetting (XIR) domain.
Jan 17 10:28:50 dev-sc Domain-D.SC: [ID 815168 local0.error] Saving reset state data after XIR.

Can you advise what else I can look at?

---------- Post updated at 06:34 PM ---------- Previous update was at 06:29 PM ----------

Yes, the system is online now. Both servers have Sybase running; it is a very important DB for the company. I did check the /var/adm/messages file. It has a lot of data, but I didn't find anything useful as to why the system crashed.

What does this show:

grep panic /var/adm/messages*
grep kern /var/adm/messages*
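
If those two greps return too much or too little, a slightly broader search might help. This is only a sketch: scan_messages is a made-up function name and the keyword list is an assumption about what is worth looking for (the "SunOS Release" banner is included because it is logged at boot, which helps to pin down when the reboot happened). It uses egrep because the stock /usr/bin/grep on Solaris 10 does not handle alternation.

```shell
# Sketch: search current and rotated messages files for crash-related lines.
# scan_messages and the keyword list are illustrative assumptions.
scan_messages() {
    logdir=${1:-/var/adm}
    # messages, messages.0, messages.1, ... are the rotated copies.
    for f in "$logdir"/messages*; do
        [ -f "$f" ] || continue
        # -i ignores case; sed prefixes each hit with its file name.
        egrep -i 'panic|watchdog|fatal|savecore|SunOS Release' "$f" | sed "s|^|$f: |"
    done
}
```

Calling scan_messages with no argument searches /var/adm; pass another directory to search copies of the logs elsewhere.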