Server hung

So my server was hung when I came in this morning. It was responding to pings, but the console and telnet sessions would not respond. There was no disk activity. The display said FA1F, and I found out that the "A" indicates a high CPU load. I tried several things to get it going but was forced to reboot the server.

The first time I tried to boot the server, it said "Terminating selection process. No boot device found".

*** Let me just clarify that I am an AIX guy; I just inherited this server and do not know much about HP-UX. ***

Having said that, I didn't know what to do, so I booted the server again.

This time it came up. I started troubleshooting, and it hung again while I was running the swapinfo command. I'm not sure whether swapinfo caused it or it was just a coincidence.

The display went to FA1F again, so I waited about 15 minutes and was forced to reboot it. It seems to be working now.

Any idea what is going on here? Help with diags or troubleshooting steps would be great.

Thanks.

# uname -r
B.10.20

Type "dmesg" to see the last few kernel messages. But that "Terminating selection process. No boot device found" says it all. Your root disk is failing. Make sure you have a good backup.

10.20 is a very old OS and it is no longer supported.

F**F = OS is running
FA*F = load is 10 or above
F*1F = only one CPU
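To make the three rules above concrete, here is a minimal sketch that decodes a 4-character front-panel code such as FA1F. It encodes only the three rules stated in this reply (not HP documentation), so any other digit values are simply ignored; decode_display is a hypothetical helper name.

```python
from typing import List


def decode_display(code: str) -> List[str]:
    """Decode a 4-character front-panel code using only the rules in this thread.

    F**F -> OS is running
    FA*F -> load is 10 or above
    F*1F -> only one CPU
    """
    code = code.strip().upper()
    if len(code) != 4:
        raise ValueError("expected a 4-character code such as 'FA1F'")
    notes: List[str] = []
    # These rules only apply when both outer digits are F (OS running).
    if code[0] != "F" or code[3] != "F":
        return notes
    notes.append("OS is running")
    if code[1] == "A":
        notes.append("load is 10 or above")
    if code[2] == "1":
        notes.append("only one CPU")
    return notes


if __name__ == "__main__":
    print(decode_display("FA1F"))
```

For FA1F this prints ['OS is running', 'load is 10 or above', 'only one CPU'], which matches the reading given above.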

Boxes with displays like that are pretty old. A 9000/E240 or something like that is my guess. That is not supported anymore either.

Strange that it would fail once and then work fine after a reset. I am aware that this server is old and I am working to replace it. I do have a good backup.

In AIX I can run online diagnostics with "diag -a". Is there a similar command in HP-UX?

Thanks.

That depends on whether the diagnostics are installed. As root, try:
cstm
and see if you get to cstm prompt. Then try:
map
to see the devices. Pick a device and look at its device number in the first column. Let's say you pick device 29:
sel dev 29
info
infolog
unselal

info gathers some information about the device, infolog displays it, and unselal unselects it. There is more you can do if that much works.
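If you want to check several devices in turn, a small helper can print the same four cstm commands for any device number taken from the map output, so you can type them at the cstm prompt without losing the order. This is only a sketch: cstm_device_steps is a hypothetical name, and it does not run cstm itself.

```python
from typing import List


def cstm_device_steps(device: int) -> List[str]:
    """Return the cstm command sequence from this thread for one device number."""
    if device < 0:
        raise ValueError("device number must be non-negative")
    return [
        f"sel dev {device}",  # select the device by its number from 'map'
        "info",               # gather information about the selected device
        "infolog",            # display what 'info' gathered
        "unselal",            # unselect the device again
    ]


if __name__ == "__main__":
    for line in cstm_device_steps(29):
        print(line)
```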

Thanks for your help.

Here is some more cstm stuff, tested on an old 10.20 box of our own...

From the cstm prompt, run logtool:
runu logtool

You should be at a logtool prompt.
sl
This switches to a new log. As a side effect, you learn the name of the old log from a message like "log2.raw.cur will be renamed to log2.raw". Enter that old name (log2.raw in this example) when prompted after you run:
sr
This gives you a summary of the errors in the log file you selected. It then shows a misleading prompt: all this command can do is redisplay the summary. Get out of this command and format the log file:
fr
You pick the directory and it picks the file name (and remembers that name for the next command). You get another summary and another chance to reprint it; get out. Now display the formatted log with fl:
fl
Finally, you see the errors.
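The only thing you need to carry from sl over to sr is the old log name. Here is a hedged sketch that pulls that name out of the sl message, assuming the message looks like the example quoted above. The thread itself says "or something like that", so the exact wording is an assumption, and old_log_name is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed message shape, based only on the example quoted in this thread.
_RENAME = re.compile(r"(\S+)\s+will be renamed to\s+(\S+)")


def old_log_name(message: str) -> Optional[str]:
    """Return the name the old log will have, i.e. what to enter after 'sr'."""
    match = _RENAME.search(message)
    if match is None:
        return None  # message did not match the assumed wording
    # Drop any trailing period or quote characters around the file name.
    return match.group(2).strip('."')


if __name__ == "__main__":
    print(old_log_name("log2.raw.cur will be renamed to log2.raw"))
```

With the example message, this prints log2.raw, the name to enter when sr asks for a log file.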

Turns out the server has a bad processor. Maybe this information will help someone one day.

*** UPDATE ***

It also had a bad root disk.