Unix unexpected repeated restarts

I have an HP DS15 server running Tru64 UNIX.

The problem: the server keeps restarting unexpectedly. I don't know the reason for the restarts and I can't analyze the problem.

What can I do, and where can I look?

Is this related to your post "Suspending desktop login Unix tru64"?

You analyze these problems by reviewing the log files.

As @neo has already said, the first place to look is the system log(s). Usually if the system knows that it is about to crash it will report it in there.
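If the logs are long, a small script can help narrow down where to read. The following is only a minimal sketch in Python, assuming a Python interpreter is available on the machine (or that the logs have been copied to one that has it). The keyword list and the function name `find_suspect_lines` are my own illustrative choices, not anything Tru64 defines, and the log paths are whatever you pass on the command line, for example `/var/adm/messages` if that is where your system writes them.

```python
import re
import sys

# Hypothetical keyword list: adjust it to whatever your system actually logs.
SUSPECT = re.compile(r"panic|machine check|halt|reboot|shutdown", re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines that match SUSPECT."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Each command-line argument is treated as the path to a text log file.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, text in find_suspect_lines(handle):
                print(f"{path}:{number}: {text}")
```

The entries just before each match are usually the interesting part, since they show what the system was doing right before it went down.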

There are, however, some issues that do not allow the system to log the incident but these are somewhat limited.
They are:

  1. A main memory (RAM) fault in kernel space that crashes the system. You should run a diagnostic on the RAM over as many iterations as possible to detect the fault. Of course, this means having the system down for a time. You could swap memory DIMMs between slots on the motherboard (adhering to any pairing rules, etc) to see if the problem changes; e.g. the system starts to report the problem in the log.

  2. If the system is crashing it will likely corrupt (however slightly) the filesystem(s). Therefore you should run a filesystem check on all your filesystems, especially the root one. Uncorrected filesystem corruption could bring the system down repeatedly.

  3. Is the power supply stable? Glitches in the power supply could cause a power failure and reboot without the system having the chance to report anything in the log.

  4. Some other more obscure hardware fault which is more difficult to solve (unless you can swap components with a similar hardware platform). Take out and reinsert the main processor(s).

Separately, I would also be looking at the general configuration such as checking that swap space is correctly configured and available.

I have replaced the power supply with a new one.

I have also checked the swap space.

Both are OK.

Which logs can I check, and in which directories?

Also, for completeness, try a different mains wall socket.

No, it is a separate problem.

https://community.hpe.com/t5/Operating-System-Tru64-Unix/Where-Can-I-find-True64-5-x-OS-messages-log-files/td-p/3055151

I already did it

This might be of help or interest:

https://community.hpe.com/t5/Operating-System-Tru64-Unix/Tru64-5-1b-system-reboots/m-p/3594895

I edited my post above to include this:

"Take out and reinsert the main processor(s)."

This would be a good idea to try if nothing appears in the logs to explain the problem. It could be a poor processor contact in the socket.

Indeed.

Also do a general visual inspection of all internal and external connections to ensure good contact is being made, vacuum the system to minimise dust contamination, and make sure nothing 'rattles' when the system is given a shake :smiley:

Another thing to check is that the cooling fans are running, otherwise there is the possibility of an overheat causing the system to crash and restart periodically.

And just in case we missed it, make sure it's connected to a UPS :slight_smile:

Slight power glitches can easily ruin anyone's perfect setup when not on a UPS.

Just to follow up...

I live in a seaside condo in a semi-urban area in Thailand and all my electronic gear (computers, monitors, routers, switches, fiber optic head units) is on UPS. In fact, I currently have four UPS units in my condo.

If I did not use these UPS devices, my computers and other gear would restart often, because the power "glitches" momentarily. These glitches often last less than a second, but that is long enough for every device to restart.

Of course, this is not good for any device, so I have UPS devices "everywhere".

I noticed the IP address of the OP was listed as in "Egypt" (using GeoIP). Egypt, like many countries, is known to have similar glitches in the power grid.

Of course, it is possible that the "restarts" mentioned by the OP in this topic have some other cause, but I advise everyone to ensure their electronic devices are connected to a quality UPS to prevent power surges and glitch-based reboots.

Odds are quite high that if your device experiences periodic reboots, it is not connected to the power grid via a UPS and is "naked" on the grid. These power glitches often last only milliseconds, so they go unnoticed, but even such a short glitch is enough to reboot a computer that is not connected to a UPS.

The region of the world you live in, and the season, play a part in these "glitches"; so be aware of the stability (or lack of stability) of your power grid and of seasonal loading factors.

Or, just do as I do and make sure all your computer-related gear is connected to a quality UPS.
