Intermittent System Reboots

Hi all,

Just started holidays (ya!) and Murphy's Law has kicked in already (doh!).

I'm looking after (when at work) two SCO 5.0.5 systems running on Netfinity 5500 servers (Model # 8662-3RY). Every once in a while the production server just reboots itself. There is no mention of a cause in the system/debug/message/error logs, and I'm now looking for other sources of information that might point to a cause.

Does anyone have any ideas or suggestions as to where I might look, or has anyone seen a similar incident in their own travels?

I once heard the story about the cleaner who pulled out the power cord because he needed the outlet for his vacuum cleaner.

I've heard that story too. Another is that the cleaner plugged his equipment into a power strip and overloaded it, causing a row of servers to go down.

As far as the real problem here - not much you can do. If there is nothing in ANY of the error logs, you could always call in hardware support to run tests (and yes, I know this is production). Probably your best bet is to take a full backup of the server, buy/steal/borrow the exact same type of equipment, build a duplicate and see if the problem shows up on it. If it does, the cause is software, or something running at the time of the crash. If it doesn't, the cause is hardware on the original - but good luck figuring it out until it either sends a message or fails completely (memory is usually the culprit in those situations).

Guys,

Found this in /var/adm/syslog (I must have been looking on planet Mars the first time). The following is an extract of the syslog, generated by a script that checks the log every evening at 23:55.

Oct 24 12:37:37 www3 lmail[27608]: Cannot open /usr/spool/mail/nobody: Operation would block
Oct 24 14:46:50 www3 lpd[381]: unknown printer: lf=/var/spool/lp/logs/output_log
Oct 24 14:46:57 www3 ifor_sld: PMDCT: Error accepting server side connection. (PM_THREAD_IPC_TIMEOUT) 
Oct 24 14:46:57 www3 ifor_sld: PMDCT: Error accepting server side connection. (PM_THREAD_IPC_TIMEOUT) 
Oct 24 14:46:57 www3 ifor_pmd: cleanup; terminating
Oct 24 14:46:57 www3 ifor_pmd: cleanup; terminating
Oct 24 14:46:57 www3 sco_cpd: cpd: pmd died
Oct 24 14:46:57 www3 Xsco[406]: Xsco: ERROR- Failed to initialize policy manager. (IFOR_PM_FATAL)
Oct 24 14:46:57 www3 Xsco[406]: Xsco: ERROR- Failed to initialize policy manager. (IFOR_PM_FATAL)
Oct 24 14:46:57 www3 ifor_pmd: terminated with status 100
Oct 24 14:46:57 www3 ifor_pmd: terminated with status 100
Oct 24 14:47:00 www3 ifor_pmd: ^M
         The Licensing Policy Manager Daemon (ifor_pmd) has terminated^M
         and been restarted.  This is a normal occurrence only when a^M
         license is removed with the License Manager utility.  If this is^M
         not the case, your system may have a problem which could lead to^M
         undesirable behavior.  Contact your SCO service provider for^M
         help if you suspect that there is a problem.^M 
Oct 24 16:34:25 www3 ftpd[15956]: #2 open of pid file failed: No such file or directory
Oct 24 16:40:01 www3 ftpd[18453]: #2 open of pid file failed: No such file or directory
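
For anyone wanting to sift an extract like this, here is a minimal sketch in Python that counts messages per program. It is not the script that produced the extract above. The regular expression only assumes the line layout visible in the extract, and the function name `summarize` and the `sample` list are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the extract: "Oct 24 14:46:57 www3 prog[pid]: message"
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<prog>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)$"
)

def summarize(lines):
    """Count messages per program; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip())
        if match:
            counts[match.group("prog")] += 1
    return counts

# A few lines copied from the extract, plus one continuation line.
sample = [
    "Oct 24 12:37:37 www3 lmail[27608]: Cannot open /usr/spool/mail/nobody: Operation would block",
    "Oct 24 14:46:57 www3 ifor_pmd: cleanup; terminating",
    "Oct 24 14:46:57 www3 ifor_pmd: cleanup; terminating",
    "Oct 24 14:46:57 www3 sco_cpd: cpd: pmd died",
    "         The Licensing Policy Manager Daemon (ifor_pmd) has terminated^M",
]

for prog, count in summarize(sample).most_common():
    print(prog, count)
```

Continuation lines such as the indented ifor_pmd notice carry no timestamp, so the sketch simply ignores them rather than guessing which message they belong to.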

What's this Licensing Policy Manager Daemon (ifor_pmd)?

This should not cause a reboot.

Do you have more information about the reboots?
For example, do they always happen on the same day or at the same time, or always when a specific job is running?
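
One rough way to check for such a pattern is to tally the reboot times by weekday and hour. The Python sketch below assumes the times have been noted by hand in the same "Oct 24 14:46:57" style as syslog. The timestamps in `reboots`, the year 2000, and the name `reboot_pattern` are all placeholders for illustration, not data from this system.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def reboot_pattern(stamps, year):
    """Count reboots per (weekday, hour).

    Syslog-style stamps carry no year, so the caller has to supply one.
    """
    counts = Counter()
    for stamp in stamps:
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        counts[(DAYS[when.weekday()], when.hour)] += 1
    return counts

# Hypothetical reboot times; replace with the ones actually observed.
reboots = [
    "Oct 03 02:15:10",
    "Oct 10 02:14:55",
    "Oct 17 02:16:02",
    "Oct 24 14:46:57",
]

for (day, hour), count in reboot_pattern(reboots, 2000).most_common():
    print(f"{day} {hour:02d}:00  {count} reboot(s)")
```

If most of the reboots land in the same weekday and hour slot, a scheduled job (a backup or a cron entry, say) becomes a more likely suspect than random hardware trouble.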

When this happens, it mostly has to do with power or rare hardware problems, because otherwise the system logs the reason for the PANIC reboot.

One of your errors is covered in the PCUnix.com SCO FAQ (ifor_pmd failed to init).

It is not out of the realm of possibility that you have a hardware problem.

I have systems that will panic and reboot when they have a hardware failure with a CPU or a card or even with memory. I would suggest that you have your hardware looked at.

If you have any crash dumps, I would look at those, assuming your system produces them.

:smiley:

Hi all,

Thanks to all those who have posted to this thread with suggestions.

I have been scanning a multitude of docs, and it could be as simple as a dying CMOS battery!

I've still much to read (sadly) ... so much for my holiday!

Going back to bad stories...

Not too long ago, some contractor was working in one of our data centers...

He plugged his dodgy ol' drill into one of the mains coming directly out of the UPS, since it was closest.
The drill happened to short out a little, and we lost power just long enough to send all of our internet-connected machines (web, ftp, etc...) into a hasty reboot...

D'oh!