Strange problem. Please help!

I'm a network operator. My machine is an IBM PC Server 320 running SCO UNIX 3.2v4.2 and Baan Triton 3.1. Recently the server has been going dead every few hours, suddenly and with no sign or signal of a malfunction. It looks like a sudden power failure, but the main power supply indicator stays on. With an ordinary problem, UNIX would record or display an error message, and with a serious problem it would shut down with an error message. But now no error message is shown before it dies, and everything looks OK when I restart the machine. Where could this kind of problem come from: a virus, an application program, the operating system, or hardware, e.g. CPU, hard disk, memory, mainboard, SCSI card?
Could filesystem errors alone lead to this?
Can you give me some idea what this problem is?

I checked the file /usr/adm/messages and found two earlier entries.
One:
alad:adapter 0 Error: Target Bus Phase Sequence Error.
WARNING:err: Error log overflow 0 lun=0.block=79131818
Another:
Unrecoverable error reading SCSI DISK 0 dev/42(ha=0,id=0,lun=0) Illegal request.
Are these messages of any help?
Can you help me out?

If the UNIX kernel crashed (panicked), you might not get any error messages in the log, because the kernel logging facility might only write to the console. However, you said it 'looks like a power outage when it dies', so I assume you cannot read the console. Is that right? If you can make sure the console monitor is up and working, you will (normally) see something on the monitor if the kernel panics.

I am going to guess that you are having a problem that shuts down the hardware, i.e. the motherboard, because this would kill the video output as well as the disks, etc. Without further information, the best guess I can make is a motherboard problem, because you say the 'entire system dies like a power outage'.

Motherboards can and do break, and bugs in the BIOS can cause strange behavior. The SCSI error you posted could cause the system to lock up, but the power, monitor, and console would still work. So, if the system is just 'locking up' while the video card still works, the next logical place to look would be the SCSI controller (and you have an error message to back this up).

The most common cause of SCSI errors is failing to terminate the ends of the SCSI bus, or having an improper termination in the middle of the bus. So, the first thing to do is to ensure that your SCSI bus is configured and terminated properly. This is not always easy if you are not a SCSI guru.
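While you are checking the bus, it may help to see how often the SCSI errors show up in the log and whether they cluster before each crash. Here is a rough Python sketch for that. Note the assumptions: the patterns are simply lifted from the two entries you quoted, the function names `find_scsi_errors` and `scan_file` are made up for illustration, and Python is unlikely to be available on the SCO box itself, so you would run this on another machine against a copy of /usr/adm/messages.

```python
import re

# Assumed patterns, taken only from the two log entries quoted in the question.
# Other SCSI drivers may word their messages differently.
SCSI_PATTERNS = [
    re.compile(r"Target Bus Phase Sequence Error", re.IGNORECASE),
    re.compile(r"Error log overflow", re.IGNORECASE),
    re.compile(r"Unrecoverable error .* SCSI", re.IGNORECASE),
]


def find_scsi_errors(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in SCSI_PATTERNS):
            hits.append((number, text))
    return hits


def scan_file(path):
    """Scan a copy of the messages file (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with open(path, errors="replace") as log:
        return find_scsi_errors(log)
```

You could call `scan_file("messages.copy")` (the file name is only an example) and print the returned pairs. If the matching lines pile up shortly before each lock-up, that points more strongly at the SCSI side than at the motherboard.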

Hope this helps get you started. If you have any more clues, please post. We love puzzles here at UNIX.COM :)