SCO OpenServer 5 Will Not Boot

We have a legacy system that runs on SCO OpenServer 5.0.6, and I've rebuilt the server a couple of times so we can have access to old historical data. That system suddenly went offline for no known reason. I know that the configuration has not been altered: nobody has access to the root login except me, and there have been no changes. The current problem does not appear to be with the hard drive, unlike past problems I have dealt with. When the system starts up I can see that it "finds" and lists the IDE drive, and it comes to the "Boot:" prompt. When I hit "Enter" at the boot prompt, like normal, it goes through about one screen of the loading process, then the screen goes blank, I get a "no signal" notification on the monitor, and the machine restarts. I have tried entering "unix.old" at the boot prompt, but it still does the same thing. Then I read somewhere online to try "unix.safe" at the boot prompt, and this got me further. It looked like it was going to take me all the way to a login prompt, but then it stopped and this error showed up on the screen:

Open event driver failed.... Fatal server error.... Check mouse configuration

So I thought maybe something happened to the mouse, and I plugged a different one in but it still does the same thing. Like I said I know that there was no change made to any mouse configuration in the O/S, so what can I try now? I would appreciate any and all suggestions about how I can get this system back up and running. Thanks.

You say that you rebuilt the server a couple of times so I guess that you have installation media? Or do you have a root & boot floppy set? Either way, you should be able to boot the system into a (root) shell, mount the hard disk root filesystem and take a look around (and run fsck on that filesystem to check its integrity).

OK, I tried booting into unix.safe again, only this time I entered Single User (maintenance) mode and it allowed me to log in as root. It shows that the / and /stand file systems are mounted. Now I want to be careful: does the "fsck" command just check the file system, or does it attempt to repair? I ask this because whenever I have booted this machine for the past 15 years it has come up with the message that "the root file system needs checking..." but I always say "no", because I seem to recall that when I said "yes" (and this was back when the system was actively used) it created some type of error and I had to restore the whole thing from backup. I do not want to have that happen again, so whatever I try now I would prefer not to lose data. Is it safe to proceed with the "fsck" command on the root file system without anything "dangerous" happening? Also, can you give me the syntax to run the command properly? Thanks!

The way to fsck a filesystem without anything dangerous happening is to use the -n switch:

# fsck -n <filesystem>

with the -n meaning, whatever the question is the answer is no. Therefore, no modifications/corrections to the filesystem but the extent of any damage will be shown.

The opposite is:

# fsck -y <filesystem>

which means all questions are automatically answered yes, i.e. correct everything. And yes, in the case of severe damage, this can destroy the filesystem completely.

If damage is limited then you can run fsck again with neither -n nor -y and answer each question yourself individually to carry out corrections.

I was able to run the fsck on the root file system, and there is apparently some damage. Here are the errors that I'm getting:

At beginning: Cannot Read: BLK 269191880

Then: FOLLOWING DISK SECTORS COULD NOT BE READ: 269191880, 269191881
Cannot Read block bitmap (logical block 134595940)

At Phase 4: FREE INODE COUNT WRONG IN SUPERBLK

At Phase 6: FREE BLK COUNT WRONG IN SUPERBLK

Since I used the -n switch no repairs were attempted. Since the system will not boot up regularly, am I to assume the bad sectors are affecting my kernel or something that is needed to boot up? If I try to run fsck with -y switch, or try to run interactively to fix errors, will I trash the system and need to reinstall and restore data from backup?

I have done the restore before so I know how, but I also know it is risky due to the age of my backup tape and drive. If it is the only way I will have to try, but is there a chance that repairing the disk using fsck will fix everything? Thanks
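The sector and block numbers in the fsck report above can be cross-checked with a little arithmetic. The sketch below is only an illustration and rests on two assumptions that are not stated anywhere in this thread: 512-byte disk sectors and 1024-byte filesystem logical blocks. The function names are made up for the example. It is meant to be run on any modern machine, not on the SCO box, because the byte offset needs 64-bit shell arithmetic.

```shell
#!/bin/sh
# Sketch: relate the numbers quoted in the fsck report.
# ASSUMPTIONS (not confirmed in the thread): 512-byte sectors,
# 1024-byte filesystem logical blocks.
SECTOR_SIZE=512
LOGICAL_BLOCK_SIZE=1024

# Convert a logical block number to the first disk sector it occupies.
block_to_sector() {
    echo $(( $1 * LOGICAL_BLOCK_SIZE / SECTOR_SIZE ))
}

# Convert a sector number to a byte offset from the start of the device.
# Needs a shell with 64-bit arithmetic for sectors this large.
sector_to_bytes() {
    echo $(( $1 * SECTOR_SIZE ))
}

bitmap_block=134595940   # from "Cannot Read block bitmap (logical block ...)"
bad_sector=269191880     # from "Cannot Read: BLK ..."

echo "bitmap block starts at sector: $(block_to_sector "$bitmap_block")"
echo "bad sector byte offset:        $(sector_to_bytes "$bad_sector")"
```

Under those assumptions the bitmap block starts at sector 269191880, which is the first of the two sectors fsck could not read, and the byte offset comes out as 137826242560. If the assumptions hold, the "Cannot Read" lines and the bitmap error all describe one damaged location rather than several.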

Well, good news and bad news.

Good news....the damage is very very minor and you should be able to fix that with an interactive fsck run. There should only be a few questions (to answer 'y' to each) and it is very unlikely anything will go wrong with that, but if it does and you are getting many more questions than expected, you can break out of it and stop the process. You can't do that if you run it '-y' because everything will happen too fast.

Bad news....the damage is so minor that it could well NOT be the reason for it not booting. However, it might be.

Hi,

This looks to me like a head crash on the system partition. There are a couple of things that you could try, but I don't hold out much hope I'm afraid.

You could boot the system from a bootable floppy and use "divvy" to have a look at the disk and reassign the cylinder - might work.

My next option would be to replace the disk and rebuild.

Or you could virtualise the whole thing on something a bit more modern - under VMware perhaps?

Regards

Gull04

gull04 thinks that this might be serious so, from memory (and I haven't worked with SCO for some years) you could try the '-ofull' fsck switch along with the '-n'. If the filesystem is very large this will take quite a while to run but it may well tell you a lot more about any bad sectors (such as the inode number affected and therefore which file(s) are involved).

Probably worth a try. '-ofull' is a rarely used (and sometimes undocumented) option that tells fsck to check everything.

# fsck -n -ofull <filesystem>

I am willing to try the -ofull option with fsck, but today it's acting like this may be a hardware problem. Every time I have booted up with "unix.safe" and Single User mode, when I try to run the fsck command the system just reboots out of the blue. It is the same thing that happens if I try to boot into the regular unix or unix.old kernels, it just reboots almost immediately after trying to load.

Does this behavior sound like a disk problem, or could it be other hardware - memory, motherboard, etc.? Are there any diagnostic tools that I could try at the hardware level?

As for the suggestion to go virtual from gull04 - I would love to, and I have tried with VMware. I have a current VM that works; the only thing I need to do is get my USB tape drive recognized by the SCO OpenServer installed on the VM so I can restore most of the /etc directory from my backup tape. So far I have been unsuccessful in getting the tape drive to work. That may be my next post...?

I'd be inclined to fully test RAM first. I would re-seat all SIMMs in their slots and/or download a bootable memory test iso and run that a few times.

Booting from CD/DVD directly into a shell doesn't normally involve a HDD to get the system on its feet.

Hi Spock,

I'd go with hicksd8, on the diagnostics although I'm still favouring a head crash on the OS partition.

As to the tape problem on VMware, depending on available HW you could always remote mount: if you have another bit of tin with the same USB I/F as the tape, you can copy the tape contents to near line storage there and then copy the data into the VM at your leisure.

Regards

Gull04

OK, this saga continues... I figured I would just start over with a brand-new hard drive and SCO OpenServer 5.0.6 installation from my CD, thinking I would then restore my tape backup and be on my way. The install went without a problem, and next I installed the 5.0.6a supplement CD for what I think is y2k fixes. That went fine, but then when I did a reboot after it rebuilt the kernel - BAM the same problem happened at boot up. It just stops shortly after loading the boot config, and then the machine restarts. Obviously now it is some other hardware problem, but I don't know what to try. I ran a boot CD with some diagnostic tools, and the memory test passed just fine. I think I might have problems finding replacement parts to start swapping out - this server hardware is at least 14 or 15 years old (which I know is probably my main problem).

I would love to find a way to get the data from my backup tape to the virtual machine I have working in VMware, so I will pursue that route and will probably have some more questions. gull04 mentioned something about restoring to "near line storage" and then remote mounting, but I have no idea how to go about that process.

Wow! Hardware is that old.

Your reinstall issue has got me thinking. I've been involved with SCO since (seemingly) prehistoric times and I've forgotten more than I can remember.

I just feel that something in the back of my mind points to a BIOS setting especially the one that used to allow you to map video memory into RAM to increase speed. Can you just try toggling that? I can't even remember what the parameter was called back then; video shadowing and/or BIOS shadowing???? If it's on, turn it off, if it's off, turn it on. It's just something that I think I remember but could easily be wrong.

Well, now I have a bigger problem - or maybe the answer to what's going on altogether. I had swapped back to the old hard drive, straightened out the IDE cable and checked connections, just to see if that would help, and when I booted up it did the same thing. I have another SATA/PATA converter card, so I swapped that in, and this time the monitor came on with just a "white" screen. Now, no matter what I try to get the server going, there is no video signal coming out of the onboard VGA connector. I think maybe that is what was going out, and now is completely out - would this be possible?

The Server board is Intel model SE7500CW2, and has a couple of empty PCI-X slots maybe I could try another video card - but would that even work with my SCO installation? I'm grasping at straws now.


Update - spare video card plugged into PCI-X 133 slot 1 is working, but still not the total answer. Boot still bombs out after a short try. I have tested memory, checked the IDE cable and connections, and now I think I've ruled out video card issues. Any other suggestions? I don't know whether to mess with checking the CPUs or what else exactly I can check.

Hi Spock,

Bringing the data to some kind of near line storage is pretty simple, but you'll need to give us some information regarding the backup medium and method.

I see from the previous posts that the medium is tape, what kind and can you attach it anywhere else?

As to the method, can you tell us if it's tar or dump or cpio or something commercial?

Recovering to disk on another server and then sharing the drive using NFS is an option; this can even be mounted into a VM, or just copied to the VM if it can be recovered.

Regards

Gull04

Re my post#13.......Searching online I have just found documentation that says BIOS options for video shadowing and/or BIOS shadowing should be 'disabled'. Otherwise, memory will be over-written.

I seem to remember that this will screw the boot process.

@hicksd8 - yeah I searched the BIOS for a setting like that and I couldn't seem to find it. I will check again though.

@gull04 - The backups were done with HP SureStore Dat24 internal tape drive, using Microlite BackupEdge "Full Backups", and the tapes are Sony DGD126P type. Like I might have mentioned, our legacy software that runs on this machine is used for looking up historical data, and not an active server. When I was experimenting with the VM I had copied most everything over from the physical machine using ftp and cpio. The legacy app on the VM will not run though, I think because I did not copy over the entire /etc directory, and the reason is that when I got the restore to work in a previous hardware replacement I had to "exclude" the /etc.config file (I think that was the one). Anyway, when I had both the physical and virtual machines running I tried to compare the /etc directories to see which ones I needed, because using cpio on each one was too time consuming. Now I wish I had spent the time to try each directory, because I feel something is missing from there that my app needs to run correctly. I have purchased a newer USB HP SureStore DAT24 drive, but I have not had any luck "attaching" it to the VM so that the SCO installation will recognize it. I don't know what the options are as far as restoring the data - would it work to restore it to a Windows machine or VM? I have all kinds of resources here (thankfully) as far as Windows XP and Windows 7 working models. I also can probably purchase anything that I might need to get this job accomplished.

I can also try to replace more parts in the physical server, so far I think I have eliminated these as the problem: video card, monitor, hard drive, memory (tested OK), IDE cable, PATA/SATA adapter, power supply. Any suggestions are, as always, greatly appreciated.
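The /etc comparison described above can be done from sorted file listings instead of checking each directory by hand. This is a minimal sketch, assuming both systems have the standard find, sort and comm utilities and that one listing can be copied to the other machine (for example by ftp, as was done earlier). The function names and the /tmp file names are made up for the example.

```shell
#!/bin/sh
# list_files DIR: print every regular file under DIR, relative to DIR,
# sorted in plain byte order so comm can compare the listings.
list_files() {
    ( cd "$1" && find . -type f -print | LC_ALL=C sort )
}

# only_in_first LIST_A LIST_B: print the lines of LIST_A that are
# missing from LIST_B (both files must already be sorted).
only_in_first() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Hypothetical usage:
#   on the physical machine:  list_files /etc > /tmp/etc.physical
#   on the VM:                list_files /etc > /tmp/etc.vm
#   with both lists in one place:
#     only_in_first /tmp/etc.physical /tmp/etc.vm
```

The output is the list of files that exist under /etc on the physical machine but not on the VM, which narrows down what still has to be copied or restored. It only compares names, not contents, so files that exist on both sides but differ would need a separate check.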

Can you boot unix.old?

I have had a chance to try a few more things, I was focused for a while on this error message that I noticed at some point when the problem first started:

Open event driver failed.... Fatal server error.... Check mouse configuration

I started to read up on mouse configurations, and probably just wasted a bunch of time because even though I can go into unix.safe and add/remove a mouse to reconfigure, whenever it builds the new kernel, environment, etc. I'm not really going to see those changes because I can never boot up in the "regular" unix kernel - right?

So here's what I now have: HD #1 with previously working O/S and legacy app. HD #2 with fresh O/S installation. I can easily power down and switch between the two hard drives for testing, and here is what I can do with each one:

HD #1 - I can boot into unix.safe in either maintenance mode or regular mode, but in regular mode I have no mouse control in the graphical interface. I cannot boot into unix.old or just a blank (unix) boot, whenever it reaches the "Loading kernel" part the machine does a restart.

HD #2 - I can boot into unix.safe in either maintenance mode or regular mode, but again no mouse control in the GI. I can boot into unix.old, but no help there. I cannot do a straight boot (unix), it does the same restart when the loading kernel process begins.

So my only thoughts now are something with the motherboard - maybe the primary IDE port or channel is malfunctioning, or it really has something to do with the PS/2 mouse port? The only thing that makes sense to me is something to do with the IDE channel, since the restart seems to be triggered whenever any kind of "load" is put on the HD. Does this even make sense?

boot using unix.safe.
run /etc/scologin disable
shutdown
boot using unix
if this boot fails then the mouse is not the problem.