SCO UNIX Won't Boot

jedimaster · October 9, 2009, 1:36pm

Our system is not booting up properly. It keeps going to this screen:

Enter Run Level (0-6, s or S):

I tried every number from 0 to 6, and each one just leaves the system hung.
Entering s or S brings me to single user mode. I checked the file systems and found that all three were at 98%. I deleted as many files as I could (data only) and got all three down to 94%. When I rebooted, all three went back up to 98%. I looked for wayward processes that I could kill or trace, but could not find any.

I need help very badly, as this is a production system and the only backups I have are data only. No full system backups are available.

DukeNuke2 · October 9, 2009, 2:11pm

To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut and paste: edit out any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums

TonyLawrence · October 9, 2009, 2:56pm

See my Out of DiskSpace? article for tips on where to look.

Shame on you for not having system backups. Microlite Edge (http://microlite.com) is inexpensive (yes, I'm a reseller) and HIGHLY recommended for SCO systems.

jgt · October 9, 2009, 3:16pm

The critical file system is the root file system. If it is at 100% the system will not boot.
You will have to boot from a diskette, mount the root file system and begin deleting/truncating files until you have 10 megabytes of free space.
Look at these files.

/usr/adm/messages
/usr/adm/syslog
/usr/spool/lp/logs/requests
/etc/wtmp

And these directories.

/usr/spool/mail
/tmp
/usr/spool/lp/temp
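
A minimal sketch of the truncating step, assuming the root file system is mounted and you are working as root. Emptying a log with a redirect keeps the file, its owner, and its permissions in place, which tends to be safer than removing a file that a daemon may still have open. The name `truncate_log` is made up for illustration and is not a SCO utility.

```shell
# truncate_log is a hypothetical helper, not a SCO command.
# It empties each named file in place and leaves missing files alone.
truncate_log() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            : > "$f"
        fi
    done
}

# Example (as root, using paths from the list above):
# truncate_log /usr/adm/messages /usr/adm/syslog /usr/spool/lp/logs/requests
```

Copy anything you might still need from a log before emptying it; the contents cannot be recovered afterwards.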

After you have got the system to boot, do the following as root:
#cd /
#du -a |sort -r -n >/tmp/du.srt

Then examine /tmp/du.srt, as it will contain a list of all files/directories in descending order by size.
This will allow you to find very large files, and also directories that contain a large number of very small files.
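
The same pipeline can be wrapped so that only the top of the list is shown, which avoids writing a large report onto a file system that is already nearly full. This is a sketch under assumptions: `largest_entries` is a hypothetical name, and `sed Nq` is used to cut the output because it is widely portable.

```shell
# largest_entries is a hypothetical wrapper around the du/sort pipeline above.
# $1 = directory to scan, $2 = number of lines to show.
largest_entries() {
    du -a "$1" | sort -r -n | sed "${2}q"
}

# Example: show the 40 largest entries under the root directory.
# largest_entries / 40
```

The first line is always the scanned directory itself, since its total includes everything beneath it.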

---------- Post updated at 03:16 PM ---------- Previous update was at 03:02 PM ----------

Anticipating your next post: you don't have an emergency boot diskette.
Go out this afternoon and buy a new hard drive. Temporarily remove the old drive, and re-install the operating system on the new drive.
If the original drive is SCSI, change the ID and set the read only jumper, (if IDE set the old drive as slave, or attach to the secondary IDE channel), install the drive and go through mkdev hd, being sure not to mark any of the existing file systems as new.
Copy the required files from the old drive to the new one.

You can use Norton Ghost to duplicate the original drive, although you cannot change any partition sizes.

Have a nice weekend.

jedimaster · October 9, 2009, 3:44pm

I am doing a ghost image on the device while waiting for some replies to this post as I have no emergency boot diskette to use. Right now I hear a ticking sound on the source drive......looks like it is about to fail, although ghosting it seems to be going through....hopefully.

With regards to the suggested directories to check on, I did have a look at those & deleted files that are not needed.

Thanks for the tips. Will try to do the part after reboot....that is after I have successfully created an image for it.

jgt · October 9, 2009, 6:51pm

If you manage to get the old disk to boot, you could download Microlite Edge, see Tony's post for where from, install it, create the recovery diskettes, do a full backup, replace the disk with a new larger hard drive, boot from the recovery diskettes, and in the process of doing the restore create new larger partitions.
This will save re-configuring all the users, printers, network settings, application software etc.
You can run Microlite Edge for at least 30 days as a fully functional eval.

jedimaster · October 9, 2009, 7:39pm

Sorry for not having any backups Tony. I just joined the company on the 5th & inherited a legacy system that no one seemed to care about, backups included. Now they are giving it a priority since it failed & most applications are mapped to the unix system I am trying to recover. Will try the evaluation software as well. Hopefully this will all be fixed.

edfair · October 9, 2009, 11:12pm

My experience is limited at the top to 5.0.5 and in those cases where my systems have hit the wall I've received a message telling me so. Nothing in your post indicates that you've hit 100% so I suspect that there are other issues rather than full file systems.

In single user you can disable any filesystems that normally attach going into multi and see if you can get into multi without them. That allows you to manually attach them to trace which ones are giving the problems. And in single you can get rid of the stuff that jgt mentioned, along with any core dumps.
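
One way to hunt for process core dumps is sketched below. It assumes the traditional convention that a crashed process leaves a file named `core` in its working directory; `find_cores` is a hypothetical name, and the function only lists files, it does not delete anything.

```shell
# find_cores is a hypothetical helper: list regular files named "core"
# under the directory given as $1, with sizes, without removing them.
find_cores() {
    find "$1" -type f -name core -exec ls -l {} \;
}

# Example: find_cores /
```

Check each hit before removing it, since an application could legitimately own a file with that name.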

You didn't mention printers. You might want to disable any that are used for the duration of troubleshooting. A printer job running wild could be one reason for the sudden buildup.

jedimaster · October 13, 2009, 2:33pm

Ghost image doesn't seem to work. System still at 98%. Printer/spooler is not enabled. I can only get the system to single user mode.....

---------- Post updated at 01:33 PM ---------- Previous update was at 01:29 PM ----------

I've zeroed out messages as well. I can't seem to find the crash/dump files for SCO, and I can't find the directory they would be in either.

jgt · October 13, 2009, 6:29pm

Once you get the system to single user mode,
what is the output of:
#df -v
#dfspace
#uname -X

What messages do you get when you try to go to multi user?

edfair · October 14, 2009, 7:34am

I would also suggest that you document some stuff:
Look in /dev for items hd* to see what you have for hard drives defined. Should be HD10 through HD1a and a similar set for 2nd and higher drives.
Run divvy on one section of every hard drive you have identified and document the information.
This information may be vital if you end up having to reinstall.
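
A small sketch of the documenting step, assuming the disk nodes live in /dev and start with `hd` as described above. `record_disk_nodes` and the output file name are made up for illustration; the divvy tables still have to be written down separately.

```shell
# record_disk_nodes is a hypothetical helper. It writes a long listing of
# every hd* node found in directory $1 (normally /dev) to file $2.
record_disk_nodes() {
    for node in "$1"/hd*; do
        if [ -e "$node" ]; then
            ls -l "$node"
        fi
    done > "$2"
}

# Example: record_disk_nodes /dev /tmp/hd-nodes.txt
```

Keep a copy of the listing somewhere off the failing machine, since it is of no use if it is lost along with the disk.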

Once in single user you can run mkdev fs to remove other than the required filesystems of root, boot, and swap. They can be added back later, after you have resolved the multiuser issues.

Although JGT suggested that you try for 10mb free I've worked with systems with as little as 1K free while resolving problems. But I was prepared to floppy boot to get access to clean, if needed.

From single user does mkdev fd work? Would give some assurance that you could get back in if something goes "bump".

Don't understand the boot menuing but would be curious about using init from SU mode. Something like init 6 which should lead to multi IIRC.

jedimaster · October 15, 2009, 10:05am

jgt here are the outputs I got from what you asked.

#df -v
Mount Dir  Filesystem  Blocks   Used     Free   %Used
/          /dev/root   3373036  3298658  74378  98%
/stand     /dev/boot   3373036  3298658  74378  98%
/u1        /dev/u1     3373036  3298658  74378  98%

#dfspace
dfspace:  not found

#uname -X
System = SCO_SV
Node = xxxxx 
Release = 3.2v5.0.5
KernelID = 98/07/02
Machine = Pentium II (D)
BusType = ISA
Serial = 2GA023210
Users = 2-User
OEM# = 0
Origin# = 1
Num Cpu = 2

Hope this info works. With regards to going to multi user, all I get is a hard hung state.

jgt · October 15, 2009, 2:15pm

You have the desktop version of Openserver. This allows two concurrent telnet users and one at the console.
The divvy table appears to be corrupted.
"df -v" reports all three file systems as the same size!!! (Or was that a typo?)
Typically /root would be 1 to 3 gb
/stand about 15 or 20 megabytes
/u1 the balance of the disk.
/swap doesn't show, but is usually 2 * system memory.
Is there only one hard disk in the system?
In single user mode only /root and /stand should show in df -v.
Dfspace provides the same data as df -v except in megabytes.
dfspace should be in /etc; can you check your PATH?
It should include at least /etc /bin /tcb/bin /usr/bin
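
A quick way to check and repair PATH for the current session is sketched here. `path_has` is a hypothetical helper, and the directory list is simply the one given above; nothing is written to any profile file.

```shell
# path_has is a hypothetical helper: succeeds if directory $1 is in PATH.
path_has() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Append any of the directories named above that are missing.
for dir in /etc /bin /tcb/bin /usr/bin; do
    path_has "$dir" || PATH="$PATH:$dir"
done
export PATH
```

After this, `dfspace` should be found if it exists in /etc; the change lasts only for the current shell.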

Run the following:
#divvy
The table should show the starting and ending 1k_block numbers for each file system.
Are there more file systems than reported by df -v?
Then run
#mount
to display all currently mounted file systems, and then
#mountall
This should tell you if there are additional file systems and their condition.
If any will not mount, run
#fsck /dev/??? (file system name)
then try mountall again.

Run both of the following:
#/tcb/bin/integrity -e , this will list all files that do not have the correct ownership and permissions. Change them with chmod, chgrp, chown as necessary.
Repeat this step until there is no output.
#/tcb/bin/authck -a , this will tell you if any system files associated with security are corrupted.

Quite often /etc/auth/system/ttys becomes corrupted if the system crashes or runs out of disk space, as this file is modified when users log on and off.
The file should look like:

console:t_devname=console:chkent:
tty01:t_devname=tty01:t_uid=root:t_logtime#1251921523:\
:t_unsucuid=root:t_unsuctime#1250890220:t_prevuid=root:t_prevtime#1251922283:\
:chkent:
tty02:t_devname=tty02:t_uid=root:t_logtime#1201031571:\
:t_prevuid=root:t_prevtime#1201033323:chkent:
tty03:t_devname=tty03:t_uid=root:t_logtime#1230657866:\
:t_unsucuid=root:t_unsuctime#1230657857:t_prevuid=root:t_prevtime#1230657926:\
:chkent:
tty04:t_devname=tty04:chkent:
tty05:t_devname=tty05:chkent:
tty06:t_devname=tty06:chkent:

You can make all lines look like tty04 through tty06, delete any extraneous data at the end of the file. tty01, 02, and 03 show current and the last logged in user at that tty. The number of lines in the file will depend on how many pseudo terminals are configured, but ttyp0 through 16 should be lots.
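
The clean-up described above could be done with a small filter like the one below. This is only a sketch: `reset_ttys` is a hypothetical name, it assumes every entry starts with `name:t_devname=` as in the sample, and it assumes the device name matches the entry name. Work on a copy and inspect the result before replacing the live file.

```shell
# reset_ttys is a hypothetical filter. It reads a ttys-style file on stdin
# and prints one minimal entry per terminal, in the form of the tty04 line
# shown above. Continuation lines and trailing debris are dropped because
# they do not start with "name:t_devname=".
reset_ttys() {
    awk -F: '/^[A-Za-z][A-Za-z0-9]*:t_devname=/ {
        print $1 ":t_devname=" $1 ":chkent:"
    }'
}

# Example, working on copies rather than the live file:
# cp /etc/auth/system/ttys /tmp/ttys.orig
# reset_ttys < /tmp/ttys.orig > /tmp/ttys.new
```

Run `/tcb/bin/authck -a` again after installing the new file to confirm the database is consistent.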

jedimaster · October 15, 2009, 3:34pm

I have two drives installed, mirrored. I'm just hunting around the office for a few things (a serial cable to capture the outputs that you requested).

No typo though. I just copied what I see on the screen.

jgt · October 15, 2009, 3:35pm

Jedi, what timezone are you in?

jedimaster · October 15, 2009, 6:33pm

Pacific

edfair · October 16, 2009, 3:48pm

df -kv as an alternative

divvy /dev/root will show the hard drive divisions starting and ending blocks. Might be well to know whether they are corrupted.

---------- Post updated at 03:48 PM ---------- Previous update was at 04:22 AM ----------

My only accessible installation, network on a SCSI using adaptec 2940 vanilla install, shows the following for divvy /dev/root

boot      eafs    0  0        14999
swap      nonfs   1  15000    62999
root      htfs    2  63000    1032181
not used          3
not used          4
not used          5
recover   nonfs   6  1032182  1032191
whole disk        7  0        1032191

Yours probably would show the /u in division 3 with modifications to starting and ending blocks to fit it in and much larger divisions to fill your larger disk.

This fyi in case you haven't seen one.

jedimaster · October 20, 2009, 9:44am

After so many days trying to recover, I was left with no other choice but to rebuild the kernel. I made a copy of the existing kernel before the rebuild.

After the rebuild, I was able to get the multiuser/GUI login screens. I checked the filesystems and noticed that one was missing: /u1, which contains the applications and data.

Checked the following entries on /etc/mnttab

# more mnttab
/dev/root/@A/dev/boot/standAA/dev/u1/u1A\306\334j

Added another entry for /dev/u1 to the end (dunno which partition /u1 uses)...

/dev/u1/u1AA\306\334j

Edited the entry in /etc/default/filesys as well:

bdev=/dev/boot cdev=/dev/rboot \
mountdir=/stand mount=no fstyp=EAFS \
fsck=no fsckflags= rcmount=yes \
rcfsck=no mountflags=

bdev=/dev/root cdev=/dev/rroot \
mountdir=/ mount=no fstyp=HTFS \
fsck=no fsckflags= rcmount=no \
rcfsck=no mountflags=

bdev=/dev/u1 cdev=/dev/ru1 \
mountdir=/u1 mount=yes fstyp=HTFS \
fsck=no fsckflags= rcmount=yes \
rcfsck=no mountflags=
rcmount=yes rcfsck=dirty

# mountall
fsstat: /dev/boot mounted
Mounted /stand filesystem
fsstat: cannot stat filesystem /dev/u1
mount: /dev/u1: No such device or address (error 6)
Failed to mount /u1 filesystem
mnt: no block device for entry number 4 in /etc/default/filesys

Is there any way to recover the unmounted partition, if it is recoverable at all? I'd appreciate any input. Thanks.

edfair · October 20, 2009, 12:45pm

What shows in divvy for the drive ?
Every potential division should have starting and ending blocks and a filesystem type.
If the division name is blank or a default name then you change the name to u1 and write it out. That creates the name in /dev. Or you can use the default name if it exists.

Once you have the name in /dev you can run "mkdev fs" and link the hardware device in /dev to the mount directory on /.

Since you did some manual modifications you may get errors when you try to attach. I would undo the manual modifications and let mkdev handle it all.
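
Before running "mkdev fs" it may help to confirm what is actually in /dev. The sketch below uses a hypothetical helper, `check_node`, to report whether a path exists and is a block special file. Note that error 6 in the mountall output is ENXIO ("No such device or address"), which usually means the node is there but no division stands behind it, so a node that passes this check can still fail to mount.

```shell
# check_node is a hypothetical helper. It reports whether $1 exists and is a
# block special file, which is what mount expects.
check_node() {
    if [ -b "$1" ]; then
        echo "$1: block device present"
    elif [ -e "$1" ]; then
        echo "$1: exists but is not a block device"
    else
        echo "$1: missing"
    fi
}

# Example: check_node /dev/u1
```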

jedimaster · October 20, 2009, 1:55pm

bash-2.03# divvy
+-------------------+------------+--------+---+-------------+------------+
| Name | Type | New FS | # | First Block | Last Block |
+-------------------+------------+--------+---+-------------+------------+
| boot | EAFS | no | 0 | 0| 15359|
| swap | NON FS | no | 1 | 15360| 406527|
| root | HTFS | no | 2 | 406528| 2093045|
| | NOT USED | no | 3 | -| -|
| | NOT USED | no | 4 | -| -|
| | NOT USED | no | 5 | -| -|
| recover | NON FS | no | 6 | 2093046| 2093055|
| hd0a | WHOLE DISK | no | 7 | 0| 2095087|
+-------------------+------------+--------+---+-------------+------------+
2093056 1K blocks for divisions, 2032 1K blocks reserved for the system

Hope this helps.
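
As a cross-check on the table above, the size of each division is its last block minus its first block plus one. The sketch below adds them up; `division_sizes` is a hypothetical name, and the input is just the four used rows of the divvy output retyped as "name first last".

```shell
# division_sizes is a hypothetical filter. Input lines are
# "name first_block last_block"; output is the size of each division in
# 1K blocks, followed by the total.
division_sizes() {
    awk '{
        size = $3 - $2 + 1
        total += size
        print $1, size
    } END { print "total", total }'
}

division_sizes <<'EOF'
boot 0 15359
swap 15360 406527
root 406528 2093045
recover 2093046 2093055
EOF
```

The total comes to 2093056, the same figure divvy prints for "1K blocks for divisions", so boot, swap, root, and recover already account for every block in this table and there is no unallocated room left for /u1 here. The root division works out to 1686518 1K blocks, which is 3373036 blocks of 512 bytes. If df -v is counting 512-byte blocks, that is the number it reported for all three mount points, which suggests only the root figures were real.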