System Hanging on boot

Hello:

Hope someone can help. Yesterday we did a mksysb backup of one of our AIX boxes, and now the server is hanging at the "Multi-User initialization completed" prompt.

Can someone help in troubleshooting this error? If you need more info, please just let me know what you need. I'm not that familiar with AIX. I can tell you that it's v5.2.

It has been a long time since I booted and maintained an AIX box...
Since it booted and got to multi-user level, I would suspect some corruption, or something added since the last reboot, in /etc/inittab.
Boot in single-user mode, then edit the file: comment out all lines after the single-user entries, and re-enable them one by one.
Isn't there an /etc/rc.log file?

I looked in the inittab file for the lines relating to "Single User Mode" but I couldn't see them. Can you be a little more specific as to where they are? Also, there is no /etc/rc.log file. At least not on this particular box.

Can anyone else chime in?

I found an article online that talked about adding ',clocal' to the end of a couple of tty lines. I did this and rebooted, but still the same problem. I would really like to get this fixed.

Thanks.

There is one thing that puzzles me:

I don't see what link there is between the mksysb generation and the message. And where is the message displayed?
If on the console, what do you see? Can you log in? If so, what makes you think it is hung?
It could be network services not working (hardware issue?).
Type errpt -a|more and see if there is any information on what is going on.
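
The detailed report from errpt -a can run to many pages, so it may help to scan the one-line summary first. The sketch below is only an illustration: errpt_by_type is a hypothetical helper name, and it assumes the usual summary layout in which the third column holds the error type (P = permanent, T = temporary, I = informational, U = unknown).

```shell
# Filter an errpt summary listing (read from stdin) down to one error type.
# Assumption: column 3 of the summary is the type letter (P, T, I or U).
# The header line is skipped by matching on its first word.
errpt_by_type() {
    wanted=$1
    awk -v t="$wanted" '$1 != "IDENTIFIER" && $3 == t'
}

# Possible usage on the AIX box (not executed here):
#   errpt | errpt_by_type P      # permanent errors only
#   errpt -a -j <IDENTIFIER>     # full detail for one identifier from the summary
```

Informational entries are not necessarily harmless, so it is still worth reading anything logged around the time the trouble started.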

Hi,

The mksysb creation failed with an error suggesting that the process ran out of memory.

I'm using the console, and no, I cannot log in. It never gets to the point where I can log in. The boot stops at "Multi-User initialization completed". We left it there overnight and came back the next morning and it was still there. I can boot into single user mode and such.

As far as the Error Log, this is the only real error I saw:

---------------------------------------------------------------------------
LABEL: JFS_FS_FRAGMENTED
IDENTIFIER: 5DFED6F1

Date/Time: Fri Nov 14 14:00:09 EST 2008
Sequence Number: 2147
Machine Id: 00CDE13F4C00
Node Id: 00CDE13F4C00
Class: O
Type: INFO
Resource Name: SYSPFS

Description
UNABLE TO ALLOCATE SPACE IN FILE SYSTEM

Probable Causes
FILE SYSTEM FREE SPACE FRAGMENTED

Recommended Actions
CONSOLIDATE FREE SPACE USING DEFRAGFS UTILITY

Detail Data
MAJOR/MINOR DEVICE NUMBER
000A 0006
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/hd9var, /var
---------------------------------------------------------------------------

Ha! This is quite simple to solve: your "/var" filesystem (/dev/hd9var) is full, and AIX systems dislike this state of affairs so much that they usually refuse to boot.

Boot in single-user mode (or even in service mode), mount the rootvg and make some space available in /var. This should do the trick.
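
To see where the space in /var has gone, something along these lines could be used once the filesystem is mounted. It is a minimal sketch: largest_entries is a hypothetical helper name, and it relies only on du, sort and head.

```shell
# List the largest entries directly under a directory, biggest first.
# Sizes are in KB (du -k). usage: largest_entries <dir> [count]
largest_entries() {
    dir=$1
    count=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Possible usage on the AIX box (not executed here):
#   df -k /var                 # overall usage of the filesystem
#   largest_entries /var       # biggest directories and files under /var
#   largest_entries /var/adm 5
```

Note that the error report above blames fragmented free space rather than a lack of it, and its recommended action is the defragfs utility.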

I hope this helps.

bakunin

I hope you're right.

I checked /var and it was only at 69%. There is a file there, wtmp, that was 440MB. I compressed that file and that dropped the disk usage to 40%. I'm rebooting now; hopefully that frees up enough to boot properly.

Nah ... didn't work. Still stops at "Multi-User initialization completed"

Hm.... you did look up what this file is for before compressing it, didn't you? :rolleyes:

[Basic AIX administration on]
If you don't want to keep the information about who logged into your server and when, delete that (i.e. this very) compressed wtmp file. Next time you want to reduce its size, use the following commands:

# /usr/lib/acct/fwtmp < /var/adm/wtmp > /tmp/wtmp.asc
# tail -1000 /tmp/wtmp.asc > /tmp/wtmp.1000
# /usr/lib/acct/fwtmp -ci < /tmp/wtmp.1000 >/var/adm/wtmp 

This works while the system is online. The wtmp history will then only go back a few days (depending on how many logins per day took place). If you want to blank it, use

# > /var/adm/wtmp

But better not to compress it while it is in use. (And don't delete a wtmp file that is in use either!)
[Basic AIX administration off]

Do you happen to have a serial console connected to your server? In that case you might be suffering from a wrong cable, a wrong ASCII terminal, or wrong tty settings. The reason could then be that an open of the /dev/tty device is waiting for a hardware handshake that is never satisfied, which causes the hang when you reach "Multi-User initialization completed".

No console cable connected. I did a mksysb backup, and it failed with a "Fork ran out of memory" error. And now it boots and stops at the error I stated before. How could the ASCII terminal or tty setting be wrong?

Yes, that was probably clear when you wrote "I'm using the console" .... sorry, silly me. :smiley:
Deactivate the start of rc.tcpip in the inittab and start over.

Sorry if I'm using incorrect terms. I'm new to this IBM AIX stuff.

To deactivate the rc.tcpip, you mean to just comment it out?

Yes. If the server boots OK, the problem is in rc.tcpip. If it hangs again, comment out rc.nfs, and so on. This way you will, hopefully, find out at which operation the problem occurs.
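
For reference, AIX treats an inittab entry that begins with a colon as commented out. A minimal sketch of disabling one entry at a time could look like this. disable_entry is a hypothetical helper name, and the identifiers rctcpip and rcnfs are assumed to be the default ones for rc.tcpip and rc.nfs, so check the real names with lsitab -a first.

```shell
# Print a copy of an inittab-style file with one entry commented out.
# An entry is disabled by putting a colon in front of its identifier.
# The input file itself is not modified.
# usage: disable_entry <identifier> <file>
disable_entry() {
    id=$1
    file=$2
    sed "s/^${id}:/:${id}:/" "$file"
}

# Possible usage in single-user mode (not executed here):
#   cp /etc/inittab /etc/inittab.bak
#   disable_entry rctcpip /etc/inittab.bak > /etc/inittab
#   # reboot; if it still hangs, disable the next entry (e.g. rcnfs) as well
```

Working from a backup copy means the original file can always be put back with a single cp.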

I edited the file and commented out rc.tcpip and rc.nfs, and it is still hanging at the same place. I'm tired.

Hm ... What you could try next (i.e. tomorrow :wink: ):

  • boot into single user mode and look into errpt and bootlog (# alog -ot boot)
  • boot into single user mode and run diagnostics (# smitty diag) to find a potentially broken device,
  • boot into single user mode without mounting a rootvg and run a full fsck on every FS...

Did all those things and it's still stopping at "Multi-user initialization completed". I think this must be something simple.

Hi,

I was wondering how you solved this problem? I have a similar issue and read through this thread. All the suggestions given here are very useful.
Could you also let us know whether the problem was solved, and how? Thanks!!