4 Servers hang with same error message

I have 3 Power7 710 Express boxes and 1 Power7 750 Express box that all get into a weird hung state with the same console message. They all run AIX 6.1 TL7 (2x 6100-07-08-1339, 2x 6100-07-10-1415).

"NIS: Server not responding for domain X.X.X; still trying."

There are other clients that use the same NIS server and they do not have any issues. I have tried restarting the NIS server (/etc/init.d/ypserv restart) while the AIX boxes are hung, but to no avail.

The NIS server is a Fedora server running at ~14.1 utilized (yes, very high, I agree), but if that were the issue, I would expect other clients to have issues too, not just my AIX boxes.

When the problem is occurring, I can telnet to the AIX box and get a login prompt, but the login isn't processed (CPU over-utilized??). The console is active but, again, the OS cannot process the login request; sometimes it does not even respond with the password prompt. The box does not respond to a ping.

The only cure is to reboot the box from the HMC; then they are OK. There are NO errpt messages about any problem: no network, kernel, disk, or memory errors are logged.

They will run for a few days, 10-20 days, then all 4 go into the same state at almost the same time.

I just need a direction to go in, or a script to collect system stats so that when the system hangs I can look at the output after the reboot, or an idea of where to start looking. I have run diag and checked the network card, sysplanar0, sisass0 and memory, and all pass.
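For the kind of collection script asked about here, a minimal sketch might look like the following. It appends one timestamped snapshot per call to a log file that survives the reboot. The function name snapshot, the HANG_LOG variable, the default log path and the list of commands are all assumptions for illustration, not anything AIX ships.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped snapshot of system state to a log.
# The log path is an assumption; pick a filesystem with enough free space.
LOG="${HANG_LOG:-/var/tmp/hangstats.log}"

snapshot() {
    # Each argument is one command line; commands that are not installed are skipped.
    {
        echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
        for cmd in "$@"; do
            name=${cmd%% *}          # first word = program name
            if command -v "$name" >/dev/null 2>&1; then
                echo "--- $cmd"
                sh -c "$cmd" 2>&1
            else
                echo "--- $cmd (not found, skipped)"
            fi
        done
    } >> "$LOG"
}

# Example use (assumed command list), e.g. from a background loop:
#   while :; do snapshot "uptime" "vmstat 1 2" "netstat -in"; sleep 60; done
```

The example list sticks to commands that print numeric output, on the assumption that anything resolving user or host names could itself block while NIS is unreachable and stall the collector.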

It didn't hang, it is waiting for the NIS server :wink: The first thing you can do is run rmyp -c to remove the NIS client configuration from the server. I would personally recommend stopping at this point, because you will never ever have any problems with NIS if you don't have it.

But I suppose you want your NIS configuration back. After removing NIS, I'd reboot the server again and check that everything goes well and there are no other errors, such as hardware or network errors.

If the server starts without NIS, try to ping the NIS server, then check DNS and/or /etc/hosts.

# host NIS-server
# host IP-address-of-my-NIS-server
# host -n NIS-server
# host -n IP-address-of-my-NIS-server

If you have several DNS servers in your /etc/resolv.conf, try all of them:

# host -n NIS-server DNS-server-1
# host -n NIS-server DNS-server-2
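Trying every DNS server by hand can be sketched as a small loop. This is only an illustration: the function names list_nameservers and check_all_dns are made up here, and it assumes the servers are listed on "nameserver" lines in /etc/resolv.conf.

```shell
#!/bin/sh
# Hypothetical helpers: query every nameserver from resolv.conf in turn.

list_nameservers() {
    # Print the address from each "nameserver" line; $1 = optional file path.
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

check_all_dns() {
    # $1 = name or address to look up, $2 = optional resolv.conf path
    for ns in $(list_nameservers "$2"); do
        echo "--- asking $ns for $1"
        host -n "$1" "$ns" || echo "lookup via $ns failed"
    done
}

# Example use:
#   check_all_dns NIS-server
#   check_all_dns IP-address-of-my-NIS-server
```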

If that is OK, you can try to configure the NIS client again:

# domainname NIS-domain
# mkclient -B

In this case AIX will broadcast to find a suitable server. You can also specify the NIS server explicitly:

# mkclient -B -S NIS-server

But I would first try without specifying the server. AIX should be able to find the server, and if it can't, you have some problem with your NIS configuration. (But don't ask me which problem; I last configured a NIS server in 1998.)

Then check that ypbind has started:

# lssrc -s ypbind

If it has started, check which NIS server you are using:

# ypwhich

If everything is OK at this point, you can check that you see your users (with their passwords, of course):

# ypcat -k passwd
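The three checks above can also be chained so that the first failure is obvious. The sketch below is a generic runner; the name run_checks is hypothetical, and the example check list assumes that lssrc prints "active" for a running subsystem and that ypwhich and ypcat return a non-zero exit status on failure.

```shell
#!/bin/sh
# Hypothetical helper: run checks in order and stop at the first failure.
run_checks() {
    for check in "$@"; do
        if sh -c "$check" >/dev/null 2>&1; then
            echo "OK:   $check"
        else
            echo "FAIL: $check"
            return 1
        fi
    done
    return 0
}

# Example use (assumed check list):
#   run_checks "lssrc -s ypbind | grep active" "ypwhich" "ypcat -k passwd"
```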

As a last step you can call me and we will discuss your future migration to LDAP, because NIS+ is not a part of AIX 7.2 anymore.