I have 3 Power7 710 Express boxes and 1 Power7 750 Express box that all get into a weird hung state with the same console message. They all run AIX 6.1 TL7 (2 at 6100-07-08-1339, 2 at 6100-07-10-1415).
"NIS: Server not responding for domain X.X.X; still trying."
There are other clients that use the same NIS server and they do not have any issues. I have tried restarting the NIS server (/etc/init.d/ypserv restart) while the AIX boxes are hung, but to no avail.
The NIS server is a Fedora server running at ~14.1 utilized (yes, very high, I agree), but if that were the issue, I would expect other clients to have issues too, not just my AIX boxes.
When the problem is occurring, I can telnet to the AIX box and get a login prompt, but the login isn't processed (CPU overutilized??). The console is active, but again the OS cannot process the login request; sometimes it does not even respond with the password prompt. The box does not respond to a ping.
The only cure is to reboot the box from the HMC, after which it is OK. There are NO errpt messages about any problem: nothing logged for network, kernel, disk, or memory.
They will run for a few days, 10-20 days, then all 4 go into the same state at almost the same time.
I just need a direction to go in, a script to collect system stats so that when the system hangs I can look at the output after the reboot, or an idea of where to start looking. I have run diag and checked the network card, sysplanar0, sisass0 and memory, and all pass.
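
To make the request concrete, something along these lines is roughly what I have in mind. It is only a sketch and has not been verified on these boxes: the LOGDIR path, the log file name, the "once" argument and the choice of commands are all my own placeholders. It appends one timestamped snapshot per call, so the last entries before a hang survive the reboot. I left out anything that talks to NIS (ypwhich, user name lookups in ps) on the assumption that those calls could hang along with everything else.

```shell
#!/bin/sh
# Hypothetical stats collector: appends one timestamped snapshot to a daily log.
# LOGDIR, the log file name and the command list below are assumptions.
LOGDIR="${LOGDIR:-/var/tmp/hangstats}"

run_if_present() {
    # Run a command only if it exists on this box; capture stderr as well.
    if command -v "$1" >/dev/null 2>&1; then
        echo "--- $*"
        "$@" 2>&1
    fi
}

snapshot() {
    mkdir -p "$LOGDIR" || return 1
    out="$LOGDIR/stats.$(date +%Y%m%d).log"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
        run_if_present uptime
        run_if_present vmstat 1 3
        run_if_present netstat -i
        # Numeric columns only, so ps does not need to resolve user names.
        run_if_present ps -e -o pid,ppid,pcpu,vsz,args
    } >>"$out"
    # Flush to disk so the data is still there after a forced reboot.
    sync
}

# Take a single snapshot when invoked as: script.sh once
if [ "${1:-}" = "once" ]; then
    snapshot
fi
```

The idea would be to call it with the "once" argument from cron every minute or so and read the tail of the latest log after the next hang.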