Hi,
I am seeing the messages below on a few Linux clients:
[root@rhel-client22 ~]# cat /var/log/messages | grep -i yp
Jan 24 13:40:42 rhel-client22-prod crond: YPBINDPROC_DOMAIN: Domain not bound
Jan 24 16:00:42 rhel-client22-prod crond: YPBINDPROC_DOMAIN: Domain not bound
[root@rhel-client22 ~]#
Here is the debug output from the moment the error was printed:
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79817: ypbindproc_domain_2_svc (pre.abc.com)
79817: Pinging all active servers.
79817: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79820: Pinging all active servers.
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79817: trylock = failed
79817: trylock = success
79817: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79817: Status: YPBIND_FAIL_VAL
79820: Pinging all active servers.
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79820: Answer for domain 'pre.abc.com' from server 'wksp-dir1-prod'
79820: Pinging all active servers.
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79820: Pinging all active servers.
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
Since this shows up on multiple clients, I checked the NIS master server. The timestamps there do not match the times at which the issue occurs on the clients, but the entries may be related:
Jan 24 03:11:23 wksp-dir1-prod ypbind[22535]: [ID 337329 daemon.error] NIS server not responding for domain "pre.abc.com"; still trying
Jan 24 03:11:23 wksp-dir1-prod ypbind[24552]: [ID 647655 daemon.error] NIS server for domain "pre.abc.com" OK
Jan 24 05:01:58 wksp-dir1-prod ypbind[1685]: [ID 337329 daemon.error] NIS server not responding for domain "pre.abc.com"; still trying
Jan 24 05:11:48 wksp-dir1-prod ypbind[13158]: [ID 337329 daemon.error] NIS server not responding for domain "pre.abc.com"; still trying
Jan 24 05:20:30 wksp-dir1-prod ypbind[24312]: [ID 647655 daemon.error] NIS server for domain "pre.abc.com" OK
Jan 24 15:26:45 wksp-dir1-prod ypbind[7667]: [ID 337329 daemon.error] NIS server not responding for domain "pre.abc.com"; still trying
Jan 24 21:20:52 wksp-dir1-prod ypbind[15666]: [ID 647655 daemon.error] NIS server for domain "pre.abc.com" OK
How should I debug this? This server is itself the NIS server, so why would it fail to reach itself? Any pointers, please?
Thanks
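To make the comparison between client and server timestamps easier, the ypbind messages can be collapsed into outage windows. This is only a rough sketch: the function name ypbind_outages is made up for illustration, and it assumes the syslog line layout shown in the excerpt above (month, day and time as the first three fields).

```shell
# Hypothetical helper: pair each "not responding" message with the next
# "OK" message and print one "start -> end" line per outage window.
# Assumes the first three fields of every line are month, day and time.
ypbind_outages() {
    awk '
        /NIS server not responding/ {
            # Remember only the first failure of a window.
            if (down == "") down = $1 " " $2 " " $3
            next
        }
        /NIS server for domain .* OK/ {
            if (down != "") {
                print down " -> " $1 " " $2 " " $3
                down = ""
            }
        }
        END { if (down != "") print down " -> (still down)" }
    ' "$@"
}
```

It could be run as "ypbind_outages /var/adm/messages" on the server (the log path is an assumption). For the seven lines posted above it would report three windows, the last one from 15:26:45 to 21:20:52. If the clocks agree, the 16:00:42 client error falls inside that window, while the 13:40:42 one does not.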
Check the NIS server if/where it binds
ypbind
Check /etc/nsswitch.conf: it must have "files" first, and "files nis" only for a minimum of items (where it really makes sense).
Check with
snoop rpc ypprog
if a NIS client is excessively querying.
Check with
uptime
if the NIS server is busy/loaded with something.
Check with
ps -ef | grep '\<[y]p'
if the yp processes were restarted.
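The nsswitch.conf check can be done by eye, but on a box with many entries a small filter may help. A minimal sketch, assuming the usual "database: source source ..." layout; the function name nis_lookups and the two labels it prints are invented for this example.

```shell
# Hypothetical helper: list every nsswitch.conf database that uses nis
# and say whether "files" is consulted before it.
nis_lookups() {
    # Strip comments first, then split each line at the colon.
    sed 's/#.*//' "$@" | awk -F: '
        NF >= 2 {
            n = split($2, src, " ")
            files_at = 0; nis_at = 0
            for (i = 1; i <= n; i++) {
                if (src[i] == "files" && !files_at) files_at = i
                if (src[i] == "nis" && !nis_at) nis_at = i
            }
            if (nis_at) {
                status = (files_at && files_at < nis_at) ? "files-first" : "nis-first"
                print $1 ": " status
            }
        }
    '
}
```

Called as "nis_lookups /etc/nsswitch.conf", it prints one line per database that queries NIS. Anything reported as nis-first, or a long list of files-first entries, would be worth a second look.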
Thanks for your advice. I ran the commands you suggested and can't find anything wrong so far:
bash-3.2# ypwhich
wksp-dir1-prod
bash-3.2# hostname
wksp-dir1-prod
bash-3.2# uptime
10:53pm up 4 day(s), 1:06, 2 users, load average: 0.17, 0.27, 0.35
bash-3.2# ps -ef | grep '\<[y]p'
root 24202 23554 1 Jan 20 ? 98:23 /usr/lib/netsvc/yp/ypserv -d
root 24195 23554 0 Jan 20 ? 0:00 /usr/lib/netsvc/yp/rpc.yppasswdd -D /var/yp/maps -m
root 24201 23554 0 Jan 20 ? 0:00 /usr/lib/netsvc/yp/ypxfrd
root 24184 23554 0 Jan 20 ? 0:00 /usr/lib/netsvc/yp/rpc.ypupdated
root 24231 23554 0 Jan 20 ? 0:25 /usr/lib/netsvc/yp/ypbind
bash-3.2#
bash-3.2# cat /etc/nsswitch.conf | grep -v "#"
passwd: files nis
group: files nis
hosts: files dns
ipnodes: files
networks: files
protocols: files
rpc: files
ethers: files
netmasks: files
bootparams: files
publickey: files
netgroup: files nis
automount: files
aliases: files
services: files
printers: user files
auth_attr: files
prof_attr: files
project: files nis
tnrhtp: files
tnrhdb: files
bash-3.2#
bash-3.2# svcs -a | grep yp
online Jan_20 svc:/system/cryptosvc:default
bash-3.2#
bash-3.2# svcs -a | grep -i rpc/bind
online Jan_20 svc:/network/rpc/bind:default
bash-3.2#
This is a non-global zone, so snoop can't run on bge1:1 here. I ran it from the global zone instead:
snoop -d bge1 rpc ypprog
But it gives me huge output, because a lot of servers are querying this master and the NIS error doesn't appear frequently.
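One way to tame that volume might be to capture to a file with snoop -o and summarize it afterwards with snoop -i, counting packets per source host. This is a sketch under assumptions: the helper name count_snoop_sources and the capture path are made up, and it assumes snoop's summary lines contain "source -> destination", so the field just before "->" is taken as the sender.

```shell
# Capture in the global zone, then summarize later (path is hypothetical):
#   snoop -d bge1 -o /var/tmp/yp.cap rpc ypprog
#   snoop -i /var/tmp/yp.cap | count_snoop_sources

# Hypothetical helper: count lines per source host, busiest first.
# Assumes each summary line contains "source -> destination".
count_snoop_sources() {
    awk '
        {
            # The token before the first "->" is treated as the sender.
            for (i = 2; i <= NF; i++)
                if ($i == "->") { count[$(i - 1)]++; break }
        }
        END { for (h in count) print count[h], h }
    ' "$@" | sort -rn
}
```

A client that queries far more often than the others should then show up at the top of the list.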
Yes, your figures are okay.
How many zones are there? Maybe another zone is temporarily using too many resources?
4 zones in total. It is a V240, an old server with only 4 GB of RAM, but utilization looks okay. Here is the output from prstat -Z:
ZONEID  NPROC  SWAP   RSS  MEMORY       TIME   CPU  ZONE
     0     69  348M  366M    8.9%  204:53:40  1.3%  global
    10     39   59M   74M    1.8%    1:58:48  0.7%  wksp-dir1
     1     31  738M  448M     11%  398:20:41  0.1%  wksp-logger1
     4     23   71M   80M    2.0%    4:22:11  0.0%  wksp-ntp1
     5     34   86M   78M    1.9%    1:46:46  0.0%  wksp-webserv1
For now, I am enabling sar on this server and will check in the morning whether there was a spike when the error occurred.
Yes, that's a good idea.
Do you have a valid Oracle account?
Check for patches for the bge NIC (these are often not part of the recommended patch clusters).