Ypbind intermittent issue on Solaris 10

solaris_1977 · January 25, 2021, 6:10am

Hi,
I am seeing the messages below on a few Linux clients:

[root@rhel-client22 ~]# cat /var/log/messages | grep -i yp
Jan 24 13:40:42 rhel-client22-prod crond: YPBINDPROC_DOMAIN: Domain not bound
Jan 24 16:00:42 rhel-client22-prod crond: YPBINDPROC_DOMAIN: Domain not bound
[root@rhel-client22 ~]#

Here is the output from the debug screen where the error was printed:

79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79817: ypbindproc_domain_2_svc (pre.abc.com)
79817: Pinging all active servers.
79817: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79820: Pinging all active servers.
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79817: trylock = failed
79817: trylock = success
79817: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79817: Status: YPBIND_FAIL_VAL
79820: Pinging all active servers.
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79820: Answer for domain 'pre.abc.com' from server 'wksp-dir1-prod'
79820: Pinging all active servers.
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com'
79820: Pinging all active servers.
79820: ping host 'wksp-dir1-prod', domain 'pre.abc.com

Since this is happening on multiple clients, I checked the NIS master server. The timestamps in its log do not match the times when the issue occurs on the clients, but the messages may be related:

Jan 24 03:11:23 wksp-dir1-prod ypbind[22535]: [ID 337329 daemon.error] NIS server not responding for domain "pre.abc.com"; still trying
Jan 24 03:11:23 wksp-dir1-prod ypbind[24552]: [ID 647655 daemon.error] NIS server for domain "pre.abc.com" OK
Jan 24 05:01:58 wksp-dir1-prod ypbind[1685]: [ID 337329 daemon.error] NIS server not responding for domain "pre.abc.com"; still trying
Jan 24 05:11:48 wksp-dir1-prod ypbind[13158]: [ID 337329 daemon.error] NIS server not responding for domain "pre.abc.com"; still trying
Jan 24 05:20:30 wksp-dir1-prod ypbind[24312]: [ID 647655 daemon.error] NIS server for domain "pre.abc.com" OK
Jan 24 15:26:45 wksp-dir1-prod ypbind[7667]: [ID 337329 daemon.error] NIS server not responding for domain "pre.abc.com"; still trying
Jan 24 21:20:52 wksp-dir1-prod ypbind[15666]: [ID 647655 daemon.error] NIS server for domain "pre.abc.com" OK
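
To make the two logs easier to compare, the server-side messages can be collapsed into windows that run from the first "not responding" to the next "OK". This is only a minimal sketch: the function name yp_outage_windows is made up for illustration, it assumes the syslog layout shown above (month, day and time in the first three fields), and on Solaris 10 nawk or /usr/xpg4/bin/awk should be used in place of awk.

```shell
# yp_outage_windows: read syslog lines on stdin and print one line per
# window, from the first "not responding" message to the next "OK".
# Hypothetical helper; assumes fields 1-3 are month, day and time.
yp_outage_windows() {
    awk '
        /NIS server not responding/ {
            # Remember only the first failure of each window.
            if (start == "") start = $1 " " $2 " " $3
            next
        }
        /NIS server for domain .* OK/ {
            if (start != "") print start " -> " $1 " " $2 " " $3
            start = ""
        }
        END {
            # A failure with no closing "OK" is still open.
            if (start != "") print start " -> (no OK seen)"
        }
    '
}
```

It could be run as yp_outage_windows < /var/adm/messages on the server. Whether a client error at a given time falls inside one of the printed windows is only meaningful if both clocks are in sync and in the same time zone.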

How should I debug this? This server is itself the NIS server, so why would it fail to reach itself? Any pointers, please?

Thanks

MadeInGermany · January 25, 2021, 6:43am

Check whether and where the NIS server itself binds:

ypwhich

Check /etc/nsswitch.conf:
it must list files first, and use files nis only for a minimum of items (where it really makes sense).
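
A quick way to see which databases actually consult NIS is to list the nsswitch.conf entries whose source list contains nis. A rough sketch follows; the helper name nis_databases is hypothetical, and on Solaris 10 nawk or /usr/xpg4/bin/awk should stand in for awk.

```shell
# nis_databases: print each nsswitch.conf database whose source list
# contains "nis". Hypothetical helper; reads the file named in $1,
# defaulting to /etc/nsswitch.conf.
nis_databases() {
    awk '
        { sub(/#.*/, "") }      # drop comments before looking at fields
        $1 ~ /:$/ {
            for (i = 2; i <= NF; i++)
                if ($i == "nis") {
                    sub(/:$/, "", $1)
                    print $1
                    break
                }
        }
    ' "${1:-/etc/nsswitch.conf}"
}
```

Running nis_databases with no argument checks the live file; every name it prints is a lookup that can stall when the domain is not bound.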

Check with

snoop rpc ypprog

if a NIS client is excessively querying.

Check with

uptime

if the NIS server is busy/loaded with something.

Check with

ps -ef | grep '\<[y]p'

if the yp processes were restarted.

solaris_1977 · January 25, 2021, 7:06am

Thanks for your advice. I ran the commands you suggested and can't find anything wrong so far:

bash-3.2# ypwhich
wksp-dir1-prod
bash-3.2# hostname
wksp-dir1-prod
bash-3.2# uptime
 10:53pm  up 4 day(s),  1:06,  2 users,  load average: 0.17, 0.27, 0.35
bash-3.2# ps -ef | grep '\<[y]p'
    root 24202 23554   1   Jan 20 ?          98:23 /usr/lib/netsvc/yp/ypserv -d
    root 24195 23554   0   Jan 20 ?           0:00 /usr/lib/netsvc/yp/rpc.yppasswdd -D /var/yp/maps -m
    root 24201 23554   0   Jan 20 ?           0:00 /usr/lib/netsvc/yp/ypxfrd
    root 24184 23554   0   Jan 20 ?           0:00 /usr/lib/netsvc/yp/rpc.ypupdated
    root 24231 23554   0   Jan 20 ?           0:25 /usr/lib/netsvc/yp/ypbind
bash-3.2#
bash-3.2# cat /etc/nsswitch.conf | grep -v "#"


passwd:     files nis
group:      files nis
hosts:      files dns
ipnodes:    files
networks:   files
protocols:  files
rpc:        files
ethers:     files
netmasks:   files
bootparams: files
publickey:  files
netgroup:   files nis
automount:  files
aliases:    files
services:   files
printers:       user files

auth_attr:  files
prof_attr:  files
project:    files nis

tnrhtp:     files
tnrhdb:     files
bash-3.2#
bash-3.2# svcs -a | grep yp
online         Jan_20   svc:/system/cryptosvc:default
bash-3.2#
bash-3.2# svcs -a | grep -i rpc/bind
online         Jan_20   svc:/network/rpc/bind:default
bash-3.2#

This is a non-global zone, so snoop can't run on bge1:1 here. I ran it from the global zone instead:
snoop -d bge1 rpc ypprog

But it gives me a huge amount of output, because a lot of servers are querying this master and the NIS error doesn't appear frequently.
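
One way to tame that volume is to save the capture to a file and summarize it per source host afterwards. A minimal sketch, assuming each snoop summary line starts with "source -> destination"; the helper name yp_top_talkers and the capture path used below are made up for illustration, and nawk may be needed instead of awk on Solaris 10.

```shell
# yp_top_talkers: count packets per source host in snoop summary output
# read from stdin, busiest host first. Hypothetical helper; assumes
# each summary line begins with "source -> destination".
yp_top_talkers() {
    awk '$2 == "->" { n[$1]++ } END { for (h in n) print n[h], h }' |
        sort -rn
}
```

For example, capture with snoop -d bge1 -o /var/tmp/yp.cap rpc ypprog, then run snoop -i /var/tmp/yp.cap | yp_top_talkers | head. The NIS server itself will show up in the list as the source of the replies.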

MadeInGermany · January 25, 2021, 7:17am

Yes, your figures are okay.
How many zones are there? Maybe another zone is temporarily using too many resources?

solaris_1977 · January 25, 2021, 7:29am

There are 4 zones in total. It is a V240, an old server with only 4 GB of RAM, but utilization looks okay. Here is the output from prstat -Z:

ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
     0       69  348M  366M   8.9% 204:53:40 1.3% global
    10       39   59M   74M   1.8%   1:58:48 0.7% wksp-dir1
     1       31  738M  448M    11% 398:20:41 0.1% wksp-logger1
     4       23   71M   80M   2.0%   4:22:11 0.0% wksp-ntp1
     5       34   86M   78M   1.9%   1:46:46 0.0% wksp-webserv1

For now, I am enabling sar on this server and will check in the morning whether there was any spike when the error occurred.
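
Alongside sar, a small timestamped probe can record exactly when the NIS service stops answering, so the failures can be lined up with the sar data. This is just a sketch: probe and watch_yp are hypothetical helper names, and the interval, the probe command and the log path in the usage note are assumptions.

```shell
# probe: run the given command, discard its output, and print one
# timestamped OK/FAIL line. Hypothetical helper for a watch loop.
probe() {
    if "$@" >/dev/null 2>&1; then
        status=OK
    else
        status=FAIL
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$*"
}

# watch_yp: call probe every $1 seconds with the remaining arguments
# as the command, until interrupted.
watch_yp() {
    interval=$1
    shift
    while :; do
        probe "$@"
        sleep "$interval"
    done
}
```

It might be started as watch_yp 10 rpcinfo -u wksp-dir1-prod ypserv >> /var/tmp/yp-probe.log & on the server or on one of the affected clients; ypwhich would work as the probe command too.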

MadeInGermany · January 25, 2021, 7:54am

Yes, that's a good idea.

Do you have a valid Oracle account?
Check for patches for the bge NIC (these are often not part of the recommended patch clusters).