Solaris LDAP client not starting on Solaris 10 server

Hello,
I have Solaris 10 running on an x86 HP machine. The server hung, and since I rebooted it the LDAP client service has not come up. I tried clearing it, but still no luck.

# svcs -a | grep ldap
maintenance    16:58:02 svc:/network/ldap/client:default
# svcadm clear svc:/network/ldap/client:default
# cd /var/svc/log
# tail -f network-ldap-client:default.log
[ Jul  6 16:05:09 Method or service exit timed out.  Killing contract 1369 ]
[ Jul  6 16:05:09 Method "start" failed due to signal KILL ]
[ Jul  6 16:56:02 Leaving maintenance because clear requested. ]
[ Jul  6 16:56:02 Enabled. ]
[ Jul  6 16:56:02 Executing start method ("/lib/svc/method/ldap-client start") ]
[ Jul  6 16:58:02 Method or service exit timed out.  Killing contract 1396 ]
[ Jul  6 16:58:02 Method "start" failed due to signal KILL ]
[ Jul  6 17:07:24 Leaving maintenance because clear requested. ]
[ Jul  6 17:07:24 Enabled. ]
[ Jul  6 17:07:24 Executing start method ("/lib/svc/method/ldap-client start") ]
[ Jul  6 17:09:24 Method or service exit timed out.  Killing contract 1412 ]
[ Jul  6 17:09:24 Method "start" failed due to signal KILL ]
^C
# svcs -a | grep ldap
maintenance    17:09:24 svc:/network/ldap/client:default
#
# /usr/lib/ldap/ldap_cachemgr -g
/usr/lib/ldap/ldap_cachemgr doesn't appear to be running.
# cat /var/ldap/ldap_client_file
#
# Do not edit this file manually; your changes will be lost.Please use ldapclient (1M) instead.
#
NS_LDAP_FILE_VERSION= 2.0
NS_LDAP_SEARCH_BASEDN= dc=pre,dc=doper,dc=com
NS_LDAP_AUTH= tls:simple
NS_LDAP_SEARCH_REF= TRUE
NS_LDAP_SEARCH_SCOPE= one
NS_LDAP_SEARCH_TIME= 30
NS_LDAP_SERVER_PREF= ngpre-wst-wks1.doper.com, ngpre-est-wks1.doper.com
NS_LDAP_PROFILE= ldap-client-30
NS_LDAP_CREDENTIAL_LEVEL= proxy
NS_LDAP_SERVICE_SEARCH_DESC= group:ou=Group,?one?
NS_LDAP_SERVICE_SEARCH_DESC= shadow:ou=People,?one?
NS_LDAP_SERVICE_SEARCH_DESC= netgroup:ou=netgroup,?one?
NS_LDAP_SERVICE_SEARCH_DESC= sudoers:ou=sudoers,?one?
NS_LDAP_SERVICE_SEARCH_DESC= user_attr:ou=People,?one?
NS_LDAP_SERVICE_SEARCH_DESC= passwd:ou=People,?one?isMemberOf=cn=ldap-client-30,ou=hosts,dc=pre,dc=doper,dc=com
NS_LDAP_BIND_TIME= 10
#

Please give me some pointers on what else I can check to fix this.
Thanks
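The service log above shows each start attempt being killed some time after it began. As a rough sketch, the elapsed time between "Executing start method" and the following "timed out" line can be computed from a saved copy of the log. The function name start_durations is made up for this illustration, and the sketch assumes both timestamps fall on the same day. On Solaris 10 it may be safer to use nawk in place of awk.

```shell
# Print, for each start attempt, the seconds between the
# "Executing start method" line and the next "timed out" line.
# Reads an SMF service log on stdin. Assumes both lines share a date.
start_durations() {
    awk '
        # field 4 is the HH:MM:SS timestamp, e.g. "[ Jul  6 16:56:02 ..."
        /Executing start method/ {
            split($4, t, ":")
            start = t[1] * 3600 + t[2] * 60 + t[3]
            have = 1
        }
        # only report a timeout if its matching start line was seen
        /exit timed out/ && have {
            split($4, t, ":")
            print (t[1] * 3600 + t[2] * 60 + t[3]) - start
            have = 0
        }
    '
}
```

Example: start_durations < /var/svc/log/network-ldap-client:default.log
For the attempts shown above (16:56:02 to 16:58:02, and 17:07:24 to 17:09:24) this gives 120 each time, which suggests the start method is hanging until SMF's start timeout kills it rather than exiting with an error. The configured limit can probably be confirmed with svcprop -p start/timeout_seconds on the service.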

Can you reach the two servers?
With ping?
With telnet servername ldap?

Look at the start script
/lib/svc/method/ldap-client
Is there a special log or output file? If so, look at it.
Run its shell (named in the "shebang" on the first line) with -x and the start argument, e.g.

/bin/ksh -x /lib/svc/method/ldap-client start
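A small sketch of that step, so the interpreter does not have to be guessed: read it from the script's first line and rerun the script under it with -x. The function names shebang_interp and trace_method are invented for this example.

```shell
# Print the interpreter named on the first ("shebang") line of a script.
shebang_interp() {
    head -n 1 "$1" | sed -n 's/^#! *//p' | awk '{ print $1 }'
}

# Run a script under its own interpreter with -x (command tracing).
# The trace itself goes to stderr; the script's normal output to stdout.
trace_method() {
    script=$1
    shift
    interp=$(shebang_interp "$script")
    [ -n "$interp" ] || { echo "no shebang in $script" >&2; return 1; }
    "$interp" -x "$script" "$@"
}
```

Example: trace_method /lib/svc/method/ldap-client start
The last traced line before the hang shows which command the method is stuck in.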

I can ping both servers and telnet to both LDAP ports; these work fine.

There are two LDAP servers, but because of an old issue only one of them has accepted connections for a long time, as I can see from the logs on various clients.
When I ran the command with -x, it hung indefinitely at ldap_cachemgr:

# /bin/ksh -x /lib/svc/method/ldap-client start
+ . /lib/svc/share/smf_include.sh
+ SMF_EXIT_OK=0
+ SMF_EXIT_ERR_FATAL=95
+ SMF_EXIT_ERR_CONFIG=96
+ SMF_EXIT_MON_DEGRADE=97
+ SMF_EXIT_MON_OFFLINE=98
+ SMF_EXIT_ERR_NOSMF=99
+ SMF_EXIT_ERR_PERM=100
+ [ ! -r /var/ldap/ldap_client_file ]
+ exec /usr/lib/ldap/ldap_cachemgr

At the same time I ran /usr/lib/ldap/ldap_cachemgr -g and saw the status below:

# /usr/lib/ldap/ldap_cachemgr -g

cachemgr configuration:
server debug level          0
server log file "/var/ldap/cachemgr.log"
number of calls to ldapcachemgr          1

cachemgr cache data statistics:
Configuration refresh information:
  Previous refresh time: 2022/07/06 22:42:31
  Next refresh time:     2022/07/07 10:42:31
Server information:
  Previous refresh time: 2022/07/06 23:07:48
  Next refresh time:     2022/07/06 23:07:49
  server: ngpre-wst-wks1.doper.com, UNKNOWN, status: ERROR
  server: ngpre-est-wks1.doper.com, UNKNOWN, status: ERROR
Cache data information:
  Maximum cache entries:          256
  Number of cache entries:          0
#

For comparison, here is the output from a working client:

# /usr/lib/ldap/ldap_cachemgr -g

cachemgr configuration:
server debug level          6
server log file "/var/ldap/cachemgr.log"
number of calls to ldapcachemgr        137

cachemgr cache data statistics:
Configuration refresh information:
  Previous refresh time: 2022/07/06 22:39:20
  Next refresh time:     2022/07/07 10:39:20
Server information:
  Previous refresh time: 2022/07/06 23:09:20
  Next refresh time:     2022/07/06 23:14:20
  server: ngpre-wst-wks1.doper.com, ODSEE, status: UP
  server: ngpre-est-wks1.doper.com, UNKNOWN, status: ERROR
Cache data information:
  Maximum cache entries:          256
  Number of cache entries:          0
#

Does this give any indication of what I can check further?
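To make the two ldap_cachemgr -g outputs easier to compare, here is a rough sketch that pulls out only the per-server lines. The function names cachemgr_servers and cachemgr_any_up are invented for this example, and the parsing assumes the "server: host, type, status: STATE" layout shown above. On Solaris 10, nawk or /usr/xpg4/bin/awk is needed because the old /usr/bin/awk has no sub().

```shell
# Read "ldap_cachemgr -g" output on stdin and print one line per
# server in the form: host type status
cachemgr_servers() {
    awk '$1 == "server:" {
        host = $2; sub(/,$/, "", host)   # drop trailing comma
        type = $3; sub(/,$/, "", type)
        print host, type, $NF            # last field is the status
    }'
}

# Succeed (exit 0) only if at least one server is reported as UP.
cachemgr_any_up() {
    cachemgr_servers | awk '$3 == "UP" { n++ } END { exit (n ? 0 : 1) }'
}
```

Example: /usr/lib/ldap/ldap_cachemgr -g | cachemgr_servers
On the failing client both servers come out as UNKNOWN ERROR, while the working client reports the first server as ODSEE UP, so the difference is in reaching or binding to ngpre-wst-wks1 rather than in the long-broken second server.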

Is there anything in /var/ldap/cachemgr.log?

Read the man page
man ldap_cachemgr
and check/compare the files listed at the end (Files section)

If the log does not report a problem and the files look okay then you can try uninit and init, as suggested in

I noticed the NS_LDAP_SERVERS line is missing. Was the file edited recently?
Any issues with DNS?


Note: this is the Solaris LDAP client, by the way, not the OpenLDAP client. Changing the title.

I don't see NS_LDAP_SERVERS on the working servers either:

# ldapclient list
NS_LDAP_FILE_VERSION= 2.0
NS_LDAP_BINDDN= cn=somedn,ou=hosts,dc=pre,dc=doper,dc=com
NS_LDAP_BINDPASSWD= {NS1}blablaba
NS_LDAP_SEARCH_BASEDN= dc=pre,dc=doper,dc=com
NS_LDAP_AUTH= tls:simple
NS_LDAP_SEARCH_REF= TRUE
NS_LDAP_SEARCH_SCOPE= one
NS_LDAP_SEARCH_TIME= 30
NS_LDAP_SERVER_PREF= ngpre-wst-wks1.doper.com, ngpre-est-wks1.doper.com
NS_LDAP_PROFILE= ngpre-wst-zonemgr1
NS_LDAP_CREDENTIAL_LEVEL= proxy
NS_LDAP_SERVICE_SEARCH_DESC= group:ou=Group,?one?
NS_LDAP_SERVICE_SEARCH_DESC= shadow:ou=People,?one?
NS_LDAP_SERVICE_SEARCH_DESC= netgroup:ou=netgroup,?one?
NS_LDAP_SERVICE_SEARCH_DESC= sudoers:ou=sudoers,?one?
NS_LDAP_SERVICE_SEARCH_DESC= user_attr:ou=People,?one?
NS_LDAP_SERVICE_SEARCH_DESC= passwd:ou=People,?one?isMemberOf=cn=ngpre-wst-zonemgr1,ou=hosts,dc=pre,dc=doper,dc=com
NS_LDAP_BIND_TIME= 10
#

I tried ldapclient uninit and init, but this has not helped so far:

#  ldapclient uninit
Stopping autofs failed with (1). You may need to restart it manually for changes to take effect.

Stopping ldap failed with (7)
Errors stopping network services.
#
# /usr/sbin/ldapclient -v init -a proxyDN=cn=`hostname`,ou=hosts,dc=pre,dc=doper,dc=com -y /etc/ldap.secret -a domainName=pre.doper.com -a profileName=`hostname` ngpre-wst-wks1
Parsing proxyDN=cn=ngpre-wst-zonemgr2,ou=hosts,dc=pre,dc=doper,dc=com
Parsing domainName=pre.doper.com
Parsing profileName=ngpre-wst-zonemgr2
Arguments parsed:
        domainName: pre.doper.com
        proxyDN: cn=ngpre-wst-zonemgr2,ou=hosts,dc=pre,dc=doper,dc=com
        profileName: ngpre-wst-zonemgr2
        proxyPassword: somepw
        defaultServerList: ngpre-wst-wks1
Handling init option
About to configure machine by downloading a profile
Proxy DN: cn=somedn2,ou=hosts,dc=pre,dc=doper,dc=com
Proxy password: {NS1}blablaba2
Credential level: 1
Authentication method: 3
Shadow Update is not enabled, no adminDN/adminPassword is required.
About to modify this machines configuration by writing the files
Stopping network services
sendmail not running
nscd not running
Stopping autofs
stop: system/filesystem/autofs:default... failed: entity not found
Stopping autofs failed with (1). You may need to restart it manually for changes to take effect.
ldap not running
nisd not running
nis(yp) not running
Removing existing restore directory
file_backup: stat(/etc/nsswitch.conf)=0
file_backup: (/etc/nsswitch.conf -> /var/ldap/restore/nsswitch.conf)
file_backup: stat(/etc/defaultdomain)=0
file_backup: (/etc/defaultdomain -> /var/ldap/restore/defaultdomain)
file_backup: stat(/var/nis/NIS_COLD_START)=-1
file_backup: No /var/nis/NIS_COLD_START file.
file_backup: nis domain is "pre.doper.com"
file_backup: stat(/var/yp/binding/pre.doper.com)=-1
file_backup: No /var/yp/binding/pre.doper.com directory.
file_backup: stat(/var/ldap/ldap_client_file)=0
file_backup: (/var/ldap/ldap_client_file -> /var/ldap/restore/ldap_client_file)
file_backup: (/var/ldap/ldap_client_cred -> /var/ldap/restore/ldap_client_cred)
Starting network services
start: /usr/bin/domainname pre.doper.com... success
start: sleep 100000 microseconds
start: sleep 200000 microseconds
start: sleep 400000 microseconds
start: sleep 800000 microseconds
start: sleep 1600000 microseconds
start: sleep 3200000 microseconds
start: sleep 6400000 microseconds
start: sleep 12800000 microseconds
start: sleep 25600000 microseconds
start: sleep 51200000 microseconds
start: sleep 17700000 microseconds
start: network/ldap/client:default... timed out
start: network/ldap/client:default... offline to disable
stop: sleep 100000 microseconds
stop: network/ldap/client:default... success
restart: sleep 100000 microseconds
restart: milestone/name-services:default... success
Error resetting system.
Recovering old system settings.
Stopping network services
sendmail not running
nscd not running
Stopping autofs
stop: system/filesystem/autofs:default... failed: entity not found
Stopping autofs failed with (1). You may need to restart it manually for changes to take effect.
ldap not running
nisd not running
nis(yp) not running
recover: stat(/var/ldap/restore/defaultdomain)=0
recover: open(/var/ldap/restore/defaultdomain)
recover: read(/var/ldap/restore/defaultdomain)
recover: old domainname "pre.doper.com"
recover: stat(/var/ldap/restore/ldap_client_file)=0
recover: file_move(/var/ldap/restore/ldap_client_file, /var/ldap/ldap_client_file)=0
recover: stat(/var/ldap/restore/ldap_client_cred)=0
recover: file_move(/var/ldap/restore/ldap_client_cred, /var/ldap/ldap_client_cred)=0
recover: stat(/var/ldap/restore/NIS_COLD_START)=-1
recover: stat(/var/ldap/restore/pre.doper.com)=-1
recover: stat(/var/ldap/restore/nsswitch.conf)=0
recover: file_move(/var/ldap/restore/nsswitch.conf, /etc/nsswitch.conf)=0
recover: stat(/var/ldap/restore/defaultdomain)=0
recover: file_move(/var/ldap/restore/defaultdomain, /etc/defaultdomain)=0
Starting network services
start: /usr/bin/domainname pre.doper.com... success
restart: sleep 100000 microseconds
restart: milestone/name-services:default... success
#
# svcadm enable svc:/network/ldap/client:default
# svcs -a| grep ldap
maintenance    14:46:46 svc:/network/ldap/client:default
#
# tail -f network-ldap-client:default.log
[ Jul  8 14:40:45 Leaving maintenance because disable requested. ]
[ Jul  8 14:40:45 Disabled. ]
[ Jul  8 14:42:15 Enabled. ]
[ Jul  8 14:42:15 Executing start method ("/lib/svc/method/ldap-client start") ]
[ Jul  8 14:44:15 Method or service exit timed out.  Killing contract 2833 ]
[ Jul  8 14:44:15 Method "start" failed due to signal KILL ]
[ Jul  8 14:44:15 Leaving maintenance because disable requested. ]
[ Jul  8 14:44:15 Disabled. ]
[ Jul  8 14:44:46 Enabled. ]
[ Jul  8 14:44:46 Executing start method ("/lib/svc/method/ldap-client start") ]


[ Jul  8 14:46:46 Method or service exit timed out.  Killing contract 2842 ]
[ Jul  8 14:46:46 Method "start" failed due to signal KILL ]
^C
# ldaplist
ldaplist: LDAP configuration problem (Unable to load configuration '/var/ldap/ldap_client_file' ('').)
#

If I try to init from the file, it complains about NS_LDAP_SERVERS, even though none of the working clients has NS_LDAP_SERVERS defined either:

# ldapclient init /var/ldap/ldap_client_file
Invalid server (/var/ldap/ldap_client_file) in NS_LDAP_SERVERS
#
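To check systematically which attributes differ between the failing and a working client, the attribute names in two saved configuration dumps can be compared. This is only a sketch: the function names config_keys and keys_only_in_first are invented here, and the inputs are assumed to be files holding "NS_LDAP_NAME= value" lines such as saved ldapclient list output.

```shell
# Print the distinct NS_LDAP_* attribute names found in a file, sorted.
config_keys() {
    sed -n 's/^\(NS_LDAP_[A-Z_]*\)=.*/\1/p' "$1" | sort -u
}

# Print attribute names that appear in file $1 but not in file $2.
keys_only_in_first() {
    config_keys "$1" > "/tmp/keys_a.$$"
    config_keys "$2" > "/tmp/keys_b.$$"
    comm -23 "/tmp/keys_a.$$" "/tmp/keys_b.$$"
    rm -f "/tmp/keys_a.$$" "/tmp/keys_b.$$"
}
```

Example, with hypothetical file names: keys_only_in_first working.list broken.list
Run it in both directions. An empty result both ways means the two clients define the same set of attributes and only the values differ.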