Solaris 10 svcs failures

I have a Solaris 10 machine that was working fine until it crashed after a power failure. Now, after the system boots, several services go into maintenance mode.

offline         8:54:59 svc:/milestone/multi-user-server:default
offline         8:54:59 svc:/application/graphical-login/cde-login:default
offline         8:54:59 svc:/system/zones:default
offline         8:55:00 svc:/application/cde-printinfo:default
offline         8:55:07 svc:/system/iscsitgt:default
offline         8:55:07 svc:/system/basicreg:default
maintenance    13:00:22 svc:/application/print/ipp-listener:default
maintenance    13:02:03 svc:/network/ldap/client:default
maintenance    13:02:59 svc:/application/management/dmi:default
maintenance    13:30:12 svc:/milestone/multi-user:default

I have tried to clear and restart the services, but nothing works. I can't figure out the problem from the logs, and the system does not reach multi-user mode.
How can I get these services out of maintenance mode?

Thanks.

Post the output of svcs -xv for starters...

Sorry. Here is the output of svcs -xv

# svcs -xv
svc:/milestone/multi-user:default (multi-user milestone)
 State: maintenance since Tue Apr 30 13:30:12 2013
Reason: Start method died on Killed (9).
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M init
   See: /var/svc/log/milestone-multi-user:default.log
Impact: 6 dependent services are not running:
        svc:/milestone/multi-user-server:default
        svc:/system/basicreg:default
        svc:/system/zones:default
        svc:/application/graphical-login/cde-login:default
        svc:/system/iscsitgt:default
        svc:/application/cde-printinfo:default
svc:/application/management/dmi:default (Sun Solstice Enterprise DMI)
 State: maintenance since Tue Apr 30 13:02:59 2013
Reason: Start method failed repeatedly, last died on Killed (9).
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man/ -s 1M dmispd
   See: /var/svc/log/application-management-dmi:default.log
Impact: This service is not running.
svc:/application/print/ipp-listener:default (Internet Print Protocol Listening Service)
 State: maintenance since Tue Apr 30 13:00:22 2013
Reason: Start method failed repeatedly, last died on Killed (9).
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 4 mod_ipp
   See: /var/svc/log/application-print-ipp-listener:default.log
Impact: This service is not running.
svc:/network/ldap/client:default (LDAP client)
 State: maintenance since Tue Apr 30 13:02:03 2013
Reason: Start method failed repeatedly, last died on Killed (9).
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M ldap_cachemgr
   See: /var/svc/log/network-ldap-client:default.log
Impact: This service is not running.

After a power failure, a filesystem error is often the cause. Is there any more information in the messages file?
Also check your network settings, such as the hostname in /etc/hosts.

You can try to work through the provided link:
https://support.oracle.com/epmos/faces/DocumentDisplay?alias=EVENT%3ASMF-8000-KS&_afrLoop=323545771682977&_afrWindowMode=0&_adf.ctrl-state=hh6wxk7bq_4

Thank you. All the network settings are fine. I can connect to other machines and traceroute and ping. I don't see any messages in the messages file pertaining to this either.

I tried the link that was provided, but it requires a support contract with Oracle, which I don't have. Is there anything else I can do to troubleshoot?

check this:
http://docs.oracle.com/cd/E19253-01/817-1985/ecdps/index.html

Are there any helpful error messages in the service log files? E.g.

cat /var/svc/log/milestone-multi-user:default.log
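To look at the logs of every service in maintenance at once, something like the following may help. It is only a sketch: the function names are made up for illustration, and the log file naming (drop "svc:/", turn "/" into "-", append ".log") is an assumption taken from the paths that svcs -xv printed above.

```shell
#!/bin/sh
# Sketch: map an SMF FMRI to its log file under /var/svc/log.
# Assumption: the naming pattern seen in the svcs -xv output holds
# (drop "svc:/", replace "/" with "-", append ".log").
fmri_to_log() {
    name=`echo "$1" | sed -e 's|^svc:/||' -e 's|/|-|g'`
    echo "/var/svc/log/${name}.log"
}

# Print the last 20 log lines for every instance in maintenance.
# Only meaningful on a host that has SMF (the svcs command).
show_maintenance_logs() {
    svcs -H -o state,fmri | awk '$1 == "maintenance" { print $2 }' |
    while read fmri; do
        log=`fmri_to_log "$fmri"`
        echo "== $fmri ($log)"
        tail -20 "$log"
    done
}
```

Run show_maintenance_logs as root on the affected machine; fmri_to_log can also be used on its own to find a single log file.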

I tried to clear and restart the multi-user service. However, after about 3 minutes, it went back to maintenance mode. This is what I saw in the log:

[ Apr 30 13:00:09 Leaving maintenance because clear requested. ]
[ Apr 30 13:00:09 Enabled. ]
[ Apr 30 13:00:09 Executing start method ("/sbin/rc2 start") ]
lsvcrun: Service matching "/etc/rc2.d/S10lu" seems to be running.
Executing legacy init script "/etc/rc2.d/S10lu" despite previous errors.
Legacy init script "/etc/rc2.d/S10lu" exited with return code 0.
lsvcrun: Service matching "/etc/rc2.d/S20sysetup" seems to be running.
Executing legacy init script "/etc/rc2.d/S20sysetup" despite previous errors.
Legacy init script "/etc/rc2.d/S20sysetup" exited with return code 0.
lsvcrun: Service matching "/etc/rc2.d/S40llc2" seems to be running.
Executing legacy init script "/etc/rc2.d/S40llc2" despite previous errors.
Legacy init script "/etc/rc2.d/S40llc2" exited with return code 0.
lsvcrun: Service matching "/etc/rc2.d/S42ncakmod" seems to be running.
Executing legacy init script "/etc/rc2.d/S42ncakmod" despite previous errors.
Legacy init script "/etc/rc2.d/S42ncakmod" exited with return code 0.
lsvcrun: Service matching "/etc/rc2.d/S47pppd" seems to be running.
Executing legacy init script "/etc/rc2.d/S47pppd" despite previous errors.
Legacy init script "/etc/rc2.d/S47pppd" exited with return code 0.
lsvcrun: Service matching "/etc/rc2.d/S70uucp" seems to be running.
Executing legacy init script "/etc/rc2.d/S70uucp" despite previous errors.
Legacy init script "/etc/rc2.d/S70uucp" exited with return code 0.
lsvcrun: Service matching "/etc/rc2.d/S72autoinstall" seems to be running.
Executing legacy init script "/etc/rc2.d/S72autoinstall" despite previous errors.
Legacy init script "/etc/rc2.d/S72autoinstall" exited with return code 0.
lsvcrun: Service "/etc/rc2.d/S72directory" has an invalid property group.
Executing legacy init script "/etc/rc2.d/S72directory" despite previous errors.
/usr/iplanet/ds5/slapd-Turquoise/start-slapd
[ Apr 30 13:30:12 Method or service exit timed out.  Killing contract 111 ]

This is what is in the log file for svc:/network/ldap/client:default

[ Apr 30 13:00:03 Leaving maintenance because clear requested. ]
[ Apr 30 13:00:03 Enabled. ]
[ Apr 30 13:00:03 Executing start method ("/usr/lib/ldap/ldap_cachemgr") ]
[ Apr 30 13:02:03 Method or service exit timed out.  Killing contract 110 ]
[ Apr 30 13:02:03 Method "start" failed due to signal KILL ]

I am still at a loss (:

The problem should be here:

lsvcrun: Service "/etc/rc2.d/S72directory" has an invalid property group.
Executing legacy init script "/etc/rc2.d/S72directory" despite previous errors.
/usr/iplanet/ds5/slapd-Turquoise/start-slapd
[ Apr 30 13:30:12 Method or service exit timed out.  Killing contract 111 ]

Investigate this service...

Is your machine its own LDAP client? (cat /var/ldap/ldap_client_file)

Yes, my machine is its own ldap client. Is that a problem?

I can think of a deadlock: the machine needs to query LDAP in order to start the LDAP service.
Keep all the needed information in the local /etc files, and list files before ldap in nsswitch.conf!
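A quick way to spot the affected entries could look like this. It is a rough sketch, not a complete nsswitch.conf parser: the function name is hypothetical, and on Solaris 10 you may need nawk or /usr/xpg4/bin/awk instead of the default awk.

```shell
#!/bin/sh
# Sketch: list nsswitch.conf databases where "ldap" is consulted before
# "files", or where "ldap" is listed without "files" at all.
# The file argument is optional and defaults to /etc/nsswitch.conf.
ldap_before_files() {
    awk '
        $1 ~ /^#/ { next }               # skip comment lines
        $1 ~ /:$/ {
            ldap = 0; files = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "ldap"  && ldap  == 0) ldap  = i
                if ($i == "files" && files == 0) files = i
            }
            if (ldap > 0 && (files == 0 || ldap < files)) print $1
        }
    ' "${1:-/etc/nsswitch.conf}"
}
```

Calling ldap_before_files with no argument prints one database name per line (for example passwd:) for each entry that should be reordered; no output means files already comes first wherever ldap is used.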

Thank you so much to NukeDuke2 and MadeinGermany. Your suggestions helped me fix the problem.

So, here are the steps that I took:
• I went in and renamed /etc/rc2.d/S72directory to /etc/rc2.d/s72directory so that it did not get started in multi-user mode.
• Cleared and restarted the multi-user service.
• This got the multi-user service and all its dependent services that were offline before, back online. Did the same for dmi service and the ipp-listener service.
• Now, only the ldap/client service was in maintenance.
• Then looked at /etc/nsswitch.conf file and noticed that there were several entries that had ldap before files. Renamed /etc/nsswitch.conf to /etc/nsswitch.conf.old and changed all the entries so that it looked at the local files before it looked at ldap.
• I started the directory service and it started.
• Cleared and restarted the ldap/client service
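For anyone who finds this later, the steps above can be sketched roughly as commands. This is a reconstruction, not a transcript of what was typed: the run and recover names are made up, copying nsswitch.conf with cp -p (rather than renaming it) is an assumption, and the start-slapd path is simply the one that appeared in the multi-user log. By default the commands are only printed; set RUN=1 to execute them as root.

```shell
#!/bin/sh
# Sketch: print each command by default, execute only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

recover() {
    # Rename the legacy script so rc2 no longer starts it (it runs S* names).
    run mv /etc/rc2.d/S72directory /etc/rc2.d/s72directory
    # Clear the services that were in maintenance.
    run svcadm clear svc:/milestone/multi-user:default
    run svcadm clear svc:/application/management/dmi:default
    run svcadm clear svc:/application/print/ipp-listener:default
    # Assumption: keep a backup copy, then edit /etc/nsswitch.conf by hand
    # so that "files" comes before "ldap".
    run cp -p /etc/nsswitch.conf /etc/nsswitch.conf.old
    # Assumption: start the directory server with the script named in the log.
    run /usr/iplanet/ds5/slapd-Turquoise/start-slapd
    run svcadm clear svc:/network/ldap/client:default
    # Check that nothing is left in maintenance.
    run svcs -xv
}
```

Calling recover with RUN unset just lists the commands, which is a safe way to review them before running anything on the machine.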

Thanks again. I really appreciate your help.

That's a big problem. This configuration is not supported and leads to the issue you are experiencing.

From Chapter 8, Introduction to LDAP Naming Services (Overview/Reference), in the System Administration Guide: Naming and Directory Services (DNS, NIS, and LDAP):
A directory server (an LDAP server) cannot be its own client. That is, you cannot configure the machine that is running the directory server software to become an LDAP naming services client.
The reason is that Solaris starts the client services first, then the server ones. Assuming you have more than one directory server master, you should at least point your LDAP client configuration to the remote machine (and vice versa on the other server).

That would allow a clean reboot, but it won't be enough if both LDAP servers are rebooted at the same time.

There is no guarantee that your current configuration, i.e. with ldap after files in nsswitch.conf, won't hang again at the next reboot.

Thank you, Jlliagre. I will read up on this and redo the LDAP configuration.
How would logins on an LDAP server work then? Can users log in to the server if it is not an LDAP client?

You can log in as long as you have a local account (/etc/passwd), a NIS account (note that a NIS server can be its own NIS client) or an LDAP account served by remote highly available servers.
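To check the first case, whether a given login has a local entry, a small helper along these lines should do. The function name is hypothetical and the second argument exists only so that a different passwd file can be checked.

```shell
#!/bin/sh
# Sketch: succeed if the login name has an entry in the local passwd file.
# Usage: has_local_account LOGIN [PASSWD_FILE]
has_local_account() {
    grep "^$1:" "${2:-/etc/passwd}" >/dev/null
}
```

For example, has_local_account root && echo "root is local" confirms that root can still log in when LDAP is unreachable.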