Solaris: Fault Management Service toggles online, offline

I have two Solaris 10 T2000 systems.

Platform sun8 has newer firmware than sun7.

sun8/user$ prtdiag -v | grep OBP
OBP 4.30.4.b 2010/07/09 13:48

sun7/user$ prtdiag -v | grep OBP
OBP 4.30.4.a 2010/01/06 14:56

On the platform with the newer firmware (sun8, OBP 4.30.4.b), the Fault Management service (fmd) repeatedly toggles between online and offline.

sun8/user$ svcs fmd
STATE          STIME    FMRI
online         14:40:52 svc:/system/fmd:default

sun8/user$ svcs fmd
STATE          STIME    FMRI
offline*       14:40:55 svc:/system/fmd:default

sun8/user$ svcs fmd
STATE          STIME    FMRI
online         14:41:01 svc:/system/fmd:default

sun8/user$ svcs fmd
STATE          STIME    FMRI
offline*       14:41:04 svc:/system/fmd:default
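For anyone reading the output above: an asterisk appended to the state (as in offline*) is how svcs marks an instance that is in transition to another state. As a minimal sketch, assuming svcs output is piped in on stdin, the following helper makes that explicit; the function name flag_transitions is hypothetical, not part of any Solaris tool.

```shell
# Read `svcs` output on stdin and report each instance's state,
# calling out states that carry the transition marker (*).
# flag_transitions is a hypothetical helper name.
flag_transitions() {
    awk '
        $1 == "STATE" { next }     # skip the header line
        NF < 3        { next }     # skip blank or short lines
        $1 ~ /\*$/    { print $3 ": in transition (" $1 ")"; next }
                      { print $3 ": " $1 }
    '
}

# Example: svcs fmd | flag_transitions
```

Run a few times in a row, this shows the same online / in-transition pattern as the manual svcs calls.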

The services that "fmd" depends on are all online.

sun8/user$ svcs -d fmd 
STATE          STIME    FMRI
online         Feb_02   svc:/system/filesystem/minimal:default
online         Feb_02   svc:/system/sysevent:default
online         Feb_02   svc:/network/rpc/bind:default
online         Feb_02   svc:/system/dumpadm:default

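Neither "svcs -xv" nor the service log is much help: the start method exits with status 0 each time, and SMF then restarts the service because all of its processes have exited.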

sun8/user$ svcs -xv fmd
svc:/system/fmd:default (Solaris Fault Manager)
 State: offline since Thu Feb 03 14:45:27 2011
Reason: Start method is running.
   See: http://sun.com/msg/SMF-8000-C4
   See: man -M /usr/share/man -s 1M fmd
   See: /var/svc/log/system-fmd:default.log
Impact: This service is not running.

sun8/user$ tail /var/svc/log/system-fmd:default.log
[ Feb  3 14:45:09 Executing start method ("/usr/lib/fm/fmd/fmd") ]
[ Feb  3 14:45:15 Method "start" exited with status 0 ]
[ Feb  3 14:45:18 Stopping because all processes in service exited. ]
[ Feb  3 14:45:18 Executing stop method (:kill) ]
[ Feb  3 14:45:18 Executing start method ("/usr/lib/fm/fmd/fmd") ]
[ Feb  3 14:45:24 Method "start" exited with status 0 ]
[ Feb  3 14:45:27 Stopping because all processes in service exited. ]
[ Feb  3 14:45:27 Executing stop method (:kill) ]
[ Feb  3 14:45:27 Executing start method ("/usr/lib/fm/fmd/fmd") ]
[ Feb  3 14:45:33 Method "start" exited with status 0 ]
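To quantify the loop seen in the log above, here is a rough sketch that counts how often the start method ran and how often SMF stopped the service because all of its processes had exited. The function name summarize_smf_log is hypothetical, and the matched strings are simply the ones that appear in the log excerpt.

```shell
# Summarize an SMF service log: number of start-method runs and number of
# times the service was stopped because every process in it had exited.
# summarize_smf_log is a hypothetical helper name.
summarize_smf_log() {
    log=$1
    # grep -c prints the count of matching lines (0 if there are none).
    starts=`grep -c 'Executing start method' "$log"`
    exits=`grep -c 'Stopping because all processes in service exited' "$log"`
    echo "starts=$starts exits=$exits"
}

# Example: summarize_smf_log /var/svc/log/system-fmd:default.log
```

If the two counts keep climbing together, the daemon is starting cleanly and then dying shortly afterwards, rather than failing in the start method itself.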

Steps already taken:

  • upgraded firmware (the box had the same problem with the older firmware)
  • disabled and enabled the service
  • rebooted the box

A Google search turns up many reports of this problem, including an identical report on the OpenSolaris site, where it was noted that the problem could not be reproduced.

What should I look at next to narrow down the issue?

What are the contents, if any, of

/etc/fm/fmd

The directory "/etc/fm/fmd" is consistently empty, even though "fmd" is constantly restarting.

sun8/user$ svcs fmd
STATE          STIME    FMRI
online         19:53:51 svc:/system/fmd:default

sun8/user$ svcs fmd
STATE          STIME    FMRI
offline*       19:53:54 svc:/system/fmd:default

sun8/user$ ls -al /etc/fm/fmd
total 4
drwxr-xr-x   2 root     sys          512 Apr 13  2006 .
drwxr-xr-x   3 root     sys          512 Apr 13  2006 ..