Server shuts down when HACMP services are started

Hi Guys,

Please help with this: when we start HACMP services, the server shuts down.

Error messages from cluster.log:

Apr 14 08:43:27 bascop17 snmpd[696432]: NOTICE: SMUX trap: (0 0) (127.0.0.1+46302+1)
Apr 14 08:43:33 bascop17 topsvcs[798914]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6UpNEL0JU564/AfH1F2/e.1...................:
::Reference ID: :::Template ID: 97419d60:::Details File: :::Location: rsct,bootstrp.C,1.187,4148 :::TS_START_ST
Topology Services daemon started Topology Services daemon started by: SRC Topology Services daemon log file location /var/ha/log/to
psvcs.14.084333.IXOScluster.en_/var/ha/run/topsvcs.IXOScluster/ Topology Services daemon run directory /var/ha/run/topsvcs.IXOSclust
er/
Apr 14 08:43:36 bascop17 grpsvcs[540700]: (Recorded using libct_ffdc.a cv 2):::Error ID: 60.oOd0MU564/ndm.F2/e.1...................:
::Reference ID: :::Template ID: a96b4002:::Details File: :::Location: RSCT,TraceStream.C,1.79,678 :::GS_MESSAGE_
ST Group Services informational message DIAGNOSTIC EXPLANATION ERROR writing to log file /var/ha/log/grpsvcs_2_35.IXOScluster (rdsta
te=6 errno=0[Error 0] lost=1). Check filesystem.
Apr 14 08:43:36 bascop17 grpsvcs[540700]: (Recorded using libct_ffdc.a cv 2):::Error ID: 60.oOd0MU564//do.F2/e.1...................:
::Reference ID: :::Template ID: a96b4002:::Details File: :::Location: RSCT,TraceStream.C,1.79,693 :::GS_MESSAGE_
ST Group Services informational message DIAGNOSTIC EXPLANATION Lost 1 lines to log file /var/ha/log/grpsvcs_2_35.IXOScluster. Writi
ng to log now.
Apr 14 08:43:36 bascop17 grpsvcs[540700]: (Recorded using libct_ffdc.a cv 2):::Error ID: 63Y7ej0MU564/4g//F2/e.1...................:
::Reference ID: :::Template ID: afa89905:::Details File: :::Location: RSCT,pgsd.C,1.58,566 :::GS_START_ST
Group Services daemon started DIAGNOSTIC EXPLANATION HAGS daemon started by SRC. Log file is /var/ha/log/grpsvcs_2_35.IXOScluster.
Apr 14 08:43:40 bascop17 grpsvcs[540700]: (Recorded using libct_ffdc.a cv 2):::Error ID: 60.oOd0QU564/ZOT.F2/e.1...................:
::Reference ID: :::Template ID: a96b4002:::Details File: :::Location: RSCT,TraceStream.C,1.79,678 :::GS_MESSAGE_
ST Group Services informational message DIAGNOSTIC EXPLANATION ERROR writing to log file /var/ha/log/grpsvcs_2_35.IXOScluster.long (
rdstate=6 errno=3[The process does not exist.] lost=1). Check filesystem.
Apr 14 08:43:40 bascop17 grpsvcs[540700]: (Recorded using libct_ffdc.a cv 2):::Error ID: 60.oOd0QU564/gNV.F2/e.1...................:
::Reference ID: :::Template ID: a96b4002:::Details File: :::Location: RSCT,TraceStream.C,1.79,693 :::GS_MESSAGE_
ST Group Services informational message DIAGNOSTIC EXPLANATION Lost 1 lines to log file /var/ha/log/grpsvcs_2_35.IXOScluster.long.
Writing to log now.
Apr 14 08:43:46 bascop17 clstrmgrES[606366]: Sat Apr 14 08:43:46 HACMP/ES Cluster Manager Started
Apr 14 08:43:55 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 10) (127.0.0.1+46302+1)
Apr 14 08:44:15 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 13) (127.0.0.1+46302+1)
Apr 14 08:44:15 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 14) (127.0.0.1+46302+1)
Apr 14 08:44:15 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 13) (127.0.0.1+46302+1)
Apr 14 08:44:15 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 14) (127.0.0.1+46302+1)
Apr 14 08:44:18 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 20) (127.0.0.1+46302+1)
Apr 14 08:44:18 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 20) (127.0.0.1+46302+1)
Apr 14 08:44:20 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 11) (127.0.0.1+46302+1)
Apr 14 08:44:20 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 77) (127.0.0.1+46302+1)
Apr 14 08:44:20 bascop17 clinfoES[557092]: send_snmp_req: Messages in queue got = 5 read = 1
Apr 14 08:44:28 bascop17 last message repeated 5 times
Apr 14 08:44:47 bascop17 haemd[639082]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.36,L#=1361, haemd: 2521-
032 Cannot dispatch group services (1).
Apr 14 08:44:47 bascop17 clstrmgrES[606366]: Sat Apr 14 08:44:47 announcementCb: Called, state=ST_JOINING
Apr 14 08:44:47 bascop17 clstrmgrES[606366]: Sat Apr 14 08:44:47 announcementCb: GRPSVCS announcment code=512; exiting
Apr 14 08:44:47 bascop17 clstrmgrES[606366]: Sat Apr 14 08:44:47 CHECK FOR FAILURE OF RSCT SUBSYSTEMS (topsvcs or grpsvcs)
Apr 14 08:44:47 bascop17 clstrmgrES[606366]: Sat Apr 14 08:44:47 clstrmgr on node 2 is exiting with code 4
Apr 14 08:44:48 bascop17 snmpd[696432]: NOTICE: SMUX packet from (127.0.0.1+46302+1)
Apr 14 08:44:48 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 10) (127.0.0.1+46302+1)
Apr 14 08:44:48 bascop17 snmpd[696432]: NOTICE: SMUX packet from (127.0.0.1+46302+1)
Apr 14 08:44:48 bascop17 snmpd[696432]: NOTICE: SMUX trap: (6 11) (127.0.0.1+46302+1)
Apr 14 08:44:48 bascop17 HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES
Apr 14 08:44:48 bascop17 HACMP for AIX: clexit.rc : Halting system immediately!!!
Apr 14 08:56:55 bascop17 prngd[319646]: prngd 0.9.27 (20 Dec 2002) started up for user root
Apr 14 08:56:55 bascop17 prngd[319646]: have 7 out of 2000 filedescriptors open
Apr 14 08:56:56 bascop17 RMCdaemon[311452]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6eKora0sg564/11L/F2/e.1..................
.:::Reference ID: :::Template ID: a6df45aa:::Details File: :::Location: RSCT,rmcd.c,1.48,209 :::RMCD_INFO
_0_ST The daemon is started.
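
For anyone reading a log like the one above, here is a minimal Python sketch that pulls out only the lines marking the failure sequence (the Group Services log write errors, the clstrmgrES exit, and the clexit.rc halt). The marker strings are copied from this log; the function name `key_events` and the idea of scanning for these particular strings are illustrative assumptions, not an HACMP tool.

```python
import re

# Substrings copied from the log above that mark the failure sequence.
MARKERS = (
    "ERROR writing to log file",
    "Cannot dispatch group services",
    "CHECK FOR FAILURE OF RSCT SUBSYSTEMS",
    "is exiting with code",
    "Unexpected termination of clstrmgrES",
    "Halting system immediately",
)

# Syslog-style timestamp at the start of a line, e.g. "Apr 14 08:44:47 ".
STAMP = re.compile(r"^([A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) ")


def key_events(log_text):
    """Return (timestamp, marker) pairs for every line containing a marker.

    Wrapped continuation lines have no timestamp of their own, so they
    inherit the timestamp of the most recent line that had one.
    """
    events = []
    stamp = None
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            stamp = match.group(1)
        for marker in MARKERS:
            if marker in line:
                events.append((stamp, marker))
    return events
```

Calling `key_events(open("cluster.log").read())` (the path is an assumption) returns the events in order, which makes it easier to see that the Group Services errors precede the cluster manager exit.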

I encountered the same problem on AIX 5.2 with HACMP 5.2.

In my case, I found an error in the /etc/hosts file on one of the nodes: the HACMP private IP addresses had been entered incorrectly.
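
A quick way to spot this kind of mistake is to compare the /etc/hosts files from the nodes. Below is a rough Python sketch, assuming you have copied each node's file contents into a string. The function names and the host names in the usage example are hypothetical, chosen only for illustration.

```python
def parse_hosts(text):
    """Map each host name or alias in an /etc/hosts file to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry for a name, ignore later duplicates.
            mapping.setdefault(name, ip)
    return mapping


def hosts_mismatches(hosts_a, hosts_b):
    """Return {name: (ip_on_a, ip_on_b)} for names whose IPs differ."""
    a, b = parse_hosts(hosts_a), parse_hosts(hosts_b)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }
```

For example, with hypothetical entries:

```python
node1 = "10.10.10.1 node1_priv\n10.10.10.2 node2_priv\n"
node2 = "10.10.10.1 node1_priv\n10.10.10.22 node2_priv  # typo\n"
print(hosts_mismatches(node1, node2))
```

Any name reported here has a different address on the two nodes and is worth checking by hand against the cluster configuration.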

If you have checked everything and found no errors, I suggest stopping the cluster on all nodes, then verifying and synchronizing the cluster with automatic error correction using SMIT.

To verify and synchronize the cluster topology and resource configuration:

  1. At the command prompt, enter smit hacmp
  2. In SMIT, select Extended Configuration > Extended Verification and Synchronization.
  3. Set the option "Automatically correct errors found during verification?" to whatever is appropriate for your environment.

I hope this helps.