Hi Guys,
Hopefully this is just a quick one - but you never know.
I have (or had) a CentOS cluster running a NetBackup server. We've had an outage and seem to have lost a node. As a consequence I'm in a bit of a quandary, and I'm not familiar with this software either.
The server is a Dell PowerEdge 1950 running CentOS 5.4 with kernel 2.6.18-164.11.1.el5PAE #1 SMP and, wait for it, a back-ported GFS for compatibility.
I've managed to get the system back up and the GFS disk mounted by hacking the /etc/cluster/cluster.conf file as follows. The original file comes first:
<?xml version="1.0"?>
<cluster alias="scsymbak00" config_version="93" name="scsymbak00">
  <fence_daemon clean_start="0" post_fail_delay="1" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="scsymbak01.xxx.com" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="scsymbak01_drac"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="scsymbak02.xxx.com" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="scsymbak02_drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_drac" ipaddr="192.168.0.201" login="root" name="scsymbak01_drac" passwd="drut"/>
    <fencedevice agent="fence_drac" ipaddr="192.168.0.202" login="root" name="scsymbak02_drac" passwd="drut"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="scsymbak_fd" ordered="1" restricted="1">
        <failoverdomainnode name="scsymbak01.xxx.com" priority="2"/>
        <failoverdomainnode name="scsymbak02.xxx.com" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.143.252.200" monitor_link="1"/>
      <script file="/etc/init.d/nbclient" name="nbclient_init"/>
      <script file="/etc/init.d/netbackup" name="netbackup_init"/>
      <clusterfs device="/dev/mapper/VolGroup10-DATA" force_unmount="1" fsid="41517" fstype="gfs2" mountpoint="/data" name="symbak_GFS"/>
      <lvm lv_name="DATA" name="VolGroup10_DATA_CLVM2" vg_name="VolGroup10"/>
      <script file="/etc/init.d/xinetd" name="xinetd_init"/>
      <script file="/etc/init.d/vxpbx_exchanged" name="vxpbx_init"/>
      <ip address="10.143.224.200" monitor_link="1"/>
      <ip address="10.143.226.200" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="scsymbak_fd" exclusive="0" name="netbackup_srv" recovery="restart">
      <ip ref="10.143.224.200"/>
      <ip ref="10.143.226.200"/>
      <ip ref="10.143.252.200"/>
      <script ref="vxpbx_init"/>
      <script ref="xinetd_init"/>
    </service>
  </rm>
</cluster>
This was changed to:
<?xml version="1.0"?>
<cluster alias="scsymbak00" config_version="93" name="scsymbak00">
  <fence_daemon clean_start="0" post_fail_delay="1" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="scsymbak02.xxx.com" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="scsymbak02_drac"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="0"/>
  <fencedevices>
    <fencedevice agent="fence_drac" ipaddr="192.168.0.202" login="root" name="scsymbak02_drac" passwd="drut"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="scsymbak_fd" ordered="1" restricted="1">
        <failoverdomainnode name="scsymbak02.xxx.com" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.143.252.200" monitor_link="1"/>
      <script file="/etc/init.d/nbclient" name="nbclient_init"/>
      <script file="/etc/init.d/netbackup" name="netbackup_init"/>
      <clusterfs device="/dev/mapper/VolGroup10-DATA" force_unmount="1" fsid="41517" fstype="gfs2" mountpoint="/data" name="symbak_GFS"/>
      <lvm lv_name="DATA" name="VolGroup10_DATA_CLVM2" vg_name="VolGroup10"/>
      <script file="/etc/init.d/xinetd" name="xinetd_init"/>
      <script file="/etc/init.d/vxpbx_exchanged" name="vxpbx_init"/>
      <ip address="10.143.224.200" monitor_link="1"/>
      <ip address="10.143.226.200" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="scsymbak_fd" exclusive="0" name="netbackup_srv" recovery="restart">
      <ip ref="10.143.224.200"/>
      <ip ref="10.143.226.200"/>
      <ip ref="10.143.252.200"/>
      <script ref="vxpbx_init"/>
      <script ref="xinetd_init"/>
    </service>
  </rm>
</cluster>
When I run clustat I see the following.
Cluster Status for scsymbak00 @ Mon Nov 17 16:55:28 2014
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 scsymbak02.xxx.com              1 Online, Local, rgmanager

 Service Name                 Owner (Last)         State
 ------- ----                 ----- ------         -----
 service:netbackup_srv        (none)               stopped
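Since clustat reports netbackup_srv as stopped, it may help to see exactly what that service would bring up if it were started. The following is only a rough sketch, not anything that ships with the cluster suite: it parses a trimmed copy of the edited cluster.conf above (fencing and failover-domain sections left out for brevity) and lists the nodes, the two_node flag, the resources each service references, and any resources that are defined but never referenced. The function name `summarize` and the `EDITED_CONF` string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the edited cluster.conf from this post
# (fencing and failover-domain sections omitted for brevity).
EDITED_CONF = """<cluster alias="scsymbak00" config_version="93" name="scsymbak00">
  <clusternodes>
    <clusternode name="scsymbak02.xxx.com" nodeid="1" votes="1"/>
  </clusternodes>
  <cman expected_votes="1" two_node="0"/>
  <rm>
    <resources>
      <ip address="10.143.252.200" monitor_link="1"/>
      <script file="/etc/init.d/nbclient" name="nbclient_init"/>
      <script file="/etc/init.d/netbackup" name="netbackup_init"/>
      <clusterfs device="/dev/mapper/VolGroup10-DATA" force_unmount="1" fsid="41517" fstype="gfs2" mountpoint="/data" name="symbak_GFS"/>
      <lvm lv_name="DATA" name="VolGroup10_DATA_CLVM2" vg_name="VolGroup10"/>
      <script file="/etc/init.d/xinetd" name="xinetd_init"/>
      <script file="/etc/init.d/vxpbx_exchanged" name="vxpbx_init"/>
      <ip address="10.143.224.200" monitor_link="1"/>
      <ip address="10.143.226.200" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="scsymbak_fd" exclusive="0" name="netbackup_srv" recovery="restart">
      <ip ref="10.143.224.200"/>
      <ip ref="10.143.226.200"/>
      <ip ref="10.143.252.200"/>
      <script ref="vxpbx_init"/>
      <script ref="xinetd_init"/>
    </service>
  </rm>
</cluster>"""


def summarize(conf_xml):
    """Hypothetical helper: summarise nodes, services and resource references."""
    root = ET.fromstring(conf_xml)
    nodes = [n.get("name") for n in root.findall("clusternodes/clusternode")]
    two_node = root.find("cman").get("two_node")

    # Key each shared resource by the attribute a ref="..." points at:
    # the address for <ip>, the name for everything else.
    defined = {}
    for res in root.findall("rm/resources/*"):
        key = res.get("address") if res.tag == "ip" else res.get("name")
        defined[key] = res.tag

    # Map each service name to the refs of its direct children, in file order.
    services = {
        svc.get("name"): [child.get("ref") for child in svc]
        for svc in root.findall("rm/service")
    }
    referenced = {ref for refs in services.values() for ref in refs}

    return {
        "nodes": nodes,
        "two_node": two_node,
        "services": services,
        "missing": sorted(referenced - set(defined)),  # referenced, not defined
        "unused": sorted(set(defined) - referenced),   # defined, never referenced
    }


summary = summarize(EDITED_CONF)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Run against the config above, nothing shows up as missing, but nbclient_init, netbackup_init, symbak_GFS and VolGroup10_DATA_CLVM2 are defined in the resources section without being referenced by netbackup_srv, so the service as written only covers the three IPs plus the vxpbx and xinetd scripts.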
Although the disks have come back, the cluster doesn't seem to be up. Is there anything else that I should be looking at? The networking hasn't started properly either, as I'm not seeing the clustered IPs, so here is the output of ifconfig.
[root@scsymbak02 cluster]# ifconfig -a
bond0 Link encap:Ethernet HWaddr 00:1E:C9:AB:BB:11
inet addr:10.143.252.202 Bcast:10.143.253.255 Mask:255.255.254.0
inet6 addr: fe80::21e:c9ff:feab:bb11/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:41863 errors:0 dropped:22325 overruns:0 frame:0
TX packets:47278 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3704818 (3.5 MiB) TX bytes:32203049 (30.7 MiB)
bond0:1 Link encap:Ethernet HWaddr 00:1E:C9:AB:BB:11
inet addr:192.168.0.102 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
bond1 Link encap:Ethernet HWaddr 00:1B:21:18:29:68
inet addr:10.143.224.202 Bcast:10.143.225.255 Mask:255.255.254.0
inet6 addr: fe80::21b:21ff:fe18:2968/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:9768 errors:0 dropped:0 overruns:0 frame:0
TX packets:14251 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:694596 (678.3 KiB) TX bytes:847554 (827.6 KiB)
bond2 Link encap:Ethernet HWaddr 00:1B:21:18:29:69
inet addr:10.143.226.202 Bcast:10.143.227.255 Mask:255.255.254.0
UP BROADCAST MASTER MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
eth0 Link encap:Ethernet HWaddr 00:1E:C9:AB:BB:11
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:30568 errors:0 dropped:11059 overruns:0 frame:0
TX packets:21803 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2808757 (2.6 MiB) TX bytes:11663865 (11.1 MiB)
Interrupt:177 Memory:f8000000-f8012800
eth1 Link encap:Ethernet HWaddr 00:1E:C9:AB:BB:13
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:11295 errors:0 dropped:11266 overruns:0 frame:0
TX packets:25475 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:896061 (875.0 KiB) TX bytes:20539184 (19.5 MiB)
Interrupt:169 Memory:f4000000-f4012800
eth2 Link encap:Ethernet HWaddr 00:1B:21:18:29:68
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:4758 errors:0 dropped:0 overruns:0 frame:0
TX packets:7181 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:339738 (331.7 KiB) TX bytes:426270 (416.2 KiB)
Memory:fd2e0000-fd300000
eth3 Link encap:Ethernet HWaddr 00:1B:21:18:29:69
UP BROADCAST SLAVE MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Memory:fd2a0000-fd2c0000
eth4 Link encap:Ethernet HWaddr 00:1B:21:18:29:6C
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:5010 errors:0 dropped:0 overruns:0 frame:0
TX packets:7070 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:354858 (346.5 KiB) TX bytes:421284 (411.4 KiB)
Memory:fcce0000-fcd00000
eth5 Link encap:Ethernet HWaddr 00:1B:21:18:29:6D
UP BROADCAST SLAVE MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Memory:fcca0000-fccc0000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:7247 errors:0 dropped:0 overruns:0 frame:0
TX packets:7247 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:971488 (948.7 KiB) TX bytes:971488 (948.7 KiB)
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
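To make the "not seeing the clustered IPs" observation concrete, here is a small sketch that compares the three service addresses from cluster.conf with the `inet addr:` entries in the ifconfig output above. It only uses an excerpt of that output, and the names `SERVICE_IPS`, `IFCONFIG_EXCERPT`, `configured_addresses` and `missing_service_ips` are made up for illustration. One caveat, as far as I understand it: addresses that rgmanager adds as secondary addresses may not be listed by ifconfig at all, so the same comparison against `ip addr show` output would be the more reliable check.

```python
import re

# The three floating addresses referenced by netbackup_srv in cluster.conf.
SERVICE_IPS = ["10.143.224.200", "10.143.226.200", "10.143.252.200"]

# Excerpt of the ifconfig output from this post (address lines only).
IFCONFIG_EXCERPT = """\
bond0 Link encap:Ethernet HWaddr 00:1E:C9:AB:BB:11
inet addr:10.143.252.202 Bcast:10.143.253.255 Mask:255.255.254.0
bond0:1 Link encap:Ethernet HWaddr 00:1E:C9:AB:BB:11
inet addr:192.168.0.102 Bcast:192.168.0.255 Mask:255.255.255.0
bond1 Link encap:Ethernet HWaddr 00:1B:21:18:29:68
inet addr:10.143.224.202 Bcast:10.143.225.255 Mask:255.255.254.0
bond2 Link encap:Ethernet HWaddr 00:1B:21:18:29:69
inet addr:10.143.226.202 Bcast:10.143.227.255 Mask:255.255.254.0
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
"""


def configured_addresses(ifconfig_text):
    """Return the set of IPv4 addresses found on 'inet addr:' lines."""
    return set(re.findall(r"inet addr:(\d+\.\d+\.\d+\.\d+)", ifconfig_text))


def missing_service_ips(service_ips, ifconfig_text):
    """Return the service IPs that do not appear in the ifconfig text."""
    present = configured_addresses(ifconfig_text)
    return [ip for ip in service_ips if ip not in present]


missing = missing_service_ips(SERVICE_IPS, IFCONFIG_EXCERPT)
print("service IPs not configured:", missing)
```

On the output above all three .200 addresses come back as missing, while the node's own .202 addresses are present on bond0, bond1 and bond2, which fits with clustat showing the service as stopped.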
So I guess the question is: what should I start with? If I want to boot this cluster as a single node, how should I go about it? Are there any other changes that I should make to the cluster.conf file, or any other files that I should be changing as well? Any help here would be really appreciated.
Unfortunately I have a dentist's appointment, but I'll be back online a little later, although out of the office. If there are any other files I have to look at or change, I'll be doing that first thing in the morning.
Regards
Dave