SC3.2 issue - cluster transport configuration incorrect, resulting in failure

I am trying to set up a two-host cluster. The trouble is with the cluster transport configuration.

I'm using e1000g2 and e1000g3 for the cluster transport. global0 and global1 are my two nodes, and I am running scinstall from global1.

What I think I should be expecting is this:

The following connections were discovered:
global1:e1000g2  switch1  global0:e1000g2
global1:e1000g3  switch2  global0:e1000g3

but what I am actually getting is this:

The following connections were discovered:
global1:e1000g2  switch1  global0:e1000g2
global1:e1000g2  switch1  global0:e1000g3

I think it is because of the mis-discovery above that the cluster does not work.

I've attached the scinstall logfile. Is there any other info I need to add?

I've tried all sorts of combinations. I am using link-based IPMP on e1000g0 and e1000g1 on both servers.
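For reference, my link-based files are basically just the group keyword with no test addresses, something like this (global1 here stands for the public hostname entry in /etc/hosts):

::::::::::::::
/etc/hostname.e1000g0
::::::::::::::
global1 netmask + broadcast + group ipmp0 up
::::::::::::::
/etc/hostname.e1000g1
::::::::::::::
group ipmp0 up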

Is your route to the switch OK? Are you able to ping? If not, fix that first.

I'm not using separate VLANs for the interconnects, so they won't connect through the switch. I have changed it to use a direct (crossover-like) connection, and that lets the install move forward a step.

Are you trying to configure the cluster on virtual machines? If yes, I can point you to the answer.

Yes, I'm trying to install to VMs on VirtualBox.

It gets stuck after the reboot of the remote node.

OK, I got the same error while I was setting up the cluster on VMware Server.

  1. Make sure your guest Solaris OS is 64-bit.
  2. For the cluster transport, set up two separate host-only interfaces; then it works. It could be different in VirtualBox, as I'm not very familiar with VB.

That's true. I've just done this in my VBox: use two separate "internal networks" for the virtual adapters (three if you use an internal public network). The cluster works fine for me with that configuration, even with IPMP for the public interface.
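If you'd rather script it than click through the GUI, the same thing can be done with VBoxManage, roughly like this (the VM name clnode1 and the network names are just examples):

# two adapters on a shared internal network for the public side / IPMP
VBoxManage modifyvm clnode1 --nic1 intnet --intnet1 pubnet
VBoxManage modifyvm clnode1 --nic2 intnet --intnet2 pubnet
# two separate internal networks for the cluster interconnects
VBoxManage modifyvm clnode1 --nic3 intnet --intnet3 privnet1
VBoxManage modifyvm clnode1 --nic4 intnet --intnet4 privnet2

You can verify what each adapter ended up on with "VBoxManage showvminfo clnode1".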

IPMP: did you use link-based or probe-based? I'm using link-based on internal networks. Four adapters, e1000g0 to e1000g3.

Did you have to configure the network on the VBox host itself?

What instructions did you follow?

I've done probe-based IPMP. Same adapters for me, e1000g0-3. And I use an "internal only" network for public and cluster interconnect, so no traffic to the outside world.

What instructions do you mean?

I wonder if it's because I am using link-based?

Can you show me your /etc/hostname.e1000g0 and g1 files, please (assuming those are your two public network interfaces)?

In the cluster setup, do you say you use switches, or direct (crossover-like) connections, for e1000g2 and g3 (assuming they are your cluster transport private network adapters)?

I wondered if you followed any specific setup, e.g. an online guide or whatever. I haven't been on a cluster course; I've had to read all about it using Google etc. You didn't happen to write any notes on it, did you?

I have two nodes, global0 and global1. I run scinstall from global1. I don't understand why, after global0 reboots, it just sits there saying "waiting for global0 to reboot...".

I do hope you don't mind me picking your brains and maybe asking newbie questions about clustering, but I'm going to go mad if I can't get two nodes to cluster.

"Solaris Cluster on a laptop using VirtualBox, iSCSI and a quorum server" on JET Stream should help you. It's straightforward once you understand and figure out which interfaces to select for the cluster interconnects.

Yeah, I've been reading that one, amongst others.

I think, though, it might be because I am using link-based IPMP, not probe-based, as all the other guides say they use probe.

I could still use:

/etc/hosts
/etc/hostname.e1000g*
any other network files that might be relevant.

Did you change the address of the VirtualBox host itself? Is it necessary to have it on the same subnet as the two virtual servers?

Having said all that, there is nothing wrong with the comms during the initial part of the scinstall. It would appear that the VM that gets rebooted first simply doesn't contact the scinstall server when it comes back up.

I am wondering whether I don't have the iSCSI set up correctly, although I can run format on it from the remote server. Does the remote server (the first to reboot) rely on there being a quorum available?
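For completeness, this is roughly how I pointed the nodes at the iSCSI host (the 192.168.10.99 discovery address is just a placeholder for my iSCSI server's IP):

# enable sendtargets discovery against the iSCSI host
iscsiadm add discovery-address 192.168.10.99:3260
iscsiadm modify discovery --sendtargets enable
# rebuild the device tree and check the LUN is visible
devfsadm -i iscsi
iscsiadm list target -S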

Damn, I am stuck.

Yes, you need a quorum!

My setup is:

  • one admin node (also Solaris)
  • two cluster nodes

On the admin node the cluster software is installed (cluster console and quorum server).
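The quorum server part is quick to set up; roughly like this (the shipped /etc/scqsd/scqsd.conf already contains a default instance on port 9000; the IP and the name qs1 below are just examples):

# on the admin node: start the default quorum server instance
/usr/cluster/bin/clquorumserver start 9000

# on one cluster node: register it as the quorum device
clquorum add -t quorum_server -p qshost=192.168.10.1 -p port=9000 qs1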

I'm just setting up an admin node now.

Did you make the admin node ZFS? Or UFS and then create a zpool from files? What did you do?

I just used a normal UFS filesystem for the admin node... In fact, I just cloned a cluster node: I installed one cluster node and used the VBox features to export the virtual disk into a new system. After booting the new system I did a "sys-unconfig" to give a new name and IP to the "new" node...
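The disk copy itself is a one-liner; something like this (the file names are examples):

# clone the installed node's disk; the copy gets its own UUID
VBoxManage clonehd clnode1.vdi adminnode.vdi

Then attach the new disk to a new VM, boot it, run sys-unconfig, and answer the prompts with the new hostname and IP on the next boot.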

Well, my iSCSI host server is just booting up for the first time, so I'll see.


I'm still stuck.

I'm trying to set up probe-based IPMP. In /etc/hosts, do I have to put in all the other IPs to do with IPMP? Is it necessary to create the fake MAC addresses?

dukenuke2> You used internal networks for your VMs. Does that mean those vNICs didn't have to be set up? What do your /etc/hosts and /etc/hostname.e1000g* look like, please?

This is how my config (for node1) looks...

The interfaces:

root@clnode1 # dladm show-dev
e1000g0         link: up        speed: 1000  Mbps       duplex: full
e1000g1         link: up        speed: 1000  Mbps       duplex: full
e1000g2         link: up        speed: 1000  Mbps       duplex: full
e1000g3         link: up        speed: 1000  Mbps       duplex: full
clprivnet0              link: unknown   speed: 0     Mbps       duplex: unknown

The hosts file:

root@clnode1 # cat /etc/hosts
#
# Internet host table
#
::1     localhost       
127.0.0.1       localhost       

# Node1
192.168.10.100  node1-interface0
192.168.10.101  node1-interface1
192.168.10.102  clnode1 clnode1.mycluster.de loghost

# Node2
192.168.10.200  node2-interface0
192.168.10.201  node2-interface1
192.168.10.202  clnode2 clnode2.mycluster.de

# Cluster
192.168.10.50   lh-cluster lh-cluster.mycluster.de

The files for the interfaces:

root@clnode1 # more /etc/hostname.e1000g*
::::::::::::::
/etc/hostname.e1000g0
::::::::::::::
node1-interface0 netmask + broadcast + group ipmp0 deprecated -failover up \
addif clnode1 netmask + broadcast + failover up
::::::::::::::
/etc/hostname.e1000g3
::::::::::::::
node1-interface1 netmask + broadcast + group ipmp0 deprecated -failover up

And the ifconfig output:

root@clnode1 # ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
e1000g0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 192.168.10.100 netmask ffffff00 broadcast 192.168.10.255
        groupname ipmp0
        ether 8:0:27:b4:ee:8e 
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.10.102 netmask ffffff00 broadcast 192.168.10.255
e1000g1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 5
        inet 172.16.0.130 netmask ffffff80 broadcast 172.16.0.255
        ether 8:0:27:bc:39:9f 
e1000g2: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 4
        inet 172.16.1.2 netmask ffffff80 broadcast 172.16.1.127
        ether 8:0:27:4a:22:18 
e1000g3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
        inet 192.168.10.101 netmask ffffff00 broadcast 192.168.10.255
        groupname ipmp0
        ether 8:0:27:45:f2:f4 
clprivnet0: flags=1009843<UP,BROADCAST,RUNNING,MULTICAST,MULTI_BCAST,PRIVATE,IPv4> mtu 1500 index 6
        inet 172.16.4.2 netmask fffffe00 broadcast 172.16.5.255
        ether 0:0:0:0:0:2 

hth,
DN2

Just realised what you said here: I'm using a 32-bit OS. What difference is that going to make?

EDIT: Also, is it possible to make the zones active-active instead of just active-passive?

As far as I know, Sun Cluster 3.2 does not work on 32-bit Solaris 10; you must have a 64-bit OS.
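You can check what the guest kernel is actually running with isainfo; a 64-bit x86 guest reports amd64:

# isainfo -kv
64-bit amd64 kernel modules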

That'll explain why I couldn't get my cluster to work on my laptop.