Solaris 10 - Cluster Problem

Hi,

I am trying to install Solaris Cluster with Solaris 10 and Cluster Suite 3.2 2/08. While going through the custom configuration in the "scinstall" utility, I ran into a problem.

When I am asked for the 2nd node's file system and I enter /globaldevices, which definitely exists as a partition on my 2nd node, I get this error:

What is the name of the file system? /globaldevices

Testing for "/globaldevices" on "solariscluster1" ... failed

scrcmd: RPC: Program not registered
Unable to check "/globaldevices" on "solariscluster1" due to failed remote command.

If anybody knows, please let me know how to proceed from here.

Thanks,
Hitarth

/globaldevices (or the underlying device) has to be unique within the cluster, so each cluster node has to use a different metadevice.
Check that the hostnames are in both /etc/hosts and /etc/inet/ipnodes on both nodes. Be sure you don't have an unprintable character in /etc/vfstab.
Try to get the install log from the system and post its messages.
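A quick way to hunt for unprintable characters in vfstab is a character-class grep plus an od dump. A minimal sketch; it uses a throwaway sample file with a planted stray byte, and on the real system you would point the same two commands at /etc/vfstab:

```shell
# Illustrative sample: a vfstab-style line with a stray Ctrl-A (\001)
# hidden after the mount point -- invisible in most editors, fatal to parsers.
printf '/dev/dsk/c0t0d0s3\t/dev/rdsk/c0t0d0s3\t/globaldevices\001\tufs\t2\tno\t-\n' \
    > /tmp/vfstab.sample

# Flag any byte that is neither printable nor whitespace
grep -n '[^[:print:][:space:]]' /tmp/vfstab.sample && echo "unprintable character found"

# od -c shows exactly where the stray byte sits on the line
od -c /tmp/vfstab.sample
```

If the grep stays silent against the real /etc/vfstab, unprintable characters are ruled out and you can move on to the hosts files.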

Hi,

Sorry for the late reply; I was out of town for a few days. Anyway, I am back and tried again with all the suggestions, but I am still facing problems. Let me describe what I tried, step by step.

(1) Installed Solaris 10 on both nodes with "/globaldevices" as a separate partition. (I haven't registered the OS with Sun, for unrelated reasons.)
(2) Installed Solaris Cluster Suite 3.2 2/08 on both nodes.
(3) Added entries for both nodes to /etc/hosts and /etc/inet/ipnodes. (Please see the attached file with the configuration I set up before moving ahead with "scinstall".)
(4) ssh is enabled between the two nodes, and root login over ssh works from each side.
(5) Both nodes are connected with a crossover cable for the private network, and those physical Ethernet cards are not plumbed at startup.
(6) After going ahead with the "scinstall" custom installation from Node 1, I got stuck and can't see how to move ahead. (Attaching the scinstall log from Node 1.)

I think I had the same problem once when setting up a Sun Cluster; it was caused by RPC rejecting connections that did not come from localhost.
First check whether you can see the requested service from localhost (rpcinfo -p), then try to reach the service from the other host (the second cluster node) with rpcinfo -p <host> and rpcinfo -t <host> <program>. If you can connect from localhost but not from the remote node, you're home!

GOT IT, ha ha:
These services have to be reconfigured to accept RPC queries from hosts other than localhost:

metad
metamedd
metamhd
scadmd
scrcmd
metacld

The commands to reconfigure them (the rpc/bind service's local_only property is what blocks remote queries):
# svccfg -s rpc/bind listprop config/local_only
config/local_only boolean true

# svccfg -s rpc/bind setprop config/local_only=false

# svccfg -s rpc/bind listprop config/local_only
config/local_only boolean false

# svcadm refresh svc:/network/rpc/bind
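Once local_only is false and rpc/bind has been refreshed, it is worth confirming from the other node that the cluster services are actually visible. A sketch; instead of querying a live node it greps a saved copy of the listing, so on the real system you would first run `rpcinfo -p <nodename> > /tmp/rpc.out` (the service names below come from this thread; the program and port numbers are placeholders):

```shell
# Fill /tmp/rpc.out with an illustrative sample of "rpcinfo -p" output.
# Program/port numbers here are placeholders; only 100000/rpcbind is real.
cat > /tmp/rpc.out <<'EOF'
   program vers proto   port  service
    100000    2   tcp    111  rpcbind
    100001    1   tcp  32771  metad
    100002    1   tcp  32772  metamedd
    100003    1   tcp  32773  scrcmd
EOF

# Check every service scinstall depends on and report each one
for svc in metad metamedd metamhd scadmd scrcmd metacld; do
    if grep -qw "$svc" /tmp/rpc.out; then
        echo "$svc registered"
    else
        echo "$svc MISSING"
    fi
done
```

If a service is still missing from the live listing after the local_only change, check the health of rpc/bind on that node with `svcs -x` before going any further.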

Try googling for rpc remote / rpc allow remote / rpc localhost... you'll find NOTHING.
Knowing the right people is important these days... :slight_smile:

It was very nice to refresh my cluster experience a bit. Thank you, the pleasure was all mine. :)

AFAIK, Sun's documentation says nothing about this either...

Togr,

First, you are AWESOME, and yes, we do have to find the RIGHT person these days to get through any task when we are NEWBIEs like me. I am holding on to you now!!!! :smiley:

I have followed your suggested steps and tried again to build a two-node cluster, but the nodes do not come up in cluster mode after being rebooted by scinstall. :mad:

########## TWO NODE Cluster attempt ################
bash-3.00# clnode show
clnode: (C152734) This node is not in cluster mode.
#############################################

This message appears on both servers. I also tried to create a SINGLE-node cluster on one node, but after rebooting it I get the same message!!!! :mad:

############## One Node Cluster Log ####################

bash-3.00# cat scinstall.log.5275

*** Establish Just the First Node of a New Cluster ***
Fri Oct 24 16:00:15 IST 2008

scinstall -ik -C hitarth -F -o -P task=quorum,state=INIT

Checking device to use for global devices file system ... done

Initializing cluster name to "hitarth" ... done
Initializing authentication options ... done

Setting the node ID for "solariscluster1" ... done (id=1)

Checking for global devices global file system ... done
Updating vfstab ... done

Updating nsswitch.conf ... done

Adding cluster node entries to /etc/inet/hosts ... done

Configuring IP multipathing groups ...done

Verifying that power management is NOT configured ... done
Unconfiguring power management ... done
/etc/power.conf has been renamed to /etc/power.conf.102408160153
Power management is incompatible with the HA goals of the cluster.
Please do not attempt to re-configure power management.

Ensure network routing is disabled ... done
Network routing has been disabled on this node by creating /etc/notrouter.
Having a cluster node act as a router is not supported by Sun Cluster.
Please do not re-enable network routing.

Rebooting ...
###################################################

P.S. (1) Both nodes are connected with a CROSSOVER cable for the private network. (2) I did NOT disable the automatic quorum option while creating this cluster.

Please get me out of this!!!!! I am getting frustrated now!!!!

Thanks

I was away for a while ... enjoying nice autumn weather here...

Well, you got to the reboot stage. Good.
Strange that the cluster does not come back after the reboot...

I take it you no longer get the first error about communication problems? Do you see any error message now? The log you provided looks fairly OK; what happens after the reboot? Sun Cluster is rather sensitive and talkative when it boots.

If you can't find an exact error message, I'll try to give you some basic guidance.
You sound familiar with Solaris, but you said you're a newbie; maybe you're making some simple mistake, like I did several times when setting up my Sun Clusters. :slight_smile:

1/ Are you sure the quorum device was picked up?
Do the cldevice list -v, cldevice refresh, and clquorum list commands return anything other than an error?

2/ Chances are your cluster is still in "installmode"; scinstall gives a very clear statement after completing the initial configuration and leaving "installmode".

3/ Are your /etc/hosts, nsswitch.conf, resolv.conf, and /etc/defaultdomain files identical? Make them identical, dot for dot, on all nodes. I once solved a problem just by re-ordering lines, although I was never able to reproduce it; that was in my early days of playing with Sun Cluster.

4/ Are you sure the underlying storage is connected correctly?
Perhaps have a chat about basic cluster concepts (shared storage) with someone experienced?
Verify that both nodes can see the same disks (play with the cfgadm -al, luxadm display, cfgadm -al -o show_FCP_dev, format, and probe-scsi-all commands on both nodes to make sure they see the same storage).

5/ Did you label the disks?

6/ Did you slice and then mount the filesystems identically?
The disks have to be sliced identically, but the SVM names (/dev/md/dsk/d??) have to be unique within the cluster!

7/ Last resort: do you work for a Sun service partner, or have a close relationship with one?
I am thinking of the EIS (Enterprise Installation Standards) DVD; it would greatly help you set this up.

8/ (Maybe this should go first.) Are you using a fairly recent Solaris 10 update?
Forget about the early first and second releases; use something fresh and patched. Oh, I mentioned patches... a large topic: install lots of patches, the Recommended and security sets, and finally the Sun Cluster patches (which are not free).
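For point 4, a simple way to compare what the two nodes see is to diff sorted device lists. A sketch with illustrative sample lists; on the real nodes you would fill these files from the output of `format < /dev/null` or `cldevice list`, copy both to one node, and run the same comparison:

```shell
# Illustrative sample device lists, one device name per line
sort > /tmp/disks.node1 <<'EOF'
c0t0d0
c2t0d0
c2t1d0
EOF
sort > /tmp/disks.node2 <<'EOF'
c0t0d0
c2t0d0
EOF

# comm -3 prints only the lines unique to one file, so disks seen by
# both nodes stay silent and any mismatch stands out immediately
comm -3 /tmp/disks.node1 /tmp/disks.node2
```

In this sample only c2t1d0 is printed: node2 cannot see that disk, which is exactly the kind of mismatch that breaks shared storage and quorum selection.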

When I install a cluster, I first install just the cluster itself; after that I set up the application-related things (quorum, IPMP, IPs, HAStoragePlus, various agents, etc.).

I'll paste one of my install logs below:

[root@node1:/]# scinstall 

  *** Main Menu ***

    Please select from one of the following (*) options:

      * 1) Create a new cluster or add a cluster node
        2) Configure a cluster to be JumpStarted from this install server
        3) Manage a dual-partition upgrade
        4) Upgrade this cluster node
        5) Print release information for this cluster node

      * ?) Help with menu options
      * q) Quit

    Option:  
    Option:  1


  *** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster
        2) Create just the first node of a new cluster on this machine
        3) Add this machine as a node in an existing cluster

        ?) Help with menu options
        q) Return to the Main Menu

    Option:  1


  *** Create a New Cluster ***


    This option creates and configures a new cluster.

    You must use the Java Enterprise System (JES) installer to install 
    the Sun Cluster framework software on each machine in the new cluster 
    before you select this option.

    If the "remote configuration" option is unselected from the JES 
    installer when you install the Sun Cluster framework on any of the 
    new nodes, then you must configure either the remote shell (see 
    rsh(1)) or the secure shell (see ssh(1)) before you select this 
    option. If rsh or ssh is used, you must enable root access to all of 
    the new member nodes from this node.

    Press Control-d at any time to return to the Main Menu.


    Do you want to continue (yes/no) [yes]?  


  >>> Typical or Custom Mode <<<

    This tool supports two modes of operation, Typical mode and Custom. 
    For most clusters, you can use Typical mode. However, you might need 
    to select the Custom mode option if not all of the Typical defaults 
    can be applied to your cluster.

    For more information about the differences between Typical and Custom 
    modes, select the Help option from the menu.

    Please select from one of the following options:

        1) Typical
        2) Custom

        ?) Help
        q) Return to the Main Menu

    Option [1]:  1


  >>> Cluster Name <<<

    Each cluster has a name assigned to it. The name can be made up of 
    any characters other than whitespace. Each cluster name should be 
    unique within the namespace of your enterprise.

    What is the name of the cluster you want to establish [frontend]?  


  >>> Cluster Nodes <<<

    This Sun Cluster release supports a total of up to 16 nodes.

    Please list the names of the other nodes planned for the initial 
    cluster configuration. List one node name per line. When finished, 
    type Control-D:

    Node name:  node1
    Node name:  node2
    Node name (Control-D to finish):  ^D


    This is the complete list of nodes:

        node1
        node2

    Is it correct (yes/no) [yes]?  


    Attempting to contact "node2" ... done

    Searching for a remote configuration method ... done

    The secure shell (see ssh(1)) will be used for remote execution.

    
Press Enter to continue:  


  >>> Cluster Transport Adapters and Cables <<<

    You must identify the cluster transport adapters which attach this 
    node to the private cluster interconnect.

    Select the first cluster transport adapter for "node1":

        1) bge1
        2) bge2
        3) bge3
        4) Other

    Option:  2

    Will this be a dedicated cluster transport adapter (yes/no) [yes]?  

    Searching for any unexpected network traffic on "bge2" ... done
    Verification completed. No traffic was detected over a 10 second 
    sample period.

    Select the second cluster transport adapter for "node1":

        1) bge1
        2) bge2
        3) bge3
        4) Other

    Option:  3

    Will this be a dedicated cluster transport adapter (yes/no) [yes]?  

    Searching for any unexpected network traffic on "bge3" ... done
    Verification completed. No traffic was detected over a 10 second 
    sample period.



  >>> Quorum Configuration <<<

    Every two-node cluster requires at least one quorum device. By 
    default, scinstall will select and configure a shared SCSI quorum 
    disk device for you.

    This screen allows you to disable the automatic selection and 
    configuration of a quorum device.

    The only time that you must disable this feature is when ANY of the 
    shared storage in your cluster is not qualified for use as a Sun 
    Cluster quorum device. If your storage was purchased with your 
    cluster, it is qualified. Otherwise, check with your storage vendor 
    to determine whether your storage device is supported as Sun Cluster 
    quorum device.

    If you disable automatic quorum device selection now, or if you 
    intend to use a quorum device that is not a shared SCSI disk, you 
    must instead use scsetup(1M) to manually configure quorum once both 
    nodes have joined the cluster for the first time.

    Do you want to disable automatic quorum device selection (yes/no) [no]?  yes



    Is it okay to create the new cluster (yes/no) [yes]?  

    During the cluster creation process, sccheck is run on each of the 
    new cluster nodes. If sccheck detects problems, you can either 
    interrupt the process or check the log files after the cluster has 
    been established.

    Interrupt cluster creation for sccheck errors (yes/no) [no]?  yes


  Cluster Creation

    Log file - /var/cluster/logs/install/scinstall.log.3361

    Testing for "/globaldevices" on "node1" ... done
    Testing for "/globaldevices" on "node2" ... done

    Checking installation status ... done

    The Sun Cluster software is installed on "node1".
    The Sun Cluster software is installed on "node2".

    Starting discovery of the cluster transport configuration.

    The following connections were discovered:

        node1:bge2  switch1  node2:bge2
        node1:bge3  switch2  node2:bge3

    Completed discovery of the cluster transport configuration.

    Started sccheck on "node1".
    Started sccheck on "node2".

    sccheck completed with no errors or warnings for "node1".
    sccheck completed with no errors or warnings for "node2".


    Configuring "node2" ... done
    Rebooting "node2" ... 

I don't know what application you are installing, but you're not even halfway there. So don't get frustrated too early; try to get interested in it and treat it as a valuable challenge. It really is. I'll help you if I can, and others here will do the same. Please share some feedback with us; we do this in the hope of our own development too. Looking forward to hearing from you!

One more thing came to my mind overnight: it is important to note that SC uses regular IP for inter-node communication. The addresses and subnet are pre-defined (they can be changed): 172.16.0.0/21.
As you can see, the prefix is rather short, so a large subnet is claimed. The interconnect addresses behave like any other IP addresses on the system: they pop up in the system's routing table and may "hide" other routes. Please refer to:
Private Network (Sun Cluster Software Installation Guide for Solaris OS) - Sun Microsystems
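To see whether that range collides with anything on your network, scan the routing table for 172.16 routes. A sketch that parses a saved sample of `netstat -rn` output; the addresses below are placeholders, and on a real node you would first capture the live table with `netstat -rn > /tmp/routes.out`:

```shell
# Illustrative sample routing table; clprivnet0 is the Sun Cluster
# private-interconnect interface. All addresses here are placeholders.
cat > /tmp/routes.out <<'EOF'
default         192.168.1.1      UG   1     9
192.168.1.0     192.168.1.10     U    1     4  bge0
172.16.0.0      172.16.0.129     U    1     2  clprivnet0
EOF

# Print any route whose destination falls inside 172.16.x.x
awk '$1 ~ /^172\.16\./ {print "in cluster private range:", $1, "via", $6}' /tmp/routes.out
```

Any hit that is not the clprivnet0 interconnect route itself is a candidate for the route-shadowing problem described above.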

Hi buddy,

Sorry for replying so late. I have gone through your whole checklist and am trying to figure out what is blocking me here. I also ran into a resource problem in my test lab after our last exchange (one of the nodes went to another team).

Anyways, I've got both servers now and will try to configure from scratch. :slight_smile:

Thanks for all your guidance. I will present my results soon.

Cheers