Guest LDOMs on the same subnet can't ping each other

Hi all,

New to this forum.

I have just been reading through a historical thread about some issues with IPMP.

Some tips from "Peasant" were very useful. Please see below:

"Just couple of more hints regarding VM.
For VDS, use one VDS - one guest LDOM, don't put everything in primary-vds.
Disable extended-mappin-space everywhere since i noticed sometimes live migration fails with this on.
You might want to disable INTER-VNET-LINK (to off)
I had a situation when network between two guests, that reside same physical machine on the same subnet, just stops working.
I have ran some internal test with those two options on/off (which should improve network throughput), and didn't notice any
performance gains, only experienced issue above.  Depends on which patchset you are, perhaps those issues are now fixed.
Regards
Peasant"

The issue I am having is that two guest LDOMs which have IP addresses in the same subnet/range are unable to ping each other. They can ping all other LDOM guests, the control/service domains etc., but not each other.

Actually, before I set INTER-VNET-LINK to off they would just crash.

We have two IO domains with aggregated ports using IPMP

Please let me know what info you require and I hope you can help. Some details below:

First guest domain = sneezy (on host martell)

NAME
primary
MAC
    00:10:e0:46:32:e6
VSW
    NAME             MAC               NET-DEV   ID   DEVICE     LINKPROP   DEFAULT-VLAN-ID PVID VID                  MTU   MODE   INTER-VNET-LINK
    primary-vsw50    00:14:4f:f9:c0:79 aggr0     0    switch@0              1               1    50                   1500         off
------------------------------------------------------------------------------
NAME
secondary
MAC
    00:14:4f:fb:51:13
VSW
    NAME             MAC               NET-DEV   ID   DEVICE     LINKPROP   DEFAULT-VLAN-ID PVID VID                  MTU   MODE   INTER-VNET-LINK
    secondary-vsw50  00:14:4f:fb:21:17 aggr0     0    switch@0              1               1    50                   1500         off
------------------------------------------------------------------------------
NAME
sneezy
MAC
    00:14:4f:f8:ce:95
NETWORK
    NAME             SERVICE                     ID   DEVICE     MAC               MODE   PVID VID                  MTU   LINKPROP
    vnet0            primary-vsw50@primary       0    network@0  00:14:4f:f9:1a:0b        50                        1500  phys-state
    vnet1            secondary-vsw50@secondary   1    network@1  00:14:4f:fa:92:25        50                        1500  phys-state
admin@martell:/$

Second guest domain, on a different host (lannister) = sleepy

admin@lannister:~$ ldm list -o network
NAME
primary
MAC
    00:10:e0:46:30:a2
VSW
    NAME             MAC               NET-DEV   ID   DEVICE     LINKPROP   DEFAULT-VLAN-ID PVID VID                  MTU   MODE   INTER-VNET-LINK
    primary-vsw50    00:14:4f:fa:c3:20 aggr0     0    switch@0              1               1    50                   1500         off
------------------------------------------------------------------------------
NAME
secondary
MAC
    00:14:4f:fa:8f:07
VSW
    NAME             MAC               NET-DEV   ID   DEVICE     LINKPROP   DEFAULT-VLAN-ID PVID VID                  MTU   MODE   INTER-VNET-LINK
    secondary-vsw50  00:14:4f:f9:a8:69 aggr0     0    switch@0              1               1    50                   1500         off
------------------------------------------------------------------------------
NAME
sleepy
MAC
    00:14:4f:f8:ce:95
NETWORK
    NAME             SERVICE                     ID   DEVICE     MAC               MODE   PVID VID                  MTU   LINKPROP
    vnet0            primary-vsw50@primary       0    network@0  00:14:4f:fa:92:25        50                        1500  phys-state
    vnet1            secondary-vsw50@secondary   1    network@1  00:14:4f:f9:1a:0b        50                        1500  phys-state
admin@lannister:~$ ldm -V
Logical Domains Manager (v 3.1.0.1)
        Hypervisor control protocol v 1.11
        Using Hypervisor MD v 1.4
System PROM:
        Hostconfig      v. 1.4.0        @(#)Hostconfig 1.4.0 2014/04/04 17:51
        Hypervisor      v. 1.13.0.a     @(#)Hypervisor 1.13.0.a 2014/04/08 14:05
        OpenBoot        v. 4.36.0       @(#)OpenBoot 4.36.0 2014/04/04 15:50

Other output that may help:

admin@lannister:~$ dladm show-aggr
LINK              MODE  POLICY   ADDRPOLICY           LACPACTIVITY LACPTIMER
aggr0             trunk L2       auto                 active       short
admin@lannister:~$ dladm show-vlan
LINK                VID      OVER                FLAGS
aggr5000            50       aggr0               -----
admin@lannister:~$ dladm
LINK                CLASS     MTU    STATE    OVER
net0                phys      1500   up       --
net1                phys      1500   up       --
net2                phys      1500   up       --
aggr5000            vlan      1500   up       aggr0
aggr0               aggr      1500   up       net0 net1
admin@lannister:~$ dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         10000  full      ixgbe0
net1              Ethernet             up         10000  full      ixgbe1
net2              Ethernet             up         10000  full      vsw0

Many thanks

---------- Post updated at 07:01 PM ---------- Previous update was at 01:25 PM ----------

This has now been solved.

It was due to duplicate LDOM MAC addresses (the vnet MAC addresses). I needed to change those manually using ldm set-vnet mac-addr=
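
Looking back at the listings above, the clash is visible: sneezy's vnet0 and sleepy's vnet1 both have 00:14:4f:f9:1a:0b, sneezy's vnet1 and sleepy's vnet0 both have 00:14:4f:fa:92:25, and the two guests even share the domain MAC 00:14:4f:f8:ce:95. A rough sketch of the fix (the new MAC is a placeholder, and you will likely need the guest stopped and unbound, or a delayed reconfiguration, for the change to take):

ldm stop sneezy
ldm unbind sneezy
ldm set-vnet mac-addr=00:14:4f:fc:00:01 vnet0 sneezy
ldm bind sneezy
ldm start sneezy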

That is strange; you should have no need to set up MAC addresses by hand.
Have you set up some of your MACs by hand and others automatically?

A few more hints to avoid problems in the future.
If you have multiple physical hosts (hypervisors) and one or more ldoms can be migrated / imported to another, be sure to export the configuration of everything daily into a file:

ldm list-services -l ldom # this provides additional information, especially the HOSTID.
ldm list-constraints -x ldom # for importing the ldom on another physical machine.
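
A minimal sketch of a nightly export, assuming a backup directory of /export/ldm-backups (the path and the parsing of ldm's parseable output are illustrative):

for dom in $(ldm list -p | awk -F'|' '/^DOMAIN/ { sub("name=", "", $2); print $2 }'); do
    ldm list-constraints -x "$dom" > /export/ldm-backups/"$dom"-"$(date +%Y%m%d)".xml
done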

The hostid is important here because it can change if you don't migrate the machine to the new host (ldm migrate) but instead import it from the configuration file; that can cause zpools not to import during boot (rpool should import, but every other zpool will require manual work).

Hope that helps

Regards
Peasant.

Hi Peasant,

All MAC addresses have been set up automatically until now. I know the domain manager is supposed to do some form of subnet scan for MAC addresses already in use, but clearly it is not doing that. We have four T5-2 servers.

Many thanks for the extra tips!

Be sure to check that all four hypervisors are in the same subnet for their own networking (the primary domain network, not the ldoms').

Per the documentation, each domain manager sends a multicast with the MAC it wants to give to a new ldom; if that MAC is busy, another domain manager will respond and it will iterate to the next candidate.

A collision can occur if one or more logical domain managers (physical hosts) are down while you are creating an ldom on a working domain manager:
in theory the hosts that are down are not aware of the new ldom being created on another host, but I have never experienced this in practice.

All four T5-2 servers are on the same subnet. One thing to mention is that all four have two IO/service domains; again, the alternate IO domain is also on the same subnet.

Can you please elaborate a little on what you mentioned about taking XML backups of each LDOM every night?

Is there any way to check whether the multicast is happening or not? It clearly is not, I think.

Many thanks

Regarding backups, you just need to run the commands I gave you with standard Unix redirection (>) to a file.

Those are plain XML files with your configuration; put them somewhere you can access them at all times (like a small FC zpool, or a tape backup).

Think about it: if you put gzip compression, or even better deduplication, on a 1 GB ZFS fileset, you can probably save configurations there daily for the next 10 years without deleting anything :b:

You should be able to tcpdump for the multicast traffic on your network; there are examples online.
Unfortunately, since I'm on leave, I cannot offer practical assistance with the capture for the next seven days.

There is also some code provided by Oracle to do the same; this is probably how the ldmd instances communicate with each other:
Discovering Systems Running the Logical Domains Manager - Oracle VM Server for SPARC 3.0 Administration Guide
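
As a starting point until I can look at it properly, a rough sketch of watching for that multicast from the primary domain (using snoop rather than tcpdump, on the aggr0 interface from your dladm output; I have not verified the exact group address or port ldmd uses):

snoop -r -d aggr0 multicast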

Hope that helps
Regards
Peasant.


Peasant, many thanks for the comment. It's a huge help.

I am somewhat struggling to work out how to fix this issue I have with the conflicting MAC addresses. I have read the link you posted but am struggling to see how to apply it.

I created a new LDOM today and yet again it had a conflicting MAC address with an LDOM on a different host, but in the same subnet/IP range.

This should not happen if everything is configured properly.

I checked your initial output more carefully (sorry for that :) )

What looks wrong to me is that you are using L2 aggregation (the aggr0 interface), you have created two virtual switches from that interface, and then you used those interfaces to create an IPMP group inside the ldom.

I don't think that is a supported configuration; it looks kind of silly :)

Since you have aggregated two interfaces (net0 and net1), which must be connected to the same physical switch, there is no need to use IPMP inside the LDOM (guest domain; I don't think that is a supported configuration at all, and it is possibly why you are having MAC collisions) or to create multiple virtual switches over one interface (aggr0).

This schematic should be more illuminating:

Primary domain (hypervisor - bare metal)
---> net0 <> net1 [aggr0 L2] ---> primary-vsw50 (on primary, created from aggr0 with add-vsw) ---> vnet0 for guest ldom1, ldom2 (add-vnet)

Only one vnet is enough, since if net0 fails all you will lose is the bandwidth of one interface.

There is no need to tag the interfaces at the hypervisor OS level (aggr5000, dladm create-vlan), since for LDOMs this is done at the vsw/vnet level (PVID, VID).
Tagging at the OS level should work, but it is a legacy way to implement VLAN tagging with LDOMs.
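
As a rough sketch (not tested here), keeping your existing aggr0 and using just one vsw and one vnet per guest, with the names and VLAN 50 from your own listings, would look like this:

ldm add-vsw net-dev=aggr0 vid=50 primary-vsw50 primary
ldm add-vnet pvid=50 linkprop=phys-state vnet0 primary-vsw50 sneezy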

As for the bare-metal domains (primary, secondary), let me offer a short explanation of domains as I understand them...
For instance, say you have a SPARC T4-2 with two sockets, two 4-port network cards and two 2-port FC cards.

You can create two hardware domains, primary and secondary, in which the actual I/O hardware is split between those two domains (each gets one PCI network card, one FC card, one CPU socket and its memory).

Now you have one T4-2 SPARC which is actually two machines separated at the hardware level. All LDOMs created on the primary domain will use its resources (CPU, PCI - half of them) and ldoms on the secondary will use its resources (the other half).

Basically, if one socket fails due to a hardware failure, only the primary domain and the guest ldoms on it will fail, while the secondary and its guest ldoms will continue to run.
Such setups complicate things considerably and are done on machines which have enough resources to be redundant (like 4 cards or 4 sockets, 2 physical cards per domain for redundancy, etc.).

For your setup I guess you need (keep it simple, as per the scheme at the beginning; a sketch of the disk part follows the list):

One primary domain (bare metal).
One vsw created on top of the aggr0 interface in the primary domain.
One vnet interface added to each LDOM from primary-vsw in the primary domain.
One VDS (virtual disk service) in the primary domain per guest ldom (sneezy-vds@primary, otherguestldom-vds@primary, etc.), to which you add that ldom's disks.
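
A minimal sketch of the virtual disk part (the LUN path is a placeholder):

ldm add-vds sneezy-vds primary
ldm add-vdsdev /dev/dsk/c0tXXXXXXXXd0s2 sneezy-disk0@sneezy-vds
ldm add-vdisk disk0 sneezy-disk0@sneezy-vds sneezy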

Hope that clears things up.

Regards
Peasant.

Hi Peasant, again, I can't thank you enough for your input.

So what we actually have is a T5-2 which has two sockets, 2x two-port FC cards and 4x gigabit Ethernet ports.

As you said, the machine is split right down the middle, with each root complex owning exactly half of the hardware, including the local hard drives.

What we have is:
1x primary control domain (control, IO, service). Obviously all LDOMs are managed from the primary.

1x secondary (or what some people call 'alternate') IO/service domain, which can see the bare-metal storage.

I'm sure I'm telling you what you already know, but it helps me to explain it out :) The idea of us having two IO/service domains (primary and secondary) is that we can take one of them down (e.g. for patching) and all guest LDOMs will continue to run, route traffic in/out, see LUNs, etc.

And this is the case. When I init 6 or shut down the primary domain, all guests continue to operate via the secondary (alternate) domain, and vice versa.

So when I create a guest LDOM, I make sure to create two vnets, one pointing to the primary vsw and the other to the secondary vsw. And when creating new LDOMs, I alternate which switch vnet0 points to so that traffic does not always go through the same switch.
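
For sneezy that boils down to something like this (a sketch rather than the exact commands I ran, with the names and VLAN taken from the listing in my first post):

ldm add-vnet linkprop=phys-state pvid=50 vnet0 primary-vsw50 sneezy
ldm add-vnet linkprop=phys-state pvid=50 vnet1 secondary-vsw50 sneezy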

And the same principle applies to disks: I use multipathing groups (mpgroup) to ensure that a guest can see its LUNs from both IO/service domains.
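
Roughly like this (the LUN paths and VDS names are placeholders; the same backend is exported once from each service domain under one mpgroup, and the guest gets a single vdisk that fails over between the paths):

ldm add-vdsdev mpgroup=sneezy-mp /dev/dsk/c0tXXXXXXXXd0s2 sneezy-disk0@sneezy-vds-primary
ldm add-vdsdev mpgroup=sneezy-mp /dev/dsk/c0tXXXXXXXXd0s2 sneezy-disk0-alt@sneezy-vds-secondary
ldm add-vdisk disk0 sneezy-disk0@sneezy-vds-primary sneezy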

I think you are correct about the IPMP guest settings; I am just reading up more on that.

I also don't pretend to completely understand the difference between the trunk policies (L2, L3 etc.). I am also doing some more reading on that.

FYI, we also have some T5-2 servers which not only have 2x two-port FC cards but also 2x two-port Ethernet cards in addition to the 4x onboard Ethernet ports. These servers follow the same principle as the one I used in the original post, but obviously each root complex has 4 Ethernet ports for its trunk.

You do have more than one T5-2 machine?

If you have only one root complex on each (only the primary domain) and you take care not to over-commit resources (CPU / memory), you should be able to live migrate an ldom from one machine to another.
All you have to watch is that the names of all the virtual devices are the same and the backend devices are the same (naturally).
I am not sure whether you can have two root complexes (primary / secondary) and do a live migration of a guest ldom inside the same machine.

That is not really a use case at all, but I could be wrong (I have never built such a setup).

The use case for root complexes is to have complete hardware separation of, for instance, production and test, and it is used on machines with more sockets, more cards, etc.

Take this example: sneezy is a production ldom and sloppy is a test ldom.
We are using only the primary domain and only one root complex, with production and test separated logically.

Both T5-2s have the same names and configuration.

FC (2x two-port FC cards)

One port from each FC card is for production usage (zoned on the switch, production host group on the storage).
One port from each FC card is for test usage (zoned on the switch, test host group on the storage).

This way, if one FC card dies, production and test will continue to operate.

You just prefix the VDS (virtual disk service) name with prod or test when you create it, as appropriate.

So you have sneezy-prodvds and sloppy-testvds in the primary domains of both SPARCs, with disks added to them according to the layout above (test and prod host groups and paths).

Remember you have the freedom here to add either kind of disk (test or production) to any ldom; it is only the naming policy that tells you to which VDS you add each disk.

NETWORK (2x four-port LAN cards, 8 ports total; see dladm show-phys)

Now you make a choice: will you use aggregation, IPMP or DLMP?

The example here is with aggregation (aggr0, aggr1).

You take 2 ports from one card and 2 ports from the other card, all leading to the same LAN switch, and create an aggr0 interface; then you create a production-vsw from that interface (with the production VLAN tags configured via pvid/vid).
You add a vnet to the sneezy ldom from production-vsw (and likewise for any other production ldom).

From the remaining network ports (2 from each card), you create an aggr1 interface, then create a test-vsw from that interface (with the test VLAN tags configured on the vsw via pvid/vid).
You add a vnet to the sloppy ldom from test-vsw (and likewise for any other test ldom).

This way, if one card dies, you will lose 2 production and 2 test paths, but both test and production will continue to operate at lower bandwidth (2x instead of 4x).
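
A sketch of that network layout (the port numbers, VLAN IDs and LACP flag are illustrative, assuming net0-net3 are on one card and net4-net7 on the other):

dladm create-aggr -L active -l net0 -l net1 -l net4 -l net5 aggr0
dladm create-aggr -L active -l net2 -l net3 -l net6 -l net7 aggr1
ldm add-vsw net-dev=aggr0 vid=100 production-vsw primary
ldm add-vsw net-dev=aggr1 vid=200 test-vsw primary
ldm add-vnet pvid=100 vnet0 production-vsw sneezy
ldm add-vnet pvid=200 vnet0 test-vsw sloppy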

The VSWs and VDSs are named the same in the primary domain on both SPARC machines (this is a requirement for migration, live or cold).

Now you have the sneezy production ldom, with production-vsw --> vnet for networking and its disk presented via the proper FC path, added to sneezy-prodvds and to the sneezy ldom.

Likewise, you have the sloppy test ldom, with test-vsw --> vnet for networking and its disk presented via the proper FC path, added to sloppy-testvds and to the sloppy ldom.

PRIMARY DOMAIN network, with tagging, on both SPARCs:

I recommend using a separate VLAN for the primary domain's IP addressing (the control domain can be isolated at the VLAN layer on the network for security reasons).

Since you have tagging on the switch and on the aggr0/aggr1 interfaces, you will have to create a tagged interface for the primary domain:
dladm create-vlan -l aggr1 -v <your vlanid> vlan-link
If you don't provide vlan-link, it will create an aggr1<yourvlanid>-style interface.

This is the interface on which you will configure the primary domain's IP address on both machines. We are using aggr1 here (the test network) for live migration of all ldoms (both production and test), but you are free to choose either (0 or 1) depending on the network topology and bandwidth required.
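
For completeness, a sketch of bringing that address up (the VLAN ID, link name and address below are placeholders):

dladm create-vlan -l aggr1 -v 60 mgmt60
ipadm create-ip mgmt60
ipadm create-addr -T static -a 192.0.2.10/24 mgmt60/v4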

Now you can issue a test live migration from host1 to host2 with:
ldm migrate -n sloppy host2 # the -n switch just checks whether the migration would work; it is a great way to verify that the configuration on both physical machines is the same.

The final result is that you can migrate any production or test guest LDOM to any SPARC T5-2 machine without interrupting service (live migration), or cold migrate it (with the ldom down), while keeping LAN and FC resources separated for production and test and a more or less "keep it simple" configuration ;)

Be sure the firmware levels are the same on all your SPARC T5-2 machines for live migration to work (cold migration does not have this limit).
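
A quick way to compare them is the output you already posted from each host:

ldm -V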

Hope that helps.

Regards
Peasant.