Solaris 8: Multiple primary interfaces connected to the same network

aeg · June 6, 2011, 3:53pm

Hello,

I have a machine with Solaris 8, and it has multiple interfaces that are connected to the same network which means they all have metric 0 (1 hop) to the default gateway.

assume:

e1000g0: 10.1.1.70
e1000g2: 10.1.1.72
e1000g4: 10.1.1.74
e1000g5: 10.1.1.76
gateway: 10.1.1.65 (Cisco Router)

However, despite each interface having a direct connection, they all seem to use e1000g0 to reach the 10.1.1.0 network, the default gateway, and everything beyond it.

When I send a ping to, say, 10.1.1.74 (the IP of e1000g4) and capture packets on e1000g0, I see the "echo reply" messages going out of e1000g0 rather than e1000g4, even though e1000g4 is the one receiving the "echo request". This should not happen; the interfaces should be completely independent, as they should all be advertising 1 hop to that network.

This gets even more confusing when I go into the Cisco router and run the command: "show mac address-table" where only the MAC address of e1000g0 is shown for the switch port it's connected to, but not for the other interfaces which are connected to the switch. Yes, all ports are active (no shut) and are pingable.

Also, the odd thing is that ALL of these individual MACs show up in the router ARP table when the machine comes up. However, some time after I send a ping to one of them (an expiry period of some kind), the MACs disappear from the router ARP table and only the MAC for e1000g0 remains. The ARP table of the Solaris machine, however, shows all the relevant MACs of each router port that it's physically connected to. (The router is actually a Cisco switch with the advanced IP services image and L3 routing turned on.)

The routing table inside the machine also looks good and clearly shows each interface itself being the gateway to the 10.1.1.0 network.
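A minimal sketch of how equal-looking routes can still funnel everything through one interface, assuming the stack picks the outgoing interface from the destination address alone and takes the first matching entry. That selection rule and the /24 prefix length are assumptions for illustration, not confirmed Solaris 8 behaviour; the interface names and addresses are the ones listed above.

```python
import ipaddress

# Interfaces from the post; the /24 prefix length is an assumption.
SUBNET = ipaddress.ip_network("10.1.1.0/24")
ROUTES = [
    (SUBNET, "e1000g0"),
    (SUBNET, "e1000g2"),
    (SUBNET, "e1000g4"),
    (SUBNET, "e1000g5"),
]
LOCAL_ADDRS = {
    "10.1.1.70": "e1000g0",
    "10.1.1.72": "e1000g2",
    "10.1.1.74": "e1000g4",
    "10.1.1.76": "e1000g5",
}


def pick_by_destination(dst):
    """Destination-only lookup: the first matching route wins, source is ignored."""
    addr = ipaddress.ip_address(dst)
    for net, ifname in ROUTES:
        if addr in net:
            return ifname
    return None


def pick_by_source(src, dst):
    """Hypothetical source-aware lookup: prefer the interface owning the source IP."""
    ifname = LOCAL_ADDRS.get(src)
    if ifname is not None:
        return ifname
    return pick_by_destination(dst)


# Echo reply for a request that arrived on 10.1.1.74, sent back toward the gateway.
print(pick_by_destination("10.1.1.65"))          # interface chosen by destination only
print(pick_by_source("10.1.1.74", "10.1.1.65"))  # interface chosen by source address
```

With destination-only selection, every reply leaves through the first listed interface, which would match the capture on e1000g0; the source-aware variant is the behaviour I was expecting.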

I need to somehow assign all these interfaces equal priority and make them understand that they're physically connected to the 10.1.1.0 network and there's no need to go through e1000g0 to get to it.

This is causing a lot of problems, as eventually all traffic will end up going through the e1000g0 interface and that will become a bottleneck.

Please help
Thanks in advance

DGPickett · June 6, 2011, 3:58pm

Configure IPMP Load Balancing & Resilience in Sun Solaris | Sun Solaris System Admin

aeg · June 6, 2011, 4:05pm

Thanks DGPickett,

Yes, I myself was thinking of doing that but what's confusing is that this used to work fine before and there were no issues. Each interface was aware of its direct connection to the network and "minded their own business". All of a sudden, I've started to notice this bizarre behaviour. My only problem with configuring IPMP is what if the new virtual interface isn't recognized by this proprietary software that is running on it. This software is extremely inflexible and intolerant.

DGPickett · June 6, 2011, 4:34pm

There are layers and layers: Solaris tip of the week: IPNAT load balancer (greetings from network.com)

aeg · June 6, 2011, 5:28pm

Doesn't seem to help. This proprietary software comes with its own twist on ipf/ipnat, which overwrites any changes I make to the config file whenever I start the software. This has to be done at the routing level or in some Solaris config file, but I am still confused as to why it is needed at all, when this used to work and nothing at that level has changed. Each interface should just send its own traffic out directly, as that's the most direct path to the gateway.

fpmurphy · June 6, 2011, 10:07pm

So, then, what did you or somebody else change on the system?

aeg · June 6, 2011, 11:06pm

That's the problem. You'd think that, but we didn't do anything special the last time we configured it, because any networking theory will tell you that if you have NICs with an equal (or directly connected) metric to a network, they all have equal priority by default and will each try to access the network themselves. The only thing we did was change the IPs as we moved data centers and ISPs, and now all of a sudden everything is going through e1000g0 and that's the only MAC the router sees, even though all the interfaces are physically connected to the router. I think we need to run some protocol on this machine to advertise their metrics or something. Also, like I said, when the machine reboots OR we restart networking (with S72inetsvc), the router ARP table sees all the MACs on their respective switchports, but then they disappear after a little while.

DGPickett · June 7, 2011, 11:12am

Disappearing when not used is an ARP cache feature, in case you are changing NICs on a host. I think 5 minutes is pretty usual, especially for Windows. This makes things tricky/slow if you design a failover that puts a new MAC on an IP.
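A rough sketch of that aging behaviour, assuming a simple cache where each entry expires a fixed time after it was last refreshed. The 300-second timeout is the 5-minute figure above, and the `ArpCache` class and MAC addresses are made up for illustration; real routers use their own timers.

```python
class ArpCache:
    """Toy ARP cache: an entry is dropped once it has not been refreshed for ttl seconds."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.entries = {}  # ip -> (mac, time the entry was last refreshed)

    def learn(self, ip, mac, now):
        # Called whenever traffic or an ARP reply from this IP/MAC pair is seen.
        self.entries[ip] = (mac, now)

    def lookup(self, ip, now):
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, last_seen = entry
        if now - last_seen > self.ttl:
            del self.entries[ip]  # aged out
            return None
        return mac


cache = ArpCache(ttl_seconds=300)
# At boot (t=0) every interface announces itself; MACs are hypothetical.
cache.learn("10.1.1.70", "00:00:5e:00:53:70", now=0)
cache.learn("10.1.1.74", "00:00:5e:00:53:74", now=0)
# Later only e1000g0 keeps sending, so only its entry is refreshed.
cache.learn("10.1.1.70", "00:00:5e:00:53:70", now=200)

print(cache.lookup("10.1.1.70", now=400))  # still cached
print(cache.lookup("10.1.1.74", now=400))  # aged out
```

If only one NIC ever transmits, only its entry keeps getting refreshed, which would look like the other MACs "disappearing".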

If you pull the cable on the first NIC, does it move over gracefully?

If you provide saturating load, does it ever use the other NICs?

aeg · June 7, 2011, 2:09pm

Ok, that makes sense, so maybe that is only happening since the traffic is only going out of e1000g0. Thanks for that. So now I need to figure out how to make the traffic from each NIC go out of it directly.

I am not physically where the machine is, so I simulated this by shutting down the switch port to which e1000g0 is connected, and the machine loses its connection to the internet. I am then unable to receive any ping response from any of the other interfaces, although in a packet capture on the other interfaces I can see the "ping request" come in. The same applies when I try to ping out of any specific interface: it fails, and I suspect the pings try to get out of e1000g0, which is shut down.

As for the saturating load, I don't know how to simulate that.

DGPickett · June 7, 2011, 3:19pm

Network link saturation is usually 2-3 big file transfers, unless the disk is slower (first cat the file to /dev/null to get it in cache?).

It seems like the load balancing originally set up was linked to the old IP addresses, and you need to redo whatever that was. This looks applicable: http://download.oracle.com/docs/cd/E19253-01/816-4554/ipconfig-22a/index.html

What does routeadm show?

aeg · June 7, 2011, 3:31pm

I see...those will have to be really large files(?) to saturate these GigE links?

The weird thing is that this machine has moved 2-3 times already and I have had to change the IPs on it every single time. The last time I did it, everything went smoothly and it was a non-issue. This time I did the same thing but am faced with this issue, and I don't remember doing anything special. I just assumed that when the interfaces are connected to the network directly, they pass traffic directly.

You're right, this DOES look relevant. Unfortunately this is a restricted version of Solaris delivered by the vendor of this appliance and does not have svcadm or routeadm as commands. Maybe I'll have to see if I can find them on the net somewhere but finding anything for Solaris-8 is becoming harder and harder.

So now I have to figure out how to do what this article says without the svcadm or routeadm commands. I know there's a file that stores the IP forwarding info etc., but I can't remember which one. Also, ndd might help?

DGPickett · June 7, 2011, 3:52pm

Maybe you can find hints in open solaris source code.

Big enough files so the first is still moving when the others start moving. Content is not important, but no compression on the sending tool.
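As a back-of-the-envelope check on "big enough", here is a small calculation, assuming the link itself is the bottleneck and ignoring protocol overhead. The helper name and the 60-second window are only examples.

```python
def bytes_to_fill_link(link_bits_per_second, seconds):
    """Bytes needed to keep a link of the given speed busy for the given time."""
    return link_bits_per_second // 8 * seconds


GIGE = 1_000_000_000  # 1 Gbit/s, nominal line rate

# File size needed for one transfer to last about a minute on GigE.
size = bytes_to_fill_link(GIGE, 60)
print(size / 1e9, "GB")
```

So a transfer that should still be running a minute later, while the others are started, needs a file on the order of several gigabytes per link.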