How can I test link aggregation?

Hi,
I have a Solaris 10 server with link aggregation configured as follows:

# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:ee:a8:a (auto)
           device       address                 speed           duplex  link    state
           bnx1         3c:d9:2b:ee:a8:a          1000  Mbps    full    up      attached
           igb2         f4:ce:46:a7:eb:92         1000  Mbps    full    up      attached
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:ee:a8:8 (auto)
           device       address                 speed           duplex  link    state
           bnx0         3c:d9:2b:ee:a8:8          1000  Mbps    full    up      attached
           igb3         f4:ce:46:a7:eb:93         1000  Mbps    full    up      attached
# dladm show-link
bnx0            type: non-vlan  mtu: 1500       device: bnx0
bnx1            type: non-vlan  mtu: 1500       device: bnx1
bnx2            type: non-vlan  mtu: 1500       device: bnx2
bnx3            type: non-vlan  mtu: 1500       device: bnx3
igb0            type: non-vlan  mtu: 1500       device: igb0
igb1            type: non-vlan  mtu: 1500       device: igb1
igb2            type: non-vlan  mtu: 1500       device: igb2
igb3            type: non-vlan  mtu: 1500       device: igb3
aggr1           type: non-vlan  mtu: 1500       aggregation: key 1
aggr2           type: non-vlan  mtu: 1500       aggregation: key 2
aggr150002      type: vlan 150  mtu: 1500       aggregation: key 2
aggr50001       type: vlan 50   mtu: 1500       aggregation: key 1
aggr55001       type: vlan 55   mtu: 1500       aggregation: key 1
aggr60001       type: vlan 60   mtu: 1500       aggregation: key 1
aggr62001       type: vlan 62   mtu: 1500       aggregation: key 1
aggr64001       type: vlan 64   mtu: 1500       aggregation: key 1
aggr66001       type: vlan 66   mtu: 1500       aggregation: key 1
aggr81001       type: vlan 81   mtu: 1500       aggregation: key 1

# dladm show-dev
bnx0            link: up        speed: 1000  Mbps       duplex: full
bnx1            link: up        speed: 1000  Mbps       duplex: full
bnx2            link: unknown   speed: 0     Mbps       duplex: unknown
bnx3            link: unknown   speed: 0     Mbps       duplex: unknown
igb0            link: unknown   speed: 0     Mbps       duplex: half
igb1            link: unknown   speed: 0     Mbps       duplex: half
igb2            link: up        speed: 1000  Mbps       duplex: full
igb3            link: up        speed: 1000  Mbps       duplex: full
#

A switch replacement is planned, so the links will go down one at a time, one side after the other. Before that work, is there any way to check or test whether the server will keep working when one side goes down, similar to what if_mpadm -d does for IPMP?

Thanks

The only way I know of to test it properly is to either pull a cable physically (or logically, if it's a virtual server) or to take the network interface card down. Obviously you would take down only the physical card that supports one path of the aggregated link. You should be able to get statistics for the aggregated link that show which paths are in use. The path should die and then recover when you turn it back on.

Of course, this introduces risk if it doesn't work, so always make sure you have a way to re-enable it all quickly.

Kind regards,
Robin

Thanks Robin,
That's what I thought. I will plan it out.

dladm show-aggr

Check that "speed" and "duplex" are correct and that "link" shows "up".

If you use LACP (and I think you must; every other setup I have seen was faulty), then check with

dladm show-aggr -L

The output of a working LACP looks like this:

    device    activity timeout aggregatable sync  coll dist defaulted expired
    xyz0      passive  long    yes          yes   yes  yes  no        no     
    xyz1      passive  long    yes          yes   yes  yes  no        no     

In Solaris 10 the timer defaults to "short"; you should change it to "long" unless told otherwise by the vendor of the connected LAN switches.
"passive" or "active" does not matter (but the connected LAN switch must be "active").
"policy" does not matter.
"sync", "coll" and "dist" must all be "yes" (otherwise the aggregation is not cooperating with the connected LAN switch).
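
The sync/coll/dist rule can also be checked by script. Here is a minimal Python sketch, assuming the column layout of the "dladm show-aggr -L" output posted in this thread; the function name lacp_problems and the embedded sample (one cable pulled) are only for illustration and are not part of any Solaris tool.

```python
# Sample taken from "dladm show-aggr -L" output in this thread (bnx1 cable pulled).
SAMPLE = """\
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:5e (auto)
                LACP mode: active       LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    bnx1      active   short   yes          no    no   no   yes       no
    igb2      active   short   yes          yes   yes  yes  no        no
"""


def lacp_problems(text):
    """Return {device: [flag names that are not 'yes']} for ports that fail the check."""
    problems = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            # Column header row: remember the column names.
            header = fields
            continue
        # Skip the "key:" and "LACP mode:" lines, prompts, and anything
        # that does not have one value per header column.
        if header is None or fields[0] in ("key:", "LACP"):
            continue
        if len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        bad = [name for name in ("sync", "coll", "dist") if row.get(name) != "yes"]
        if bad:
            problems[row["device"]] = bad
    return problems


print(lacp_problems(SAMPLE))  # {'bnx1': ['sync', 'coll', 'dist']}
```

An empty result means every port reports "yes" in all three columns. To use it on a live system you would save the command output to a file and pass its contents to the function.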

That's all theory.
In practice: pull a cable and check connectivity.

In my output, the timeout is shown as "short".
What will the effect be if I do not change it to "long"?

# dladm show-aggr -L
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:ee:a8:a (auto)
                LACP mode: active       LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    bnx1      active   short   yes          yes   yes  yes  no        no
    igb2      active   short   yes          yes   yes  yes  no        no
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:ee:a8:8 (auto)
                LACP mode: active       LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    bnx0      active   short   yes          yes   yes  yes  no        no
    igb3      active   short   yes          yes   yes  yes  no        no
#

Ask your LAN switch vendor!
Most vendors support "short" if both NICs are connected to one LAN switch, but not if connected to two different LAN switches.
In the worst case a fail-over does not work. Do the "pull the cable" test at least!
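
For background on what "short" and "long" mean, here is a small Python sketch of the standard LACP timing values. It assumes that Solaris follows the IEEE 802.3ad (now 802.1AX) defaults, which I have not verified against the Solaris implementation; the names LACP_TIMERS and detection_window are made up for this example.

```python
# Standard LACP timing values from IEEE 802.3ad / 802.1AX, in seconds.
# Assumption: the Solaris "short" and "long" timer settings map to these.
LACP_TIMERS = {
    "short": {"pdu_interval": 1, "timeout": 3},
    "long": {"pdu_interval": 30, "timeout": 90},
}


def detection_window(timer):
    """Seconds without LACPDUs before the partner information expires."""
    return LACP_TIMERS[timer]["timeout"]


for name in ("short", "long"):
    values = LACP_TIMERS[name]
    print(name, "LACPDU every", values["pdu_interval"], "s, expires after",
          detection_window(name), "s")
```

In both cases the timeout is three times the LACPDU interval. This only describes the protocol timers; whether a given switch setup handles "short" correctly is still a question for the vendor.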

Thanks.
I was able to find a server on which I can do the "pull the cable" test.
The output below shows that the bnx1 cable was pulled. Since the aggregation is configured, igb2 took over the traffic and all links stayed up. But the first line shows "address: 3c:d9:2b:f9:20:5e (auto)", and that MAC address belongs to bnx1.
Since igb2 now carries all the traffic and its MAC is f4:ce:46:a7:df:ba, shouldn't that address be shown as the (auto) address? Sorry, I am confused about this.

# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:5e (auto)
           device       address                 speed           duplex  link    state
           bnx1         3c:d9:2b:f9:20:5e         0     Mbps    half    down    standby
           igb2         f4:ce:46:a7:df:ba         1000  Mbps    full    up      attached
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:f9:20:5c (auto)
           device       address                 speed           duplex  link    state
           bnx0         3c:d9:2b:f9:20:5c         1000  Mbps    full    up      attached
           igb3         f4:ce:46:a7:df:bb         1000  Mbps    full    up      attached
#
# dladm show-aggr -L
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:5e (auto)
                LACP mode: active       LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    bnx1      active   short   yes          no    no   no   yes       no
    igb2      active   short   yes          yes   yes  yes  no        no
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:f9:20:5c (auto)
                LACP mode: active       LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    bnx0      active   short   yes          yes   yes  yes  no        no
    igb3      active   short   yes          yes   yes  yes  no        no
#

I don't know how the fail-over case looks in detail.
The main point is: connectivity is still there, and, once you put the cable back, the normal redundant state comes back.
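
One way to confirm that the normal redundant state has come back is to scan the plain "dladm show-aggr" output for any port that is not "up" and "attached". This is a rough Python sketch, assuming the seven-field row layout seen in the output in this thread; the function name degraded_ports is hypothetical and the sample is the key 1 block from the cable-pull test.

```python
# Sample taken from "dladm show-aggr" output in this thread (bnx1 cable pulled).
SAMPLE = """\
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:5e (auto)
           device       address                 speed           duplex  link    state
           bnx1         3c:d9:2b:f9:20:5e         0     Mbps    half    down    standby
           igb2         f4:ce:46:a7:df:ba         1000  Mbps    full    up      attached
"""


def degraded_ports(text):
    """Return (key, device, link, state) for every port that is not up and attached."""
    degraded = []
    key = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "key:":
            # Start of a new aggregation block; the key follows "key:".
            key = fields[1]
            continue
        # Data rows have 7 fields because the speed splits into value and "Mbps".
        # The header row has 6 fields, so it is skipped here as well.
        if key is None or fields[0] == "device" or len(fields) != 7:
            continue
        device, _address, _speed, _unit, _duplex, link, state = fields
        if link != "up" or state != "attached":
            degraded.append((key, device, link, state))
    return degraded


print(degraded_ports(SAMPLE))  # [('1', 'bnx1', 'down', 'standby')]
```

Run against output captured after the cable is plugged back in, an empty list would indicate that every port is attached again.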

Got it. Thanks.