Solaris link aggregation not working as expected

Hi,
This is a Solaris 10 x86 platform.
I am not able to ping the gateway associated with aggr50001, and I have no idea where the issue could be. Please advise.

# netstat -nr

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              172.31.xx.xx          UG        1         12 aggr62001
default              172.31.xx.xy         UG        1         14 aggr64001
default              172.31.xx.yy          UG        1       6182 aggr50001
default              172.31.yy.yy        UG        1         12 aggr66001
default              172.31.yy.yx          UG        1          5 aggr81001
default              172.31.yx.yx         UG        1         24 aggr60001
--
--
--
--
--
# dladm show-link
bnx0            type: non-vlan  mtu: 1500       device: bnx0
bnx1            type: non-vlan  mtu: 1500       device: bnx1
bnx2            type: non-vlan  mtu: 1500       device: bnx2
bnx3            type: non-vlan  mtu: 1500       device: bnx3
igb0            type: non-vlan  mtu: 1500       device: igb0
igb1            type: non-vlan  mtu: 1500       device: igb1
igb2            type: non-vlan  mtu: 1500       device: igb2
igb3            type: non-vlan  mtu: 1500       device: igb3
aggr1           type: non-vlan  mtu: 1500       aggregation: key 1
aggr2           type: non-vlan  mtu: 1500       aggregation: key 2
aggr150002      type: vlan 150  mtu: 1500       aggregation: key 2
aggr50001       type: vlan 50   mtu: 1500       aggregation: key 1
aggr55001       type: vlan 55   mtu: 1500       aggregation: key 1
aggr60001       type: vlan 60   mtu: 1500       aggregation: key 1
aggr62001       type: vlan 62   mtu: 1500       aggregation: key 1
aggr64001       type: vlan 64   mtu: 1500       aggregation: key 1
aggr66001       type: vlan 66   mtu: 1500       aggregation: key 1
aggr81001       type: vlan 81   mtu: 1500       aggregation: key 1
# 
# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:5e (auto)
           device       address                 speed           duplex  link    state
           bnx1         3c:d9:2b:f9:20:5e         1000  Mbps    full    up      attached
           igb2         f4:ce:46:a7:df:ba         0     Mbps    half    down    standby
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:f9:20:5c (auto)
           device       address                 speed           duplex  link    state
           bnx0         3c:d9:2b:f9:20:5c         1000  Mbps    full    up      attached
           igb3         f4:ce:46:a7:df:bb         1000  Mbps    full    up      attached
#
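With several aggregations and VLAN links on one box, it can help to scan `dladm show-aggr` output for ports that are not both link-up and attached. The following is a minimal sketch, assuming the Solaris 10 column layout pasted above (device, address, speed, "Mbps", duplex, link, state); `check_aggr` is a hypothetical helper name, not a Solaris command. It reads from standard input, so on the server you would run something like `dladm show-aggr | check_aggr`.

```shell
# check_aggr: hypothetical helper; reads `dladm show-aggr` output on stdin
# and prints every aggregation port that is not both "up" and "attached".
# Assumed column layout (Solaris 10, as pasted in this thread):
#   device  address  speed  Mbps  duplex  link  state
check_aggr() {
    awk '
        # "key: 1 (0x0001) policy: ..." starts a new aggregation; remember its key
        $1 == "key:" { key = $2; next }
        # port rows have 7 fields, with the literal unit "Mbps" in column 4
        NF == 7 && $4 == "Mbps" {
            if ($6 != "up" || $7 != "attached")
                printf "key %s: %s link=%s state=%s speed=%s duplex=%s\n", key, $1, $6, $7, $3, $5
        }
    '
}

# Usage on the server:
#   dladm show-aggr | check_aggr
```

Against the output above, it should report only igb2 in key 1 (link down, standby), while bnx1, bnx0 and igb3 pass.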
# dladm show-dev
bnx0            link: up        speed: 1000  Mbps       duplex: full
bnx1            link: up        speed: 1000  Mbps       duplex: full
bnx2            link: unknown   speed: 0     Mbps       duplex: unknown
bnx3            link: unknown   speed: 0     Mbps       duplex: unknown
igb0            link: unknown   speed: 0     Mbps       duplex: half
igb1            link: unknown   speed: 0     Mbps       duplex: half
igb2            link: down      speed: 0     Mbps       duplex: half
igb3            link: up        speed: 1000  Mbps       duplex: full
#

Here is the setup of a working server:
# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:12 (auto)
           device       address                 speed           duplex  link    state
           bnx1         3c:d9:2b:f9:20:12         1000  Mbps    full    up      attached
           igb2         f4:ce:46:a7:e6:26         1000  Mbps    full    up      attached
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:f9:20:10 (auto)
           device       address                 speed           duplex  link    state
           bnx0         3c:d9:2b:f9:20:10         1000  Mbps    full    up      attached
           igb3         f4:ce:46:a7:e6:27         1000  Mbps    full    up      attached
# dladm show-dev
bnx0            link: up        speed: 1000  Mbps       duplex: full
bnx1            link: up        speed: 1000  Mbps       duplex: full
bnx2            link: unknown   speed: 0     Mbps       duplex: unknown
bnx3            link: unknown   speed: 0     Mbps       duplex: unknown
igb0            link: unknown   speed: 0     Mbps       duplex: half
igb1            link: unknown   speed: 0     Mbps       duplex: half
igb2            link: up        speed: 1000  Mbps       duplex: full
igb3            link: up        speed: 1000  Mbps       duplex: full
#

I would probably like to set igb2 to full duplex, but there is no igb2 interface in ifconfig; all I can see are the aggr interfaces. How can I see the underlying igb2 interface?
Thanks

Try to access the interface with ndd.
First list the supported values.

ndd /dev/igb0 \?

Or

ndd /dev/igb \?

Actually, the default should be autonegotiation, on the other side (the LAN switch) as well.

Both /dev/igb2 and /dev/igb0 show all of the supported values below. I tried changing from half duplex to full duplex, but igb2 still shows down and standby.

# ndd /dev/igb2 \?
?                             (read only)
mtu                           (read and write)
min_allowed_mtu               (read only)
max_allowed_mtu               (read only)
adv_autoneg_cap               (read and write)
adv_1000fdx_cap               (read and write)
adv_1000hdx_cap               (read only)
adv_100fdx_cap                (read and write)
adv_100hdx_cap                (read and write)
adv_10fdx_cap                 (read and write)
adv_10hdx_cap                 (read and write)
adv_100T4_cap                 (read only)
link_status                   (read only)
link_speed                    (read only)
link_duplex                   (read only)
autoneg_cap                   (read only)
pause_cap                     (read only)
asym_pause_cap                (read only)
1000fdx_cap                   (read only)
1000hdx_cap                   (read only)
100fdx_cap                    (read only)
100hdx_cap                    (read only)
10fdx_cap                     (read only)
10hdx_cap                     (read only)
lp_autoneg_cap                (read only)
lp_pause_cap                  (read only)
lp_asym_pause_cap             (read only)
lp_1000hdx_cap                (read only)
lp_1000fdx_cap                (read only)
lp_100fdx_cap                 (read only)
lp_100hdx_cap                 (read only)
lp_10fdx_cap                  (read only)
lp_10hdx_cap                  (read only)
link_autoneg                  (read only)
tx_copy_thresh                (read and write)
tx_recycle_thresh             (read and write)
tx_overload_thresh            (read and write)
tx_resched_thresh             (read and write)
rx_copy_thresh                (read and write)
rx_limit_per_intr             (read and write)
intr_throttling               (read and write)
adv_pause_cap                 (read only)
adv_asym_pause_cap            (read only)
# ndd -set /dev/igb2 adv_100hdx_cap 0
# ndd -set /dev/igb2 adv_100fdx_cap 1
# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:5e (auto)
           device       address                 speed           duplex  link    state
           bnx1         3c:d9:2b:f9:20:5e         1000  Mbps    full    up      attached
           igb2         f4:ce:46:a7:df:ba         0     Mbps    half    down    standby
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:f9:20:5c (auto)
           device       address                 speed           duplex  link    state
           bnx0         3c:d9:2b:f9:20:5c         1000  Mbps    full    up      attached
           igb3         f4:ce:46:a7:df:bb         1000  Mbps    full    up      attached
#

There are multiple gateways on this server. The gateway in question is not pingable, and traffic to it goes via aggr50001.

# traceroute 172.31.12.1
traceroute: Warning: Multiple interfaces found; using 172.31.12.20 @ aggr50001
traceroute to 172.31.12.1 (172.31.12.1), 30 hops max, 40 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  *

adv_100fdx_cap?
I think you want adv_1000fdx_cap.
Also ensure that adv_autoneg_cap is set.

Sorry, my mistake.
I have set that now, but no luck.

# ndd -get /dev/igb2 adv_autoneg_cap
1
# ndd -get /dev/igb2  adv_1000fdx_cap
1
# ndd -get /dev/igb2 adv_100fdx_cap
0
# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:5e (auto)
           device       address                 speed           duplex  link    state
           bnx1         3c:d9:2b:f9:20:5e         1000  Mbps    full    up      attached
           igb2         f4:ce:46:a7:df:ba         0     Mbps    half    down    standby
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:f9:20:5c (auto)
           device       address                 speed           duplex  link    state
           bnx0         3c:d9:2b:f9:20:5c         1000  Mbps    full    up      attached
           igb3         f4:ce:46:a7:df:bb         1000  Mbps    full    up      attached
#

For a test, set adv_autoneg_cap to 0.
If only adv_1000fdx_cap is enabled then it will be enforced.
If the enforcement works, install a patch for the igb driver (reboot to activate), then enable adv_autoneg_cap again.
If not, then something is wrong with the LAN switch (or the cable).
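The forced-speed test above could be scripted roughly as follows. This is a sketch, assuming the interface is igb2 and using the read/write ndd parameters listed earlier in this thread; `force_1000fdx` and the `DRY_RUN` variable are hypothetical names for illustration. By default it only prints the ndd commands; setting DRY_RUN=0 runs them (as root). Note that 1000BASE-T normally relies on autonegotiation, so whether the driver actually honors a forced 1000 Mbps full-duplex setting is driver-dependent; treat it as a diagnostic and set adv_autoneg_cap back to 1 afterwards.

```shell
# force_1000fdx: hypothetical helper for the forced-speed test on an igb port.
# $1 = interface name (defaults to igb2). DRY_RUN defaults to 1 (print only);
# DRY_RUN=0 actually calls ndd -set, which needs root.
force_1000fdx() {
    dev="/dev/${1:-igb2}"
    # Advertise only 1000 Mbps full duplex; autoneg is switched off last,
    # after the other capabilities are already in their final state.
    while read param value; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ndd -set $dev $param $value"
        else
            ndd -set "$dev" "$param" "$value"
        fi
    done <<EOF
adv_1000fdx_cap 1
adv_100fdx_cap 0
adv_100hdx_cap 0
adv_10fdx_cap 0
adv_10hdx_cap 0
adv_autoneg_cap 0
EOF
}
```

Afterwards, `ndd -get /dev/igb2 link_status` and `dladm show-aggr` show whether the port came up.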

I just checked with the network team, and they confirmed that the link is fine on the switch side, but link aggregation (LACP) is showing down.

Then check the LACP status with dladm show-aggr -L.

# dladm show-aggr -L
key: 1 (0x0001) policy: L4      address: 3c:d9:2b:f9:20:5e (auto)
                LACP mode: active       LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    bnx1      active   short   yes          yes   yes  yes  no        no
    igb2      active   short   yes          no    no   no   yes       no
key: 2 (0x0002) policy: L4      address: 3c:d9:2b:f9:20:5c (auto)
                LACP mode: active       LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    bnx0      active   short   yes          yes   yes  yes  no        no
    igb3      active   short   yes          yes   yes  yes  no        no
# 
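The `dladm show-aggr -L` check can be automated in the same spirit. A minimal sketch, assuming the column order shown above; `check_lacp` is a hypothetical helper, not a Solaris command. It flags any port whose sync, coll (collecting) or dist (distributing) column is not "yes", or whose defaulted or expired column is "yes". In LACP terms, a defaulted port is using default partner information because it has not received LACPDUs from the switch, which fits a bad cable.

```shell
# check_lacp: hypothetical helper; reads `dladm show-aggr -L` output on stdin.
# Assumed port-row columns (as pasted in this thread):
#   device activity timeout aggregatable sync coll dist defaulted expired
check_lacp() {
    awk '
        $1 == "key:"   { key = $2; next }   # new aggregation group
        $1 == "device" { next }             # column header row
        NF == 9 {
            # healthy: sync/coll/dist all "yes", defaulted/expired both "no"
            if ($5 != "yes" || $6 != "yes" || $7 != "yes" || $8 == "yes" || $9 == "yes")
                printf "key %s: %s sync=%s coll=%s dist=%s defaulted=%s expired=%s\n", key, $1, $5, $6, $7, $8, $9
        }
    '
}

# Usage on the server:
#   dladm show-aggr -L | check_lacp
```

For the output above, only igb2 in key 1 should be reported (not in sync, defaulted).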

Is this what you wanted me to check?

You can see that LACP is not working on igb2.
That is the logical consequence if the igb2 link is down (as you already confirmed).
I suspect broken hardware. Did you replace the cable(s) between the Solaris box and the LAN switch?

Yes, it was the cable. I replaced it and now things are looking good. Thanks for pointing me towards the solution.