Ethernet card is down

This issue started to occur on the server every two weeks, sometimes every three days. When I restart the server, the problem resolves, and I can access it again. However, I have to restart the server using iLO. When I check the logs, I can't determine the cause. What would you recommend checking?

smad[1712]: [INFO  ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18011)
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
smad[1712]: [INFO  ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18012)
smad[1712]: [NOTICE]: IML received: 171 bytes
smad[1712]: [ALERT ]: CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot 1, port 1
smad[1712]: [INFO  ]: Log the IML info to syslog
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
kernel: bnxt_en 0000:10:00.0 ens1f0np0: EEE is not active
kernel: bnxt_en 0000:10:00.0 ens1f0np0: FEC autoneg off encodings: None
NetworkManager[1417]: <info>  [1685482057.9581] device (ens1f0np0): carrier: link connected
smad[1712]: [INFO  ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18011)
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
smad[1712]: [INFO  ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18012)
smad[1712]: [NOTICE]: IML received: 171 bytes
smad[1712]: [ALERT ]: CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot 1, port 1
smad[1712]: [INFO  ]: Log the IML info to syslog
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 100 Mbps full duplex, Flow control: ON - receive
kernel: bnxt_en 0000:10:00.0 ens1f0np0: EEE is not active
kernel: bnxt_en 0000:10:00.0 ens1f0np0: FEC autoneg off encodings: None
NetworkManager[1417]: <info>  [1685482061.4580] device (ens1f0np0): carrier: link connected
smad[1712]: [INFO  ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18011)
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
smad[1712]: [INFO  ]: AgentX trap received
smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18012)
smad[1712]: [NOTICE]: IML received: 171 bytes
smad[1712]: [ALERT ]: CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot 1, port 1
smad[1712]: [INFO  ]: Log the IML info to syslog
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive

What's the host machine, the OS, and so on?

Have there been any recent software updates, installs, or config changes that might be causing or contributing to this?

  • check the card is properly seated
  • check the cable
  • try another cable

I changed the cable.

"The device is placed in a server rack cabinet, and I don't think there has been any movement. We have been using it for a long time, but it started showing this error in the last 2-3 months."

Maybe it's a bug in the Broadcom driver. Install an update!

It toggles between 100 Mbps full duplex and 1000 Mbps full duplex. What is the desired speed? Maybe it can be enforced? Or autoneg can be enabled?
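To see how often the link drops and which speeds it comes back at, the kernel lines can be tallied with a small script. This is only a rough sketch: the sample text is copied from the log excerpt in the original post, the function name `summarize_link_events` is made up for illustration, and the regular expression assumes the `bnxt_en` message format shown there.

```python
import re
from collections import Counter

# Sample kernel lines copied from the log excerpt in the original post.
SAMPLE_LOG = """\
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 100 Mbps full duplex, Flow control: ON - receive
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
"""

# Assumption: link messages look like "<iface>: NIC Link is Up, <N> Mbps ..."
# or "<iface>: NIC Link is Down", as in the excerpt above.
LINK_RE = re.compile(
    r"(?P<iface>\S+): NIC Link is (?P<state>Up|Down)(?:, (?P<speed>\d+) Mbps)?"
)


def summarize_link_events(log_text):
    """Return (link-down count per interface, negotiated speeds per interface)."""
    downs = Counter()
    speeds = {}
    for line in log_text.splitlines():
        match = LINK_RE.search(line)
        if match is None:
            continue  # not a link state message
        iface = match.group("iface")
        if match.group("state") == "Down":
            downs[iface] += 1
        elif match.group("speed") is not None:
            speeds.setdefault(iface, Counter())[int(match.group("speed"))] += 1
    return downs, speeds


downs, speeds = summarize_link_events(SAMPLE_LOG)
for iface, count in downs.items():
    print(f"{iface}: {count} link-down events, speeds on link-up: {dict(speeds.get(iface, {}))}")
```

On a real system the same function could be fed the full kernel log instead of the sample text. A card that keeps renegotiating at different speeds, as here, would show up as more than one key in the speeds dictionary.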

I'd also recommend coming at this from the other side and seeing what the logs on the switch show. Were any errors detected on the switch port, or was anything else unusual logged on the switch for that port at the time the server shows the interface flapping?
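To line the server log up with the switch log you need wall-clock times. The NetworkManager lines in the excerpt carry a bracketed number that looks like a Unix epoch timestamp (that is an assumption about this log format), so a short sketch like the following could convert them. The helper name `nm_timestamps` is made up for illustration, and the output is in UTC, so the switch's time zone and clock offset still have to be taken into account.

```python
import re
from datetime import datetime, timezone

# The two NetworkManager lines from the log excerpt in the original post.
NM_LINES = [
    "NetworkManager[1417]: <info>  [1685482057.9581] device (ens1f0np0): carrier: link connected",
    "NetworkManager[1417]: <info>  [1685482061.4580] device (ens1f0np0): carrier: link connected",
]

# Assumption: the bracketed decimal number is seconds since the Unix epoch.
# The "[1417]" PID has no decimal point, so it is not matched.
EPOCH_RE = re.compile(r"\[(\d+\.\d+)\]")


def nm_timestamps(lines):
    """Extract the bracketed epoch values and return them as UTC datetimes."""
    stamps = []
    for line in lines:
        match = EPOCH_RE.search(line)
        if match:
            stamps.append(datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc))
    return stamps


stamps = nm_timestamps(NM_LINES)
for ts in stamps:
    print(ts.isoformat())

# Time between the two "link connected" events.
gap = (stamps[1] - stamps[0]).total_seconds()
print(f"gap between link-up events: {gap:.1f} s")
```

With the converted times in hand, look for port up/down or error messages on the switch within a few seconds of each event.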

Hi @soolan,

a flapping NIC can have various causes, e.g.:

  • Cable
  • NIC chip or port or bus
  • NIC bus slot on board
  • Switch port
  • (Driver)

Finding the real cause is usually difficult; mostly it only works by trial and error:

  • Replace cable
  • Change switch port
  • Use & configure the other NIC port (NetXtreme-E Dual-port)
  • Use & configure other NIC
  • Change NIC slot on board
  • Replace NIC
  • (Update OS driver or driver from the manufacturer)

If the problem doesn't coincide with an update of the driver and/or the kernel, software is probably not the cause, and a hardware issue is more likely. If it does coincide with a driver/kernel update, try installing the previous version(s).

I have taken notes of what you wrote, and I will proceed according to this plan. I thought maybe we could identify the error directly from the logs, but it didn't work. Also, I haven't made any updates to the operating system or the driver of the card.

Change the network card! Sometimes it is simply broken.
