10GbE ixgbe slow at 420 MB/s max, p2p LAN, 1 m Cat 8, tried ethtool and ifconfig options to no avail

Hey guys, first time dealing with 10gbe.
I have two boxes. One is older, with a 2.8 GHz i7 from the 2009 generation and PCIe 2.0; I just put an Intel X550T in it to pair with the very similar, but built-in, adapter on my newer box with a first-gen AMD EPYC.
I'm downloading a file from tmpfs on either box and never go above 415 MB/s, which is barely a third of the potential bandwidth there, right?
I tried most of the stuff from kernel.org/doc/ols/2009/ols2009-pages-169-184.pdf, with a little improvement from 393 MB/s to the current 415.
That's the result of setting MTU 9000 and txqueuelen 10000.
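For reference, those were set with something like the following (assuming the interface is called eth1; adjust to the real name):

ip link set dev eth1 mtu 9000
ip link set dev eth1 txqueuelen 10000

(or, with the older tools: ifconfig eth1 mtu 9000 txqueuelen 10000)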
Also tried this, with no improvement over the current result: darksideclouds.wordpress.com/2016/10/10/tuning-10gb-nics-highway-to-hell/
And a few other pages out there on Google, most stating the same options.
I don't see a 100% load burst on the i7 box when I download the file; it's around 10% across the cores. So the CPU shouldn't be a bottleneck, the cable (1 m Cat 8 with nice thick shielding) shouldn't be a bottleneck, and RAM speed and PCIe 2.0 speed shouldn't be either.
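For reference, I was watching per-core load during a transfer with something like this (mpstat is from the sysstat package; top with the per-core view would do as well):

mpstat -P ALL 1

A single core pegged at 100% would point at an interrupt/softirq bottleneck, but nothing here comes close to that.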
iperf also reports the same speed.
Tried these tests in both directions.
There's gotta be something obvious that I'm missing, right?

Pretty sure it doesn't matter, but it's LFS on the i7 and Debian 10 on the EPYC, both sharing the same sysctl settings and ethtool-controlled options.
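The sysctl side is just the usual socket-buffer knobs those tuning guides cover, something along these lines (illustrative values, applied identically on both boxes):

sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
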
lspci -n for the card on both computers:

62:00.0 0200: 8086:1563 (rev 01)

So it's the same device on both ends. I bought this particular one for compatibility, but didn't expect these issues.

Maybe someone here would suggest something, I'm out of ideas.
TIA
--
solved, see my own response below

Hi tinfoil,

a) check the configured port speed on both sides

ethtool <yourinterfacename> | grep -iE "(speed|duplex)"

It should look like this:

ethtool enp216s0f1 | grep -Ei "(speed|duplex)"
        Speed: 10000Mb/s
        Duplex: Full

b) check the raw network speed with iperf:

Server

iperf -s -p 12345

Client

iperf -c server.ip.add.ress -p 12345

and after that, in the reverse direction with client and server roles swapped.
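If iperf3 is available, the reverse direction can also be tested without swapping roles, e.g.:

iperf3 -s -p 12345
iperf3 -c server.ip.add.ress -p 12345 -R

(-R makes the server send and the client receive)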

iperf should be available in any package repository under Linux. If not: download & compile.

regards,
stomp

Hey stomp,
both sides show a valid 10000 Mb/s full-duplex link, I checked that first thing. As stated, txqueuelen and MTU match on both ends. Raising them to 10000 and 9000 respectively increased the speed from 393 to 415 MB/s, but no further.
Swapped ports and tried a flight-proven Cat 7 cable: no change.

iperf reports the same speed as wget from nginx, Apache, even busybox.
The interesting note: downloading to the EPYC (newer) box shows 421 MB/s, while downloading to the i7 from the EPYC shows 415 MB/s max. I don't know if that suggests anything to you.
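(For scale: 415 MB/s × 8 ≈ 3.3 Gbit/s on the wire, so roughly a third of the nominal 10 Gbit/s in either direction.)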

Also, updating the post with more details:
The ethtool info for the interfaces differs slightly, but I'm pretty sure it doesn't matter; you can just see the diff.

This is for the i7: https://paste.ubuntu.com/p/JmDQcYWkbs/
This is for the EPYC: https://paste.ubuntu.com/p/ZThRF6kcqs/
And this is the diff (embedded):

# diff i7 epyc 
1,2c1
< Cannot get device udp-fragmentation-offload settings: Operation not supported
< Features for eth101:
---
> Features for eth1:
8c7
<     tx-checksum-fcoe-crc: off
---
>     tx-checksum-fcoe-crc: on
32c31
< tx-fcoe-segmentation: off
---
> tx-fcoe-segmentation: on

And the ethtool -g output for both boxes matches:

Ring parameters for eth1:
Pre-set maximums:
RX:        4096
RX Mini:    0
RX Jumbo:    0
TX:        4096
Current hardware settings:
RX:        4096
RX Mini:    0
RX Jumbo:    0
TX:        4096
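Both rings are already at the pre-set maximum; for anyone else reading, they could be raised with something like the following (assuming the interface is eth1):

ethtool -G eth1 rx 4096 tx 4096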

Please provide output of

inxi -v7

from both boxes so that we have some basic system information for your setup. Inxi has to be installed first; anonymize the output before posting if necessary.

What info are you interested in? I can post it from conventional tools like ethtool, ip, ifconfig, lshw, lspci.

Hi,

I'm not sure if I can be of help. I just want to see the basic hardware and software environment; inxi does a good job of collecting that and presenting it in a compact way. Seeing your environment may trigger some specific memories of problems. It doesn't seem very useful to list everything here when most of it is probably not relevant in your case.

Inxi is a script relying on the basic system tools and Perl. If you're sceptical about installing, or even just downloading and running it, that's OK with me.

Furthermore, I suggest reviewing the kernel startup log (Debian: /var/log/kern.log) and dmesg for any errors/warnings regarding driver module loading (especially complaints about missing firmware files) or networking.
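Something along these lines should surface the relevant messages (assuming the ixgbe driver, which the X550 uses):

dmesg | grep -iE "ixgbe|firmware"
grep -iE "ixgbe|firmware" /var/log/kern.log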


No firmware-related stuff, but on the i7 box I see this:

[    3.784158] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 5.1.0-k
[    3.784159] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[    4.460945] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 4, Tx Queue count = 4 XDP Queue count = 0
[    4.559030] ixgbe 0000:02:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x1 link at 0000:00:05.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[    4.670873] ixgbe 0000:02:00.0: MAC: 4, PHY: 0, PBA No: -
[    4.830343] ixgbe 0000:02:00.0: Intel(R) 10 Gigabit Network Connection
[    4.830387] libphy: ixgbe-mdio: probed
[    5.512853] ixgbe 0000:02:00.1: Multiqueue Enabled: Rx Queue count = 4, Tx Queue count = 4 XDP Queue count = 0
[    5.611006] ixgbe 0000:02:00.1: 4.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x1 link at 0000:00:05.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[    5.722787] ixgbe 0000:02:00.1: MAC: 4, PHY: 0, PBA No: -
[    5.882245] ixgbe 0000:02:00.1: Intel(R) 10 Gigabit Network Connection
[    5.882292] libphy: ixgbe-mdio: probed

Does it say the card sits on an x1 link, not x4?
Because on the EPYC box it says this:

[    2.364368] ixgbe 0000:62:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)

The kernel driver version matches, and all the other messages match too... Strange, I thought this slot was x8, as the manual states, but then again I have 6 SATA devices, a full x16 GPU and both legacy PCI slots in use, so that's probably all the lanes the CPU has... You gave me an idea, thanks.
I guess I'll just pull the card out later, try it in one of my other boxes and see if they report it differently under a Debian 10 live CD.
I don't know how the CPU handles lane allocation, but I guess an x16 card would get x16 unless the lanes have to be shared out. This particular CPU is an i7-860 with 16 lanes max, so if the GPU is downgraded to x8, the rest goes to the SATA devices and the two legacy PCI devices (1-gigabit LAN and a sound card).
There's no way the NIC would be given x4 bandwidth by the CPU then, right? Thanks, I will try it, but I guess I'll have to suck it up and live with 4 Gbit, which would be a rather good result, huh.
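(Doing the math on that dmesg line: PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, so an x1 link gives 4 Gbit/s ≈ 500 MB/s per direction, and the ~415-420 MB/s I'm seeing is about what's left after PCIe packet and TCP/IP overhead.)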

I guess there's really no way I can solve this, and there's no better CPU for this socket, so this will just never work at full speed unless I can get rid of some drives, which isn't really possible. Thanks, Intel, for making such low-lane CPUs. Take AMD: the EPYC box has 128 lanes, with a lot of them used by a bunch of NVMes and other stuff, and I can plug in as many cards as physically possible and it would still have room for more.

Thanks for figuring out the problem and giving feedback. Never had this myself. Good to know!

Very interesting indeed...

lspci -vv 2>/dev/null | gawk '/^[0-9a-f]+/ {dev=$0} match($0,/LnkSta.*idth x([0-9]+)/,r) {if(r[1]>0) {printf "%-6s %s\n","x"r[1],dev}} '

#lanes #pci-id   #device-name

x8     00:01.0   PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 1a (rev 04) (prog-if 00 [Normal decode])
x8     00:03.2   PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 3c (rev 04) (prog-if 00 [Normal decode])
x1     00:11.0   PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Virtual Root Port (rev 06) (prog-if 00 [Normal decode])
x8     01:00.0   RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
x8     05:00.0   Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
x8     05:00.1   Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
x1     07:00.0   Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit (rev 06)
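
For a single card you can also compare the advertised vs. negotiated link directly, e.g. (assuming the X550 sits at 02:00.0 as in the dmesg above):

lspci -vv -s 02:00.0 | grep -iE "LnkCap|LnkSta"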

I think I actually have an idea now: it may still be solvable if I put the GPU on one of those x16-to-x1 risers the miners use. So it's not that bad; I may be able to claim x4 for the network card, and video speed may still be sufficient as long as I don't play games or videos there, which I usually don't. Not ideal, but it may work. Sleeping on a problem sometimes solves it.

Maybe there are some settings for PCIe lane allocation configurable in your Intel mainboard's BIOS?


Yeah, thought about that too. I'll check on the next reboot, but it's unlikely; it's not a common thing to have on desktop boards.