Extremely low throughput between AIX 7.2 and RHEL Maipo

I am currently having problems with AIX and network performance.

I have an AIX 7.2 machine on-premises and a Linux machine with RHEL 7.7 (Maipo) in Google Cloud. Between the two machines I have a guaranteed bandwidth of 1 Gbit/s.

First I started with a simple traceroute. What I noticed is that after some hops the following error appears: "fragmentation required, trying new MTU=1492". This appears 4 times: the MTU changes from the starting 1500 to 1492, then to 1480, then to 1472 and finally to 1006. Along with this I get a very low throughput (around 16 Mbit/s). Do you have any idea what's going on? If I try the same experiment from a Linux machine I don't get this issue and the throughput is quite good (around 320 Mbit/s).

Could you help me please? Thank you

Have you considered setting the values mentioned in the document below, if you are using a VPN in between?

MTU considerations | Cloud VPN | Google Cloud

Can you show the output of:

ip l sh

on that Google VM?

Regards
Peasant.


Hi, thank you for your kind reply

This is my output, as requested:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 42:01:0a:99:c8:ad brd ff:ff:ff:ff:ff:ff

Actually I don't have a VPN but a Partner Interconnect: I don't know if this changes anything.
What I did over the past few days was work on the AIX side: basically I focused on TCP performance and the congestion window, modifying some of the parameters shown by "no -a" (let me know if you need the output). After that, I made some scp transfers between AIX and GCP and got a higher throughput (around 240 Mbit/s). Unfortunately, in some of my attempts the throughput decreases very quickly, especially when I send larger files (around 1 or 2 GB), and I end up back at my usual 16 Mbit/s. In other attempts, instead, the connection speed keeps increasing (up to 400 Mbit/s) or stays stable.

Concerning the traceroute, I ran a traceroute toward the same GCP machine from many on-premises Linux machines, but I didn't get any "fragmentation required" issue. On the other hand, I always get the "fragmentation required" issue if I traceroute from an AIX machine. I'm focusing on this because I feel it is correlated with the low throughput problem.

Thank you all for your time :slight_smile:

Well, unfortunately I'm no AIX guy, but if those other boxes in your LAN are connected to the same network gear and everything works as expected / paid for...
I would put my 2 cents on AIX.

Hopefully someone with more in-depth knowledge of the AIX network stack will jump in with some metrics / dump commands.

How is the performance in your local LAN when you copy something to the AIX box from another Linux box?

Regards
Peasant.

I wonder why the shown MTU is 1460 while the standard is 1500.
But if your LAN switch/router works better with 1460 then try to set it on the other box, too.

I remember a similar issue (severe packet loss), where all Linux systems had the standard MTU 1500. The LAN guy changed the MTU on the LAN switch (or router?), and that fixed it.

Google Cloud mandates this MTU:

REF: MTU considerations | Cloud VPN | Google Cloud
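
To put numbers on the MTU values mentioned in this thread, here is a rough Python sketch of the arithmetic. It assumes plain IPv4 and TCP headers of 20 bytes each with no options, and an 8-byte ICMP echo header; any tunnel or encapsulation overhead on the path is not modelled, and the function names are only illustrative.

```python
# Assumed header sizes (no IP or TCP options, no tunnel overhead).
IPV4_HEADER = 20
TCP_HEADER = 20
ICMP_HEADER = 8


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# MTU values seen in this thread: the traceroute steps plus eth0 on the VM.
for mtu in (1500, 1492, 1480, 1472, 1460, 1006):
    print(f"MTU {mtu}: TCP MSS {tcp_mss(mtu)}, "
          f"largest unfragmented ping payload {max_ping_payload(mtu)}")
```

Under those assumptions a sender with MTU 1500 would use a TCP segment size of 1460 bytes, while an interface with MTU 1460 can only take 1420 bytes per segment.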


Hi all,
when I copy from AIX to Linux in my local LAN I get an acceptable throughput (around 400 Mbit/s).
Same thing from AIX to AIX.

I think that my problem from LAN to Cloud might be related to some misconfiguration on the network, probably in some switch or router, but how would you explain the fact that if I transfer from a Linux machine in the same LAN I don't get any issue?

I analyzed some tcpdump captures (AIX-Cloud communication) with Wireshark and I found a lot of "Duplicate Ack", "ACKed segment that wasn't captured (common at capture start)" and "Previous segment(s) not captured (common at capture start)".

Any idea or thought would be useful for me,
Thank you again :slight_smile:

Did you read the above post?

There is no point in having an MTU of 1500 if Google uses a maximum of 1460; you only get yourself into trouble with lost packets, and of course AIX keeps trying to find a suitable value, which it hardly ever finds, and so generates more errors. Giving a higher value than what can be accepted WILL generate duplicates...
And after a little googling I found this:
https://hide.me/en/knowledgebase/how-to-find-correct-mtu-values/
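
The usual manual approach is to send pings with the "don't fragment" bit set and shrink the payload until they get through (with Linux iputils that is `ping -M do -s <size>`). The same search can be sketched in Python as a binary search. This is only an illustration: `probe` is a hypothetical callback standing in for a real DF ping, `simulated_probe` fakes a path limited to 1460 bytes, and the 28 bytes of overhead assume an IPv4 header without options plus the ICMP echo header.

```python
from typing import Callable

IPV4_HEADER = 20  # assumed: no IP options
ICMP_HEADER = 8


def find_path_mtu(probe: Callable[[int], bool], low: int = 68, high: int = 1500) -> int:
    """Return the largest MTU in [low, high] whose ping gets through unfragmented.

    probe(payload) is a hypothetical callback that should send one ping with
    the given ICMP payload size and the DF bit set, returning True on success.
    The search assumes a packet of size `low` always passes.
    """
    overhead = IPV4_HEADER + ICMP_HEADER
    while low < high:
        mid = (low + high + 1) // 2      # round up so the loop always makes progress
        if probe(mid - overhead):
            low = mid                    # a packet of size mid fits, try larger
        else:
            high = mid - 1               # too big, try smaller
    return low


def simulated_probe(payload: int) -> bool:
    """Stand-in for a real DF ping: pretends the path MTU is 1460."""
    return payload + IPV4_HEADER + ICMP_HEADER <= 1460


print(find_path_mtu(simulated_probe))  # prints 1460
```

To use it for real, `simulated_probe` would have to be replaced with something that actually runs the ping on the box in question and checks the result.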

So I would say that if you had read ALL the posts above, you would already have had all the answers to your questions in hand...