AIX server problem - network connection is unstable!

Our company has just bought two IBM P8 servers to build a database cluster. We've finished the installation, but now there's a problem: the network connection is unstable, like this:

sysopr1@ic_tsm:/home/sysopr1>ping 10.0.91.18
PING 10.0.91.18: (10.0.91.18): 56 data bytes
64 bytes from 10.0.91.18: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=1 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=2 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=3 ttl=255 time=42 ms
64 bytes from 10.0.91.18: icmp_seq=4 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=5 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=6 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=7 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=8 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=9 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=10 ttl=255 time=48 ms
64 bytes from 10.0.91.18: icmp_seq=11 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=12 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=13 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=14 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=15 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=16 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=17 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=18 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=19 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=20 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=21 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=22 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=23 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=24 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=25 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=26 ttl=255 time=3 ms
64 bytes from 10.0.91.18: icmp_seq=27 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=28 ttl=255 time=44 ms

We pinged these two new servers from many servers on different VLANs, but it's always like that. This is really a big problem because Oracle RAC requires the "time" parameter to be <3.
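To see how often the replies miss that limit, the ping output can be filtered mechanically. The following is only a minimal sketch: the function name `slow_replies` and the 3 ms default are assumptions for illustration (the unit is taken from the ping output, not from any Oracle documentation), and the sample is a subset of the lines posted above.

```python
import re

# A subset of the reply lines from the ping output above.
SAMPLE = """\
64 bytes from 10.0.91.18: icmp_seq=2 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=3 ttl=255 time=42 ms
64 bytes from 10.0.91.18: icmp_seq=4 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=10 ttl=255 time=48 ms
64 bytes from 10.0.91.18: icmp_seq=26 ttl=255 time=3 ms
64 bytes from 10.0.91.18: icmp_seq=28 ttl=255 time=44 ms
"""

# Matches the sequence number and the reported time of one reply line.
REPLY_RE = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def slow_replies(ping_output, threshold_ms=3.0):
    """Return (icmp_seq, time_ms) for every reply at or above threshold_ms.

    threshold_ms is a hypothetical limit; replies must stay below it.
    """
    hits = []
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match and float(match.group(2)) >= threshold_ms:
            hits.append((int(match.group(1)), float(match.group(2))))
    return hits


if __name__ == "__main__":
    for seq, ms in slow_replies(SAMPLE):
        print(f"icmp_seq={seq}: {ms:g} ms")
```

Running it over a longer capture gives a quick count of how many replies are slow, which is easier to compare between hosts than raw ping output.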

Could someone help, please? Thanks.

Hi,
We know nothing about your architecture, how can we reply?

Ethernet, in the "old days", used a protocol called carrier-sense multiple access with collision detection (CSMA/CD).

Either way, if you have many devices "talking" on the same LAN segment, it is normal for a packet to be delayed.

As vbe has mentioned (above), without the full details of your network topology and hosts, there is no way to answer your question accurately.

If you still need help, please post all the details so the experts here do not have to guess what you are doing.

Thanks.

sysopr1@oltpn8c:/home/sysopr1>oslevel -s
6100-09-04-1441
sysopr1@oltpn8c:/home/sysopr1>prtconf|pg
prtconf: open: : No such device or address
System Model: IBM,9080-MME
Machine Serial Number: 7814C98
Processor Type: PowerPC_POWER8
Processor Implementation Mode: POWER 7
Processor Version: PV_7_Compat
Number Of Processors: 20
Processor Clock Speed: 4024 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 1 78-14C98
Memory Size: 233216 MB
Good Memory Size: 233216 MB
Platform Firmware level: SC860_165
Firmware Version: IBM,FW860.51 (SC860_165)
Console Login: enable
Auto Restart: true
Full Core: false
NX Crypto Acceleration: No Data Available
 
Network Information
        Host Name: oltpn8c
        IP Address: 10.0.91.18
        Sub Netmask: 255.255.255.0
        Gateway: 10.0.91.1
        Name Server: 10.0.58.11
:[1] + Stopped (SIGTSTP)        prtconf|pg
sysopr1@oltpn8c:/home/sysopr1>ifconfig -a
en32: flags=1e080863,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.91.18 netmask 0xffffff00 broadcast 10.0.91.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en33: flags=1e080863,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.10.18 netmask 0xffffff00 broadcast 10.0.10.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en34: flags=1e080863,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.20.18 netmask 0xffffff00 broadcast 10.0.20.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en35: flags=1e080863,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.2.12.18 netmask 0xffffff00 broadcast 10.2.12.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en36: flags=1e080863,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.21.18 netmask 0xffffff00 broadcast 10.0.21.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en37: flags=1e080863,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.22.18 netmask 0xffffff00 broadcast 10.0.22.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
sysopr1@oltpn8c:/home/sysopr1>netstat -nr
Routing tables
Destination        Gateway           Flags   Refs     Use  If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default            10.0.91.1         UG        4   1221220 en32     -      -   
10.0.10.0          10.0.10.18        UHSb      0         0 en33     -      -   =>
10.0.10/24         10.0.10.18        U         3     67387 en33     -      -   
10.0.10.18         127.0.0.1         UGHS      0        17 lo0      -      -   
10.0.10.255        10.0.10.18        UHSb      0        56 en33     -      -   
10.0.20.0          10.0.20.18        UHSb      0         0 en34     -      -   =>
10.0.20/24         10.0.20.18        U         3  17607231 en34     -      -   
10.0.20.18         127.0.0.1         UGHS      0       558 lo0      -      -   
10.0.20.255        10.0.20.18        UHSb      0         1 en34     -      -   
10.0.21.0          10.0.21.18        UHSb      0         0 en36     -      -   =>
10.0.21/24         10.0.21.18        U         2     11333 en36     -      -   
10.0.21.18         127.0.0.1         UGHS      0     34069 lo0      -      -   
10.0.21.255        10.0.21.18        UHSb      2       312 en36     -      -   
10.0.22.0          10.0.22.18        UHSb      0         0 en37     -      -   =>
10.0.22/24         10.0.22.18        U         1    338564 en37     -      -   
10.0.22.18         127.0.0.1         UGHS      1       869 lo0      -      -   
10.0.22.255        10.0.22.18        UHSb      0        10 en37     -      -   
10.0.91.0          10.0.91.18        UHSb      0         0 en32     -      -   =>
10.0.91/24         10.0.91.18        U        16    726011 en32     -      -   
10.0.91.18         127.0.0.1         UGHS      0     97786 lo0      -      -   
10.0.91.255        10.0.91.18        UHSb      0         1 en32     -      -   
10.2.12.0          10.2.12.18        UHSb      0         0 en35     -      -   =>
10.2.12/24         10.2.12.18        U         0    157411 en35     -      -   
10.2.12.18         127.0.0.1         UGHS      0        93 lo0      -      -   
10.2.12.255        10.2.12.18        UHSb      0         1 en35     -      -   
127/8              127.0.0.1         U         9    684363 lo0      -      -   
192.168.213/24     10.0.20.1         UGS       1    103247 en34     -      -   
192.168.214/24     10.0.21.1         UGS       1     43510 en36     -      -   

Route Tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1             UH        1    280956 lo0      -      -

We installed things like GPFS, Power Path, and Oracle Server, but I don't think that's the point. Whatever the network is (10.0.91/24, 10.0.10/24, 10.0.22/24, ...), it's always unstable.
I know it would be hard to help with only this information, so let me know if you want to know more.

You have only provided information about a single host. That is really not going to help us solve a network issue.

OK....

Start here, and post back the results of this on your host (above):

arp -a

Also, do you have login access to this IP:

10.0.91.1 
sysopr1@oltpn8c:/home/sysopr1>arp -a              
  client_gateway.agribank.com.vn (10.0.91.1) at 0:1c:7f:64:16:f [ethernet] stored in bucket 1

  ? (10.0.91.2) at 0:1c:7f:64:16:f [ethernet] stored in bucket 2

  ? (10.0.91.3) at 0:1c:7f:68:41:c3 [ethernet] stored in bucket 3

  ? (10.0.20.1) at 0:23:33:e1:0:da [ethernet] stored in bucket 3

  ? (10.0.10.29) at 0:14:5e:b8:8a:6e [ethernet] stored in bucket 4

  ? (10.0.22.236) at 0:11:25:7b:17:e [ethernet] stored in bucket 5

  ? (10.0.10.38) at 0:1a:64:91:5:ff [ethernet] stored in bucket 13

  oltpn5h (10.0.91.162) at e4:1f:13:50:35:85 [ethernet] stored in bucket 13

  ? (10.0.91.164) at 0:1a:64:a7:37:68 [ethernet] stored in bucket 15

  ? (10.0.10.42) at 0:14:5e:b8:8a:79 [ethernet] stored in bucket 17

  oltpn8c-vip (10.0.91.19) at 98:be:94:0:67:30 [ethernet] stored in bucket 19

  o10rac (10.0.21.66) at 6c:ae:8b:45:c4:90 [ethernet] stored in bucket 26

  ihapp8 (10.0.91.180) at 0:1a:64:1e:ea:d8 [ethernet] stored in bucket 31

  ihapp2 (10.0.91.182) at 0:21:5e:8a:d2:f8 [ethernet] stored in bucket 33

  ihapp3 (10.0.91.183) at 0:21:5e:8a:d5:18 [ethernet] stored in bucket 34

  ihapp4 (10.0.91.184) at 0:21:5e:8a:d3:6e [ethernet] stored in bucket 35

  ihapp5 (10.0.91.185) at 0:14:5e:b8:88:dc [ethernet] stored in bucket 36

  ? (10.0.10.61) at 0:14:5e:b8:63:f8 [ethernet] stored in bucket 36

  ih_ei (10.0.91.187) at 0:1a:64:1e:d0:f1 [ethernet] stored in bucket 38

  ? (10.2.12.199) at 98:be:94:74:8e:9a [ethernet] stored in bucket 42

  ? (10.0.91.50) at 0:1a:64:ad:50:a3 [ethernet] stored in bucket 50

  oltpn4c (10.0.91.62) at e4:1f:13:50:35:a7 [ethernet] stored in bucket 62

  oltpn10c (10.0.91.66) at 98:be:94:0:67:30 [ethernet] stored in bucket 66

  osb1.x.x.x (10.0.10.242) at 0:10:e0:3c:b8:62 [ethernet] stored in bucket 68

  o10gpfs (10.0.20.66) at 98:be:94:0:67:32 [ethernet] stored in bucket 68

  ? (10.0.10.250) at 98:be:94:23:20:da [ethernet] stored in bucket 76

  o9rac (10.0.21.119) at 34:80:d:66:41:8c [ethernet] stored in bucket 79

  icapp2 (10.0.91.82) at 0:21:5e:8a:9c:a [ethernet] stored in bucket 82

  icapp4 (10.0.91.84) at 0:1a:64:91:38:3f [ethernet] stored in bucket 84

  icapp5 (10.0.91.85) at 0:14:5e:b8:89:53 [ethernet] stored in bucket 85

  ic_ei (10.0.91.87) at 0:1a:64:1e:d0:21 [ethernet] stored in bucket 87

  ? (10.0.10.113) at 0:21:5e:8a:d0:20 [ethernet] stored in bucket 88

  ? (10.0.10.115) at 0:21:5e:8a:d2:8c [ethernet] stored in bucket 90

  misclient01.x.x.x (10.0.91.91) at 0:10:e0:3b:6b:4d [ethernet] stored in bucket 91

  ? (10.0.10.117) at 0:21:5e:8a:d0:16 [ethernet] stored in bucket 92

  ? (10.0.10.119) at 0:14:5e:b8:88:d1 [ethernet] stored in bucket 94

  ? (10.0.22.38) at 0:11:25:7b:26:7e [ethernet] stored in bucket 105

  ? (10.0.21.1) at 0:23:33:e1:0:da [ethernet] stored in bucket 110

  ? (10.0.10.135) at 0:1a:64:91:3c:c4 [ethernet] stored in bucket 110

  o5mgt (10.0.10.137) at e4:1f:13:50:36:56 [ethernet] stored in bucket 112

  oltpn9h (10.0.91.119) at 0:90:fa:d8:de:b [ethernet] stored in bucket 119

  o9gpfs (10.0.20.119) at 0:90:fa:d8:de:12 [ethernet] stored in bucket 121

  ? (10.0.21.38) at 0:11:25:7b:22:a3 [ethernet] stored in bucket 147

bucket:    0     contains:    0 entries
bucket:    1     contains:    1 entries
bucket:    2     contains:    1 entries
bucket:    3     contains:    2 entries
bucket:    4     contains:    1 entries
bucket:    5     contains:    1 entries
bucket:    6     contains:    0 entries
bucket:    7     contains:    0 entries
bucket:    8     contains:    0 entries
bucket:    9     contains:    0 entries
bucket:   10     contains:    0 entries
bucket:   11     contains:    0 entries
bucket:   12     contains:    0 entries
bucket:   13     contains:    2 entries
bucket:   14     contains:    0 entries
bucket:   15     contains:    1 entries
bucket:   16     contains:    0 entries
bucket:   17     contains:    1 entries
bucket:   18     contains:    0 entries
bucket:   19     contains:    1 entries
bucket:   20     contains:    0 entries
bucket:   21     contains:    0 entries
bucket:   22     contains:    0 entries
bucket:   23     contains:    0 entries
bucket:   24     contains:    0 entries
bucket:   25     contains:    0 entries
bucket:   26     contains:    1 entries
bucket:   27     contains:    0 entries
bucket:   28     contains:    0 entries
bucket:   29     contains:    0 entries
bucket:   30     contains:    0 entries
bucket:   31     contains:    1 entries
bucket:   32     contains:    0 entries
bucket:   33     contains:    1 entries
bucket:   34     contains:    1 entries
bucket:   35     contains:    1 entries
bucket:   36     contains:    2 entries
bucket:   37     contains:    0 entries
bucket:   38     contains:    1 entries
bucket:   39     contains:    0 entries
bucket:   40     contains:    0 entries
bucket:   41     contains:    0 entries
bucket:   42     contains:    1 entries
bucket:   43     contains:    0 entries
bucket:   44     contains:    0 entries
bucket:   45     contains:    0 entries
bucket:   46     contains:    0 entries
bucket:   47     contains:    0 entries
bucket:   48     contains:    0 entries
bucket:   49     contains:    0 entries
bucket:   50     contains:    1 entries
bucket:   51     contains:    0 entries
bucket:   52     contains:    0 entries
bucket:   53     contains:    0 entries
bucket:   54     contains:    0 entries
bucket:   55     contains:    0 entries
bucket:   56     contains:    0 entries
bucket:   57     contains:    0 entries
bucket:   58     contains:    0 entries
bucket:   59     contains:    0 entries
bucket:   60     contains:    0 entries
bucket:   61     contains:    0 entries
bucket:   62     contains:    1 entries
bucket:   63     contains:    0 entries
bucket:   64     contains:    0 entries
bucket:   65     contains:    0 entries
bucket:   66     contains:    1 entries
bucket:   67     contains:    0 entries
bucket:   68     contains:    2 entries
bucket:   69     contains:    0 entries
bucket:   70     contains:    0 entries
bucket:   71     contains:    0 entries
bucket:   72     contains:    0 entries
bucket:   73     contains:    0 entries
bucket:   74     contains:    0 entries
bucket:   75     contains:    0 entries
bucket:   76     contains:    1 entries
bucket:   77     contains:    0 entries
bucket:   78     contains:    0 entries
bucket:   79     contains:    1 entries
bucket:   80     contains:    0 entries
bucket:   81     contains:    0 entries
bucket:   82     contains:    1 entries
bucket:   83     contains:    0 entries
bucket:   84     contains:    1 entries
bucket:   85     contains:    1 entries
bucket:   86     contains:    0 entries
bucket:   87     contains:    1 entries
bucket:   88     contains:    1 entries
bucket:   89     contains:    0 entries
bucket:   90     contains:    1 entries
bucket:   91     contains:    1 entries
bucket:   92     contains:    1 entries
bucket:   93     contains:    0 entries
bucket:   94     contains:    1 entries
bucket:   95     contains:    0 entries
bucket:   96     contains:    0 entries
bucket:   97     contains:    0 entries
bucket:   98     contains:    0 entries
bucket:   99     contains:    0 entries
bucket:  100     contains:    0 entries
bucket:  101     contains:    0 entries
bucket:  102     contains:    0 entries
bucket:  103     contains:    0 entries
bucket:  104     contains:    0 entries
bucket:  105     contains:    1 entries
bucket:  106     contains:    0 entries
bucket:  107     contains:    0 entries
bucket:  108     contains:    0 entries
bucket:  109     contains:    0 entries
bucket:  110     contains:    2 entries
bucket:  111     contains:    0 entries
bucket:  112     contains:    1 entries
bucket:  113     contains:    0 entries
bucket:  114     contains:    0 entries
bucket:  115     contains:    0 entries
bucket:  116     contains:    0 entries
bucket:  117     contains:    0 entries
bucket:  118     contains:    0 entries
bucket:  119     contains:    1 entries
bucket:  120     contains:    0 entries
bucket:  121     contains:    1 entries
bucket:  122     contains:    0 entries
bucket:  123     contains:    0 entries
bucket:  124     contains:    0 entries
bucket:  125     contains:    0 entries
bucket:  126     contains:    0 entries
bucket:  127     contains:    0 entries
bucket:  128     contains:    0 entries
bucket:  129     contains:    0 entries
bucket:  130     contains:    0 entries
bucket:  131     contains:    0 entries
bucket:  132     contains:    0 entries
bucket:  133     contains:    0 entries
bucket:  134     contains:    0 entries
bucket:  135     contains:    0 entries
bucket:  136     contains:    0 entries
bucket:  137     contains:    0 entries
bucket:  138     contains:    0 entries
bucket:  139     contains:    0 entries
bucket:  140     contains:    0 entries
bucket:  141     contains:    0 entries
bucket:  142     contains:    0 entries
bucket:  143     contains:    0 entries
bucket:  144     contains:    0 entries
bucket:  145     contains:    0 entries
bucket:  146     contains:    0 entries
bucket:  147     contains:    1 entries
bucket:  148     contains:    0 entries

There are 43 entries in the arp table.
I don't get you - 10.0.91.1 is the default gateway.

More information: GPFS 3.3; Oracle Database 11g Enterprise Edition Release 11.2.0.4.7 - 64bit Production; GPFS: 6-node cluster (we had 4 nodes initially, and 2 new nodes have just been added). We also found something unusual: on one node there's an interface in PROMISC mode, but we don't know for sure if that is relevant.

en36: flags=1e084963,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.21.119 netmask 0xffffff00 broadcast 10.0.21.255
        inet 169.254.186.79 netmask 0xffff0000 broadcast 169.254.255.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
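Checking every node by eye for that flag is tedious, so here is a rough sketch of how saved `ifconfig -a` output could be scanned for it. The function name `promiscuous_interfaces` is made up for this example, and the sample combines two interface stanzas from different hosts in this thread purely for illustration.

```python
import re

# Two stanzas copied from the outputs in this thread. They come from
# different hosts and are combined here only to have one positive and
# one negative case.
SAMPLE = """\
en36: flags=1e084963,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.21.119 netmask 0xffffff00 broadcast 10.0.21.255
en37: flags=1e080863,18c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.22.18 netmask 0xffffff00 broadcast 10.0.22.255
"""

# Interface header line: "<name>: flags=<hex>,<hex><FLAG,FLAG,...>"
HEADER_RE = re.compile(r"^(\w+): flags=[0-9a-f,]+<([^>]*)>")


def promiscuous_interfaces(ifconfig_output):
    """Return the names of interfaces whose flag list contains PROMISC."""
    names = []
    for line in ifconfig_output.splitlines():
        match = HEADER_RE.match(line)
        if match and "PROMISC" in match.group(2).split(","):
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    print(promiscuous_interfaces(SAMPLE))
```

The sketch only reports the flag; it says nothing about why the interface is in that mode (a packet capture left running is one common cause, but that is a guess, not something shown in the output).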

Yes, I know 10.0.91.1 is the default gateway.

I asked you if you can log into it? Was this not clear, when I asked you?

Regarding your arp output, it looks fairly clear to me that you have a lot of devices on this Ethernet segment.
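For anyone who wants to put a number on "a lot of devices", the `arp -a` entries can be grouped by subnet. This is just a sketch under two assumptions: the helper name `neighbours_per_subnet` is invented here, and every network is treated as a /24, which matches the 0xffffff00 netmasks in the ifconfig output posted earlier. The sample is a handful of entries from the arp listing above.

```python
import re
from collections import Counter

# A few entries copied from the arp -a output above.
SAMPLE = """\
  ? (10.0.91.3) at 0:1c:7f:68:41:c3 [ethernet] stored in bucket 3
  ? (10.0.20.1) at 0:23:33:e1:0:da [ethernet] stored in bucket 3
  ? (10.0.10.29) at 0:14:5e:b8:8a:6e [ethernet] stored in bucket 4
  oltpn5h (10.0.91.162) at e4:1f:13:50:35:85 [ethernet] stored in bucket 13
  ihapp8 (10.0.91.180) at 0:1a:64:1e:ea:d8 [ethernet] stored in bucket 31
bucket:    0     contains:    0 entries
"""

# Captures the first three octets of the address in "(a.b.c.d) at ...".
ENTRY_RE = re.compile(r"\((\d+\.\d+\.\d+)\.\d+\) at ")


def neighbours_per_subnet(arp_output):
    """Count ARP entries per /24 network (assumes /24 netmasks everywhere)."""
    counts = Counter()
    for line in arp_output.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            counts[match.group(1) + ".0/24"] += 1
    return counts


if __name__ == "__main__":
    for subnet, count in neighbours_per_subnet(SAMPLE).most_common():
        print(subnet, count)
```

Keep in mind that the ARP table only lists neighbours the host has talked to recently, so the real number of devices on each segment can be higher.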

When you have many devices on the same segment, sometimes packets will be delayed. That is how networking works.

If you want it to be "very fast and always the higher priority" you should put only the two devices on that network segment.

This seems really basic. What am I missing? Tell me.

Might want to look into getting your partition running in Power8 mode instead of Power7 compatibility mode. Also should think about getting off of AIX 6.1.

Have you checked to see if there are any hardware problems or errors on the network interface?

On this Ethernet segment we have many servers, but only these two servers have the problem, so I don't think that's the point.

This could be the point: we installed AIX 6.1 on an IBM P8 server, but we're not sure. Do you have any suggestions? The network and hardware devices are OK.

A slight delay in a packet on a busy Ethernet segment is not "an error". That is how Ethernet works.

What you are "showing us" (an occasional 44 ms delay) is typical of all busy Ethernet segments with many devices.

44 ms is 44 thousandths of a second (44/1000 seconds).

The way in which you answer my questions shows you do not understand LAN networking and Ethernet (or networking in general).

Delays on an Ethernet segment are not "errors", nor are they a sign of "instability". This is how Ethernet (and indeed most networking protocols) works: queues, delays, priorities.

If you want guaranteed fast delivery between two hosts, you need to move them to their own Ethernet segment with only two devices on that network.

This is the only approach any experienced network engineer would take or advise.

Think about how Ethernet works. Every device wants to talk at the same time on the same wire. They cannot talk at the same time. The more devices, the more this happens. Each device will wait a random interval before transmitting when the network is busy. This is how LANs work. The more devices, the greater the chance of a delay. So a 44 ms delay (44/1000 of a second) occasionally on a busy LAN segment is normal.
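The random wait described here can be sketched for the classic CSMA/CD scheme mentioned earlier in the thread, where a station backs off for a random number of slot times after each collision (truncated binary exponential backoff). The constants below are the well-known values for 10 Mbit/s shared Ethernet; the function names are made up for this example. Note the assumption: this models half-duplex shared media only, and links running full duplex through a switch do not use this mechanism.

```python
import random

SLOT_TIME_US = 51.2  # 512 bit times at 10 Mbit/s (classic shared Ethernet)
MAX_EXPONENT = 10    # the backoff window stops growing after 10 collisions
MAX_BACKOFFS = 15    # after the 16th failed attempt the frame is dropped


def max_backoff_delay_us(collision_count):
    """Longest possible wait, in microseconds, after this many collisions."""
    if not 1 <= collision_count <= MAX_BACKOFFS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, MAX_EXPONENT)
    return (2 ** exponent - 1) * SLOT_TIME_US


def backoff_delay_us(collision_count, rng=random):
    """Random wait, in microseconds, after this many collisions in a row."""
    if not 1 <= collision_count <= MAX_BACKOFFS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, MAX_EXPONENT)
    # Pick a whole number of slots uniformly from 0 .. 2**exponent - 1.
    slots = rng.randrange(2 ** exponent)
    return slots * SLOT_TIME_US


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs look the same
    for n in (1, 2, 3, 10):
        wait = backoff_delay_us(n, rng)
        print(f"collision {n}: wait {wait:.1f} us "
              f"(max {max_backoff_delay_us(n):.1f} us)")
```

Under these assumptions the wait only reaches tens of milliseconds after many consecutive collisions (the window tops out at 1023 slots, roughly 52 ms at 10 Mbit/s), so the sketch illustrates the mechanism rather than explaining any particular reading in this thread.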

As I mentioned to you, but you do not seem to want to understand: if you want guaranteed fast delivery between two hosts on a LAN segment, you need to move them to their own Ethernet segment with only two devices on that physical networking segment.

Reference:

Network delay - Wikipedia

Your occasional delay of 44ms is small and normal for Ethernets. In addition, each h/w device (LAN card for example) can have a different characteristic. The more devices on the LAN segment, the more of a chance for crosstalk, etc.

If you need better performance, all network engineers, not only me, will advise you to put the two devices on their own (dedicated) LAN segment. This is the only way to ensure the LAN segment delay is minimal.

If you don't want to believe a network systems engineer with over 30 years IP and Internet-based networking experience (33+ to be more exact), maybe you will believe the myriad references on the Internet:

Networking 101: Primer on Latency and Bandwidth - High Performance Browser Networking (O'Reilly)

(just one of hundreds / thousands of examples on the net which discuss this topic.)

In addition, ping does not measure latency nor does it measure round-trip time. It measures ICMP echo request response time. ICMP messages run with low priority and take longer than other traffic. This means that if any single host on your network is "talking" at the same time (cross talk), your ping packet will be delayed.

As I mentioned, 44 ms is not much delay. It is 44/1000 of a second.

If you need faster times between two devices on the same LAN segment, the #1 solution is to move the devices to their own LAN segment.

Also, this reference might be useful:

REF:

https://www.lantronix.com/resources/networking-tutorials/network-switching-tutorial/

You can easily see in the diagram that the way the LAN is configured is the key to client/server response time on a LAN segment.

Hence, when you showed us your arp configuration, it was easy to see you have many clients / hosts on a single LAN segment.

You're right. I know very little about networking, but some guys on the network team told me that there's nothing wrong with the network devices, the network configuration, or anything like that. They told me to focus on the server and OS. I have to agree with them, because there are some servers with almost the same network configuration as the two new servers, and they don't have this problem:

sysopr1@oltpn4c:/home/sysopr1>arp -a
  ? (10.0.91.1) at 0:1c:7f:64:16:f [ethernet] stored in bucket 1

  ? (10.0.10.26) at 88:94:71:c4:4a:49 [ethernet] stored in bucket 1

  ? (10.0.91.2) at 0:1c:7f:64:16:f [ethernet] stored in bucket 2

  ? (10.0.91.3) at 0:1c:7f:68:41:c3 [ethernet] stored in bucket 3

  ? (10.0.10.29) at 0:14:5e:b8:8a:6e [ethernet] stored in bucket 4

  ? (10.0.22.236) at 0:11:25:7b:17:e [ethernet] stored in bucket 5

  oltpn5h (10.0.91.162) at e4:1f:13:50:35:85 [ethernet] stored in bucket 13

  o6_mgt (10.0.10.38) at 0:1a:64:91:5:ff [ethernet] stored in bucket 13

  misn2h (10.0.91.164) at 0:1a:64:a7:37:68 [ethernet] stored in bucket 15

  ? (10.0.10.42) at 0:14:5e:b8:8a:79 [ethernet] stored in bucket 17

  oltpn10c-priv (10.0.21.66) at 6c:ae:8b:45:c4:90 [ethernet] stored in bucket 26

  ? (10.0.10.201) at 0:10:e0:3b:6b:4c [ethernet] stored in bucket 27

  m2def (10.0.19.135) at 0:1a:64:91:42:6e [ethernet] stored in bucket 30

  ihapp8 (10.0.91.180) at 0:1a:64:1e:ea:d8 [ethernet] stored in bucket 31

  ? (10.0.10.205) at 0:10:e0:35:dc:12 [ethernet] stored in bucket 31

  ? (10.0.10.206) at 0:10:e0:39:ef:c6 [ethernet] stored in bucket 32

  ihapp1 (10.0.91.181) at 0:14:5e:b8:88:cc [ethernet] stored in bucket 32

  ihapp2 (10.0.91.182) at 0:21:5e:8a:d2:f8 [ethernet] stored in bucket 33

  ? (10.0.10.207) at 0:10:e0:3a:35:14 [ethernet] stored in bucket 33

  ihapp3 (10.0.91.183) at 0:21:5e:8a:d5:18 [ethernet] stored in bucket 34

  ? (10.0.10.208) at 0:10:e0:39:ef:e4 [ethernet] stored in bucket 34

  ihapp4 (10.0.91.184) at 0:21:5e:8a:d3:6e [ethernet] stored in bucket 35

  ? (10.0.10.209) at 0:10:e0:3a:2e:b4 [ethernet] stored in bucket 35

  ihapp5 (10.0.91.185) at 0:14:5e:b8:88:dc [ethernet] stored in bucket 36

  ? (10.0.10.61) at 0:14:5e:b8:63:f8 [ethernet] stored in bucket 36

  ? (10.0.10.210) at 0:10:e0:39:ef:90 [ethernet] stored in bucket 36

  ihapp6 (10.0.91.186) at 0:14:5e:b8:87:e0 [ethernet] stored in bucket 37

  ? (10.0.10.211) at 0:10:e0:3a:2c:14 [ethernet] stored in bucket 37

  ih_ei (10.0.91.187) at 0:1a:64:1e:d0:f1 [ethernet] stored in bucket 38

  ic_tsm (10.0.10.63) at 0:1a:64:ad:4f:2a [ethernet] stored in bucket 38

  ? (10.0.91.188) at 0:14:5e:b8:87:6f [ethernet] stored in bucket 39

  ? (10.0.91.189) at 0:1a:64:1e:d0:f0 [ethernet] stored in bucket 40

  ? (10.0.10.70) at 8:9e:1:59:ff:3 [ethernet] stored in bucket 45

  ic_smgrr (10.0.91.50) at 0:1a:64:ad:50:a3 [ethernet] stored in bucket 50

  ? (10.0.10.75) at 70:20:84:fd:7e:8b [ethernet] stored in bucket 50

  ic_da (10.0.91.51) at 0:1a:64:ad:50:87 [ethernet] stored in bucket 51

  ? (10.0.91.202) at 0:21:5e:8a:d0:ae [ethernet] stored in bucket 53

  ? (10.0.91.203) at 0:21:5e:8a:d1:4a [ethernet] stored in bucket 54

  ? (10.0.10.229) at 0:10:e0:31:9b:44 [ethernet] stored in bucket 55

  ? (10.0.10.230) at 0:10:e0:2e:2a:70 [ethernet] stored in bucket 56

  ? (10.0.10.231) at 0:10:e0:31:ee:2c [ethernet] stored in bucket 57

  ? (10.0.10.232) at 0:10:e0:31:48:a6 [ethernet] stored in bucket 58

  oltpn10c (10.0.91.66) at 98:be:94:0:67:30 [ethernet] stored in bucket 66

  ? (10.0.10.92) at f0:de:f1:22:a4:13 [ethernet] stored in bucket 67

  o10gpfs (10.0.20.66) at 98:be:94:0:67:32 [ethernet] stored in bucket 68

  ? (10.0.10.253) at 0:1a:64:78:4a:9e [ethernet] stored in bucket 79

  ? (10.0.10.254) at 0:1a:64:95:5c:c2 [ethernet] stored in bucket 80

  icapp1 (10.0.91.81) at 0:1a:64:1e:a3:63 [ethernet] stored in bucket 81

  icapp2 (10.0.91.82) at 0:21:5e:8a:9c:a [ethernet] stored in bucket 82

  icapp3 (10.0.91.83) at 0:1a:64:91:40:f5 [ethernet] stored in bucket 83

  icapp4 (10.0.91.84) at 0:1a:64:91:38:3f [ethernet] stored in bucket 84

  icapp5 (10.0.91.85) at 0:14:5e:b8:89:53 [ethernet] stored in bucket 85

  icapp6 (10.0.91.86) at 0:14:5e:b8:8a:70 [ethernet] stored in bucket 86

  ic_ei (10.0.91.87) at 0:1a:64:1e:d0:21 [ethernet] stored in bucket 87

  ? (10.0.10.113) at 0:21:5e:8a:d0:20 [ethernet] stored in bucket 88

  ? (10.0.10.115) at 0:21:5e:8a:d2:8c [ethernet] stored in bucket 90

  ? (10.0.91.91) at 0:10:e0:3b:6b:4d [ethernet] stored in bucket 91

  ? (10.0.10.117) at 0:21:5e:8a:d0:16 [ethernet] stored in bucket 92

  ? (10.0.91.92) at 0:10:e0:3b:64:27 [ethernet] stored in bucket 92

  ? (10.0.91.93) at 0:10:e0:39:f6:3f [ethernet] stored in bucket 93

  ? (10.0.10.119) at 0:14:5e:b8:88:d1 [ethernet] stored in bucket 94

  ? (10.0.91.94) at 0:10:e0:3b:6f:85 [ethernet] stored in bucket 94

  oltpn5h-priv (10.0.21.137) at 0:14:5e:99:13:61 [ethernet] stored in bucket 97

  o6dg (10.0.22.38) at 0:11:25:7b:26:7e [ethernet] stored in bucket 105

  ? (10.0.10.135) at 0:1a:64:91:3c:c4 [ethernet] stored in bucket 110

  o5_mgt (10.0.10.137) at e4:1f:13:50:36:56 [ethernet] stored in bucket 112

  ? (10.0.10.2) at 0:1c:7f:64:16:f [ethernet] stored in bucket 126

  ? (10.0.10.170) at 8:9e:1:61:e9:5b [ethernet] stored in bucket 145

  oltpn6c-priv (10.0.21.38) at 0:11:25:7b:22:a3 [ethernet] stored in bucket 147

bucket:    0     contains:    0 entries
bucket:    1     contains:    2 entries
bucket:    2     contains:    1 entries
bucket:    3     contains:    1 entries
bucket:    4     contains:    1 entries
bucket:    5     contains:    1 entries
bucket:    6     contains:    0 entries
bucket:    7     contains:    0 entries
bucket:    8     contains:    0 entries
bucket:    9     contains:    0 entries
bucket:   10     contains:    0 entries
bucket:   11     contains:    0 entries
bucket:   12     contains:    0 entries
bucket:   13     contains:    2 entries
bucket:   14     contains:    0 entries
bucket:   15     contains:    1 entries
bucket:   16     contains:    0 entries
bucket:   17     contains:    1 entries
bucket:   18     contains:    0 entries
bucket:   19     contains:    0 entries
bucket:   20     contains:    0 entries
bucket:   21     contains:    0 entries
bucket:   22     contains:    0 entries
bucket:   23     contains:    0 entries
bucket:   24     contains:    0 entries
bucket:   25     contains:    0 entries
bucket:   26     contains:    1 entries
bucket:   27     contains:    1 entries
bucket:   28     contains:    0 entries
bucket:   29     contains:    0 entries
bucket:   30     contains:    1 entries
bucket:   31     contains:    2 entries
bucket:   32     contains:    2 entries
bucket:   33     contains:    2 entries
bucket:   34     contains:    2 entries
bucket:   35     contains:    2 entries
bucket:   36     contains:    3 entries
bucket:   37     contains:    2 entries
bucket:   38     contains:    2 entries
bucket:   39     contains:    1 entries
bucket:   40     contains:    1 entries
bucket:   41     contains:    0 entries
bucket:   42     contains:    0 entries
bucket:   43     contains:    0 entries
bucket:   44     contains:    0 entries
bucket:   45     contains:    1 entries
bucket:   46     contains:    0 entries
bucket:   47     contains:    0 entries
bucket:   48     contains:    0 entries
bucket:   49     contains:    0 entries
bucket:   50     contains:    2 entries
bucket:   51     contains:    1 entries
bucket:   52     contains:    0 entries
bucket:   53     contains:    1 entries
bucket:   54     contains:    1 entries
bucket:   55     contains:    1 entries
bucket:   56     contains:    1 entries
bucket:   57     contains:    1 entries
bucket:   58     contains:    1 entries
bucket:   59     contains:    0 entries
bucket:   60     contains:    0 entries
bucket:   61     contains:    0 entries
bucket:   62     contains:    0 entries
bucket:   63     contains:    0 entries
bucket:   64     contains:    0 entries
bucket:   65     contains:    0 entries
bucket:   66     contains:    1 entries
bucket:   67     contains:    1 entries
bucket:   68     contains:    1 entries
bucket:   69     contains:    0 entries
bucket:   70     contains:    0 entries
bucket:   71     contains:    0 entries
bucket:   72     contains:    0 entries
bucket:   73     contains:    0 entries
bucket:   74     contains:    0 entries
bucket:   75     contains:    0 entries
bucket:   76     contains:    0 entries
bucket:   77     contains:    0 entries
bucket:   78     contains:    0 entries
bucket:   79     contains:    1 entries
bucket:   80     contains:    1 entries
bucket:   81     contains:    1 entries
bucket:   82     contains:    1 entries
bucket:   83     contains:    1 entries
bucket:   84     contains:    1 entries
bucket:   85     contains:    1 entries
bucket:   86     contains:    1 entries
bucket:   87     contains:    1 entries
bucket:   88     contains:    1 entries
bucket:   89     contains:    0 entries
bucket:   90     contains:    1 entries
bucket:   91     contains:    1 entries
bucket:   92     contains:    2 entries
bucket:   93     contains:    1 entries
bucket:   94     contains:    2 entries
bucket:   95     contains:    0 entries
bucket:   96     contains:    0 entries
bucket:   97     contains:    1 entries
bucket:   98     contains:    0 entries
bucket:   99     contains:    0 entries
bucket:  100     contains:    0 entries
bucket:  101     contains:    0 entries
bucket:  102     contains:    0 entries
bucket:  103     contains:    0 entries
bucket:  104     contains:    0 entries
bucket:  105     contains:    1 entries
bucket:  106     contains:    0 entries
bucket:  107     contains:    0 entries
bucket:  108     contains:    0 entries
bucket:  109     contains:    0 entries
bucket:  110     contains:    1 entries
bucket:  111     contains:    0 entries
bucket:  112     contains:    1 entries
bucket:  113     contains:    0 entries
bucket:  114     contains:    0 entries
bucket:  115     contains:    0 entries
bucket:  116     contains:    0 entries
bucket:  117     contains:    0 entries
bucket:  118     contains:    0 entries
bucket:  119     contains:    0 entries
bucket:  120     contains:    0 entries
bucket:  121     contains:    0 entries
bucket:  122     contains:    0 entries
bucket:  123     contains:    0 entries
bucket:  124     contains:    0 entries
bucket:  125     contains:    0 entries
bucket:  126     contains:    1 entries
bucket:  127     contains:    0 entries
bucket:  128     contains:    0 entries
bucket:  129     contains:    0 entries
bucket:  130     contains:    0 entries
bucket:  131     contains:    0 entries
bucket:  132     contains:    0 entries
bucket:  133     contains:    0 entries
bucket:  134     contains:    0 entries
bucket:  135     contains:    0 entries
bucket:  136     contains:    0 entries
bucket:  137     contains:    0 entries
bucket:  138     contains:    0 entries
bucket:  139     contains:    0 entries
bucket:  140     contains:    0 entries
bucket:  141     contains:    0 entries
bucket:  142     contains:    0 entries
bucket:  143     contains:    0 entries
bucket:  144     contains:    0 entries
bucket:  145     contains:    1 entries
bucket:  146     contains:    0 entries
bucket:  147     contains:    1 entries
bucket:  148     contains:    0 entries

There are 69 entries in the arp table.

PING oltpn4c: (10.0.91.62): 56 data bytes
64 bytes from 10.0.91.62: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=1 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=2 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=3 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=4 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=5 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=6 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=7 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=8 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=9 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=10 ttl=255 time=1 ms
64 bytes from 10.0.91.62: icmp_seq=11 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=12 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=13 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=14 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=15 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=16 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=17 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=18 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=19 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=20 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=21 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=22 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=23 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=24 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=25 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=26 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=27 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=28 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=29 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=30 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=31 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=32 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=33 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=34 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=35 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=36 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=37 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=38 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=39 ttl=255 time=0 ms
64 bytes from 10.0.91.62: icmp_seq=40 ttl=255 time=0 ms
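
As a rough way to compare runs like the one above against the 3 ms requirement mentioned at the start of the thread, here is a minimal Python sketch. The function name `find_spikes` and the default threshold are illustrative choices only, and the regular expression assumes the reply-line layout shown in this thread. The sample lines are copied from the first ping run posted here.

```python
import re

# Matches reply lines such as
# "64 bytes from 10.0.91.62: icmp_seq=10 ttl=255 time=1 ms"
REPLY = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def find_spikes(ping_output, threshold_ms=3.0):
    """Return (seq, time_ms) pairs whose round-trip time is >= threshold_ms."""
    spikes = []
    for line in ping_output.splitlines():
        match = REPLY.search(line)
        if match:
            seq, rtt = int(match.group(1)), float(match.group(2))
            if rtt >= threshold_ms:
                spikes.append((seq, rtt))
    return spikes


# Lines taken from the ping to 10.0.91.18 at the top of the thread.
sample = """\
64 bytes from 10.0.91.18: icmp_seq=2 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=3 ttl=255 time=42 ms
64 bytes from 10.0.91.18: icmp_seq=4 ttl=255 time=0 ms
64 bytes from 10.0.91.18: icmp_seq=10 ttl=255 time=48 ms
"""

print(find_spikes(sample))  # [(3, 42.0), (10, 48.0)]
```

Saving each ping run to a file and passing its contents to `find_spikes` makes it easier to see whether the slow replies arrive at a regular interval, which would point toward something periodic rather than random load.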

It's a misunderstanding that I don't believe you. This is just a complicated case, and I don't have enough knowledge to solve it.

Yes, I understand.

On a LAN segment, you should also realize that every hardware interface attached to the LAN can have different characteristics.

Some Ethernet cards are more "chatty" and some are less "chatty"; some are "old" and others are "new"; some have drivers or firmware written by "A" and others have drivers or firmware written by "B".

You say "you don't have a lot of knowledge"... that's normal.

So, I am telling you what an experienced network engineer with a lot of knowledge would do.

If I had a busy LAN segment with many devices on the same segment (subnet), as you have indicated, and two devices that needed the best possible communication speed between them, I would put them on their own LAN segment (subnet) and "be done with it". Then I might retest once those two devices are the only ones on that segment (or three, if you have a separate gateway device).

This is a pretty large box, an E870 as a matter of fact. Is the network coming from a VIO? If so: Is the VIO network set up correctly? Does the VIO have the correct resource assignments to provide services to however many LPARs you have running?

I ask the above because I encountered an issue where the VIO had been set up by an MSP, and every so often the active network path would switch from, say, c1-p1-t1 to c10-p1-t1. That caused a momentary ping delay, much like what you are seeing. My network team was the first to report it to me, because they saw the MAC address of the EtherChannel device move from one switch port to another. The message they sent me was %SW_MATM-4-MACFLAP_NOTIF. It was caused by the EtherChannel on the VIO having two primary adapters and no backup adapter. Taking one of the primary adapters and making it the backup adapter fixed the issue. Your results may vary.
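
If the network team can export the switch log, a small script can show how often each MAC address is flapping. The following is only a sketch: it assumes the messages follow the commonly seen form "Host <mac> in vlan <id> is flapping between port <a> and port <b>", and the MAC addresses, VLAN, and port names in the sample are made up for illustration, not taken from this thread.

```python
import re
from collections import Counter

# Assumed message layout; adjust the pattern if your switch logs differ.
MACFLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (\S+) in vlan (\d+) "
    r"is flapping between port (\S+) and port (\S+)"
)


def count_flaps(log_text):
    """Count MACFLAP notifications per (mac, vlan) pair."""
    counts = Counter()
    for match in MACFLAP.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Hypothetical log excerpt, for illustration only.
sample_log = """\
%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4455 in vlan 91 is flapping between port Gi1/0/1 and port Gi1/0/10
%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4455 in vlan 91 is flapping between port Gi1/0/10 and port Gi1/0/1
%SW_MATM-4-MACFLAP_NOTIF: Host 00aa.bbcc.ddee in vlan 91 is flapping between port Gi1/0/3 and port Gi1/0/4
"""

for (mac, vlan), hits in count_flaps(sample_log).most_common():
    print(mac, vlan, hits)
```

A MAC address that belongs to one of the server adapters and shows up repeatedly would be a hint to look at the link aggregation setup on that host.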

If they are dedicated adapters and not coming from the VIO, again check your AIX configuration. Something to note: your tcp_recvspace and rfc1323 settings are not consistent with your adapters. That might be by design, but it makes me wonder whether the network was set up correctly.

Now, everything else that everyone has posted here comes into play as well, and I'm not a network admin so I cannot weigh in on the other topics presented here.

If push comes to shove, open a ticket with the IBM AIX team.

We're running with full resources, no VIO.

We've asked the network team to recheck the network. We are also planning to upgrade the OS to 7.1 so we can get help from IBM.

Please show me the output of the following commands (where entx is mentioned, run it for every configured ent adapter; where ipaddress is stated, run it for every IP address of your cluster partners):

lsdev -Cc adapter
entstat entx
no -a
vmstat -IWwt 2 10
ping -c 10  ipaddress 25000

Are your cluster partners' IP addresses in /etc/hosts?
How are /etc/netsvc.conf and /etc/resolv.conf configured (order of things)?
How many disks are part of your GPFS cluster?

Ping to another server

Ping from 10.0.91.82

Ping to one node

  • Yes
  • There's nothing in /etc/netsvc.conf
  • 154 disks per site

:)

OK, I don't see any errors, overflows, or underruns on any of your adapters. That is a good thing. You can also send large packets everywhere else without losing any, which is a good thing too.
It would have been nice to know which adapters make up which link aggregation, but I forgot to ask for it :)
I see a few things in your network tunables that I would probably change to improve the general network flow if these were my systems (i.e. sack, tcp_nodelayack, and buffer sizes), but for that it would help to know exactly which adapters make up which link aggregation. So if you could show me the lsattr -El entx output for your link aggregations and the underlying physical adapters, that would help to make sure you don't mix different speeds and that the device attributes are set correctly for the adapters in use. It might also help to set certain settings, like rfc1323 and buffers, on the adapters themselves in addition to the general system settings.
I can also see that you have relatively little free memory. This may or may not be a problem, so if you can give me the output of vmstat -v and vmstat -s, that would help me tell you which it is.
You may also want to populate netsvc.conf, as it sets the order of name resolution and might generally improve network speed when the system does not have to guess how to find another host.
To get rid of the pkcs error at the top of the lsdev -Cc adapter output, you probably need to install the security.pkcs11 fileset. And finally, can you please run lppchk -vm3 and post the output as well?

Thanks for your help. Here is what you asked for. I hope you can find something.