packet loss problem

I have 4 network ports on our Sun T5240 server.

All but one (nxge1) show packet loss.

nxge0 gives on average 50% packet loss, very bad.

nxge2 gives on average 1-2% packet loss.

nxge3 gives on average 20% packet loss.

Is there a tool or something to help me find the problem?

Yes there is, it's called brain 1.0 and it's a really useful tool ;).

The nxge interfaces are sensitive and often have problems in many environments. Please be sure to install the latest Solaris patch cluster so that you have the latest nxge drivers on your system. Also check for autonegotiation problems (are you on a Cisco switch?). Have a look with "dladm show-dev" to see your current settings, and maybe try switching to fixed 1000fdx (ndd get/set is your friend for trying this out).

hth,
DN2

I don't have that version of brain, I only have the beta version.

The beta version tells me there may be a configuration issue with
the data center connection or a downright misconfiguration.

The switch interface to the data center shows a lot of collisions. My
switch interface was set to auto and it showed the data center at
half duplex, but when I contacted them they said their side was
set to full duplex. They then set it to auto and my side went to full
duplex automatically.

They also mentioned they saw errors and problems on their end.

When one partner is forced to full duplex and the other partner is set to auto-negotiate, the auto-negotiating partner gets no negotiation information from its peer, misreads it, and establishes the wrong duplex (half). With that combination, this failure is essentially guaranteed.

Before they made the change to AUTO duplex, I made the change to FULL on
my side and it dropped the connection and failed over to the redundant connection,
which was also showing as HALF duplex on their side, with my side set to AUTO for
the redundant interface.

So if they were indeed at FULL DUPLEX it would NOT have dropped the
connection and should have been FULL to FULL, but that is not the case.

When I changed back to AUTO it said HALF duplex again on their side. Only
when they changed to AUTO did my side show them as FULL duplex.

Why did my switch say HALF for the data center when my switch was set to
AUTO duplex and the data center side was set to FULL?

Why would the connection drop if supposedly both sides were set to FULL?

This is all very frustrating. The data center is seeing problems on their end
but not disclosing that information to me, and suggesting it is my problem.

Don't worry Photon, you are not the first by a long way.
Auto-negotiation of LAN speed and duplex settings does not work at all well
with servers. It can appear to work and then go very wrong after a power failure.

Perderabo's advice is good.

It takes two to tango.

BOTH sides need to be hard set to NOT auto-negotiate.

1) Network
2) Servers

After the network guys have hard-set their LAN ports,
please positively configure your end to match.
Avoid auto-negotiation. Avoid half-duplex.

Then you need a one-time, totally cold start,
i.e. shut down the server, wait 5 minutes, then cold start the server.

If this does not clear the fault then it's time to call the network engineer.

Hope this helps.

The connection would only have dropped if the speeds were mismatched. Duplex should not have caused this.

Duplex is simple to understand. If one side is set to AUTO it expects the other side to be set to AUTO also.

Think of it as a three-way handshake for a connection. The first side asks the second what it can run at. The second side says "I can do Full", and the first side says "OK, we will do Full". If both are set to AUTO, they decide together which duplex to connect at, which is normally FULL.

If one side is hard-coded, it ignores requests for duplex settings, so the side set to AUTO defaults to Half.
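The rule described above can be sketched as a small truth table in Python. This is only a toy model, not how any particular switch or NIC driver is implemented; the names `Mode`, `resolved_duplex` and `link_state` are made up for illustration, and it assumes that two AUTO ports both advertise full duplex.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"
    FULL = "full"   # forced full duplex, negotiation disabled
    HALF = "half"   # forced half duplex, negotiation disabled


def resolved_duplex(local: Mode, peer: Mode) -> str:
    """Return the duplex ("full" or "half") the local port ends up using."""
    if local is not Mode.AUTO:
        # A hard-coded port uses its configured duplex regardless of the peer.
        return local.value
    if peer is Mode.AUTO:
        # Both ends negotiate; assumption: both advertise full duplex.
        return "full"
    # The peer is hard-coded and does not answer the negotiation,
    # so the AUTO side falls back to half duplex.
    return "half"


def link_state(a: Mode, b: Mode) -> str:
    """Compare what each end resolved to and report a mismatch if any."""
    da, db = resolved_duplex(a, b), resolved_duplex(b, a)
    return "ok" if da == db else f"duplex mismatch ({da} vs {db})"


if __name__ == "__main__":
    for a in Mode:
        for b in Mode:
            print(f"{a.value:>4} <-> {b.value:<4}: {link_state(a, b)}")
```

In this model, AUTO against a forced FULL port is the combination that ends in a mismatch, which matches the HALF reading Photon saw while the data center side was set to FULL.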

It is not normal to set servers to AUTO. AUTO should only be used when you don't know what might be plugged into the port.

If they are really seeing issues and they believe it is you, then they should be able to provide details as to why it is your side with the problem.

They should be able to provide you with the port settings from the running config on the switch. I would ask for this.

I finally got an error message from them.

protocol ARP
destination Broadcast

duplicate use of IP's detected

My configuration looks like this:

---> data line ---> switch1 ---> Router1 and Router2 on HSRP ----> switch2 ---> servers

I don't understand why their side would see my IP on both routers if it is
on HSRP standby?

Found the problem.

The cable had to be a cross connect. Apparently connecting a switch to a switch
will cause this, so I just connected directly to the router.

I found this out when both sides took off auto and set duplex full and speed 100:
it did not connect. So having auto really was the problem.

Also found some errors in duplicate ARP broadcasts in the process, that I fixed.
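For anyone chasing the same "duplicate use of IP's" message, here is a minimal Python sketch of the check: collect the (IP, MAC) pairs seen in ARP traffic and flag any IP claimed by more than one MAC. The function name `find_duplicate_ips` and the sample data are hypothetical, and pulling the pairs out of a capture is left to whatever tool you use.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Map each IP claimed by more than one MAC to the sorted list of MACs.

    `observations` is an iterable of (ip, mac) pairs, for example copied
    by hand from a capture of ARP traffic.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalise case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data: documentation addresses and made-up MACs.
sample = [
    ("192.0.2.10", "00:14:4f:aa:bb:01"),
    ("192.0.2.11", "00:14:4f:aa:bb:02"),
    ("192.0.2.10", "00:14:4F:AA:BB:03"),  # same IP, different MAC
    ("192.0.2.11", "00:14:4f:aa:bb:02"),  # repeat of a known pair
]

if __name__ == "__main__":
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, "is claimed by", ", ".join(macs))
```

An IP showing up with two MACs is not always an error (a failover pair can legitimately move an address), so treat the result as a list of things to look at rather than a verdict.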

Thanks for leading me in the right direction.

I also found a great tool for snooping on the network, besides brain 1.0.

So give us a link... maybe it helps other people also!

cheers,
DN2

Well, the network guys were using Wireshark. I have not had time
to look at it too much, but the output they sent me was really nice and
it is a free tool.

What do you think? I would like to implement a monitor for the
network. Wireshark output looks like huge files. How do you usually monitor
the network to find errors?

On Solaris, with the "snoop" tool; on all other OSes, with Wireshark...
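Since packet captures get huge, a lighter first check is to watch the interface error and collision counters. Below is a rough Python sketch that parses text laid out like Solaris "netstat -i" output and turns the counters into percentages. The function names and the sample text are made up for illustration, and the column names are assumed to match your system's header line; adjust them if yours differ.

```python
def parse_netstat_i(text):
    """Parse `netstat -i`-style text into {interface: {column: int}}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    wanted = ("Ipkts", "Ierrs", "Opkts", "Oerrs", "Collis")
    index = {name: header.index(name) for name in wanted}
    stats = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        stats[fields[0]] = {name: int(fields[i]) for name, i in index.items()}
    return stats


def error_percentages(counters):
    """Return input-error and collision percentages for one interface."""
    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0
    return {
        "ierr_pct": pct(counters["Ierrs"], counters["Ipkts"]),
        "collis_pct": pct(counters["Collis"], counters["Opkts"]),
    }


# Hypothetical sample text, not taken from a real system.
SAMPLE = """
Name  Mtu  Net/Dest  Address   Ipkts Ierrs Opkts Oerrs Collis Queue
lo0   8232 loopback  localhost 1000  0     1000  0     0      0
nxge0 1500 host0     host0     2000  100   1500  0     30     0
"""

if __name__ == "__main__":
    for name, counters in parse_netstat_i(SAMPLE).items():
        print(name, error_percentages(counters))
```

These counters are cumulative, so comparing two samples taken a few minutes apart tells you more than a single reading. A steadily growing collision count on a link that is supposed to be full duplex is a strong hint of the duplex mismatch discussed earlier in the thread.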