Intermittent Network Issue

I have been asked to look into some reported network problems. We have had reports of intermittent connection issues between 2 servers when trying to access an Oracle database, and I have been asked to check the hardware to see if it's a server issue or not.

I have done some basic checks using lanadmin to check stats on the LAN interfaces and linkloop to test connectivity to the LAN on the other server, and I see no issues. The output from lanadmin -g <lan id> on both servers, for all LANs, shows no errors.

I have asked for the specific times when this has happened in case I can get anything from historical logs. In the meantime, does anyone have any ideas for checking these intermittent errors, or are there other things we can use to monitor network traffic from the servers' perspective to see if there are issues? The network boys (as normal) say it's not their problem...

Any suggestions on how to prove them wrong are greatly welcomed.

What kind of issue do you have? Packet loss, or only no DB connectivity?

If the problem is connecting to the Oracle database, is the listener always answering? What about load average, disk usage etc. at those times?
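One way to answer the listener question over time, rather than at a single moment, is a small probe that tries a TCP connection at intervals and logs a timestamped result. This is only a minimal sketch: it assumes Python 3 is available on the client server, "dbserver" is a hypothetical hostname, and 1521 is the default Oracle listener port (yours may differ). It shows only whether a TCP connection to the port succeeds and how long it takes, not whether Oracle itself is healthy.

```python
import socket
import time


def probe_once(host, port, timeout=5.0):
    """Attempt one TCP connection; return (ok, seconds_taken, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers refusals, timeouts and name lookup failures
        return False, time.monotonic() - start, str(exc)


def format_result(stamp, host, port, result):
    """Build one log line from a probe result."""
    ok, elapsed, error = result
    status = "OK" if ok else "FAIL"
    return f"{stamp} {host}:{port} {status} {elapsed:.3f}s {error}".rstrip()


def run(host, port, interval=30.0, count=10):
    """Probe `count` times, sleeping `interval` seconds after each attempt."""
    for _ in range(count):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(format_result(stamp, host, port, probe_once(host, port)), flush=True)
        time.sleep(interval)


# Example call (hypothetical host, default listener port):
# run("dbserver", 1521, interval=30.0, count=2880)
```

Redirecting the output to a file gives timestamps to line up against the user reports and against anything the network team can pull from the switch.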

Could you check whether the interface is overloaded?

Tell your networking boys to get some usage statistics for the ports, or even check the physical wire.

When this issue appears, are all the other boxes on this subnet working fine?

If I have more ideas I will let you know, but this is very strange :wink:

To be honest, at this stage I am not sure. I am waiting for more specific details of these connection failures. I have checked with the DBA boys to make sure the listeners were up and working, and I have checked with the network people, but typically all you get from them is "It's not our problem"...

It's a difficult one because I have ploughed through the system logs to see if I can see anything that stands out as a "failure" of any type and cannot see anything.

Getting things like disk usage at the time is difficult if there is no specific pattern. Maybe I will have to do some testing with sar to log data over a period of days and see what happens.

Other than that I feel like it's going to be an argument with the networking boys... oh what fun.

If you do think of anything else, all ideas on how to find the cause of an intermittent network fault are greatly welcomed...

Thanks

You may see error counts with netstat. See "man netstat". Unless the counters are zeroised first, the difference over time has more meaning than the actual figures.

netstat -s -p tcp
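To illustrate the "difference over time" point, here is a rough sketch that compares two saved copies of the netstat output and reports only the counters that moved. It assumes Python 3 is available and that each counter line has the shape "<number> <description>"; the exact netstat layout varies by platform, so treat the parsing as an assumption to check against your own output. The sample text in the example is made up, not real output.

```python
import re

# Assumed line shape: optional indentation, an integer, then a description,
# e.g. "    1523 data packets retransmitted". Other lines are skipped.
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_counters(text):
    """Map each counter description to its leading integer value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            # Replace digits inside the description with "N" so lines such as
            # "12 data packets (340 bytes)" keep the same key across snapshots.
            name = re.sub(r"\d+", "N", match.group(2))
            counters[name] = int(match.group(1))
    return counters


def counter_deltas(before, after):
    """Return {description: change} for counters that differ between snapshots."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {name: value - old.get(name, 0)
            for name, value in new.items()
            if value != old.get(name, 0)}


if __name__ == "__main__":
    # Made-up sample snapshots for illustration only.
    first = "tcp:\n    100 packets sent\n    3 data packets retransmitted\n"
    second = "tcp:\n    250 packets sent\n    9 data packets retransmitted\n"
    for name, delta in counter_deltas(first, second).items():
        print(f"{delta:+d}  {name}")
```

In practice the two snapshots would be files saved some hours apart, ideally either side of a reported failure, with the retransmission and error counters being the ones to watch.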

There is much network statistical information in the Oracle view v$sysstat (notably the SQL*Net statistics), and individual session statistics are available by joining v$session against v$sess_io.

Without knowing what physical network joins the two computers we can only guess.

It is imperative that all your network cards and network ports are set consistently and definitely NOT set to auto-negotiate. A server should be cold started if anybody displaces a LAN cable.