Gurus needed to diagnose severe performance degradation

Hi everyone, newbie forum poster here. I'm an Oracle DBA and I need some guidance from the Unix gurus here on how to pinpoint a problem in a Solaris 9 system running on an 8-CPU Fujitsu server that acts as our Oracle database server. Our sysadmins are trying their best to resolve the issue, but none of us is 100% sure where it resides - I'm hoping people here can help shed some light on things or point us in a new/better direction.

Environment:
Server: Fujitsu P650 (7 CPUs in use, 48 GB RAM), Solaris 9 Generic_122300-22 sun4us sparc FJSV,GPUZC-M
Old Storage: EMC CLARiiON fibre-attached storage
New Storage: NetApp storage, 3040 controller, NFS-mounted volumes via a multi-trunked 1 Gb Ethernet connection (not round robin)
Database: Oracle 9i

Problem: We are migrating our storage from fibre EMC to NFS NetApp and are encountering huge performance degradation... pinpointing where the problem lies has been difficult. (As a DBA I seriously questioned this move, but the point is now moot as the money has been spent and we have to deal with it.)

Detail: We've been slowly migrating our databases off of the fibre EMC to the NFS NetApp. Some of our high-performance databases have struggled mightily on the NetApp storage, and there has been lots of finger pointing as to why.

Symptoms: Over time (hours to days) database jobs and response times nosedive - lots of hooting and hollering from the business :) System response time can be extremely slow: simple commands like "df -h" and "ls" are slow to respond. However, system load is typically minimal, almost non-existent (low 1's and 2's for load), although at times we can see high kernel processing times.
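For anyone who wants specifics, this is the sort of thing that shows the kernel time when the slowdown hits (sample commands only; the 5-second interval is arbitrary):

# watch user vs. kernel (sys) time per CPU; the kernel-time spikes show up here
mpstat 5
vmstat 5
# per-call NFS client statistics, to see whether the slowdown lines up with NFS activity
nfsstat -c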

Advice from our sysadmins: Current advice from one admin is that the Fujitsu server is older hardware that is not built for this kind of transaction processing. They have been monitoring "counters on the PCI bus (66MHz)", are seeing "overflow issues" (forgive me if this isn't well articulated), and are noticing that it "has problems keeping up". Another sysadmin feels that the PCI bus has nothing to do with it and that it is networking related: specifically, that while we have trunking in place to the NetApp filer, it is not round robin, and as a result the pipe from the server to the storage is too small for any given transaction (which from Oracle will necessarily be single threaded). Having conflicting reports from the sysadmins is not great.

Are there any recommendations on where the problem possibly lies? (Obviously this is very difficult to do from a few paragraphs.) Or perhaps more realistically: aside from looking at top/prstat to see low load, iostat to see OK I/O service times, and the sysadmins checking counters on a PCI bus, are there any other tools, either available in Solaris or third party, that can be used to definitively say "AHA! That is definitely where the bottleneck is!"?

Many thanks in advance..

There was a time you could buy HP Glance for Solaris (if it's still possible, try to get a free, time-limited evaluation copy). There is also Sysload, which is quite good...
NFS can be tricky to configure and optimize properly...
You haven't said much about your network switches... I've seen switches go nuts and drag a big cluster down (with NFS, Oracle apps, etc.).

What is the network protocol used for the EMC fiber channel and what were the specs of that channel?

My first thoughts are that the EMC fiber channel is much higher performing than NFS over Ethernet.

That is the first place I would check.

Also, I hope you have not completely shut down the original system. That is unwise, because you have no baseline to check against any more.

I would tend to think your troubles are IO related as you state your system is slow while processing is low. The nose diving points to a severe bottleneck situation, perhaps when Oracle has trouble writing to its archived redo logs or there is reporting activity during the day.

I know that if you want to use Oracle on Netapp / NFS there are some very specific instructions/settings for NFS on both server (the Netapp) and client (your Fujitsu Server) that you have to follow really meticulously.
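As a point of reference, the client-side mount options typically recommended for Oracle datafiles over NFS on Solaris look something like the vfstab line below. This is only a sketch: the filer name, export path and mount point are made up, and the exact rsize/wsize and forcedirectio choices should come from the NetApp/Oracle documentation for your specific versions.

# illustrative /etc/vfstab entry for an Oracle datafile volume (names are placeholders)
filer1:/vol/oradata - /u02/oradata nfs - yes rw,bg,hard,nointr,proto=tcp,vers=3,rsize=32768,wsize=32768,forcedirectio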

Also, I think it may be advisable to create a separate storage LAN of VLAN, so you are not interfering with other traffic, or maybe have separate segments and spread the storage over several NFS mountpoints to different segments. If possible I would implement Jumbo ethernet frames, to optimize sequential IO a bit (full table scans, index scans).
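Whether jumbo frames are actually in effect is easy to verify; something along these lines shows it, although on Solaris 9 raising the MTU to 9000 is driver-specific (for ce cards it involves the driver's .conf file, not just ifconfig), so treat this purely as a sketch with placeholder names:

# current MTU on the interface carrying the NFS traffic (ce0 is a placeholder)
ifconfig ce0
# large-payload test against the filer (hostname is a placeholder); this only shows
# that big packets get through, not that the path is unfragmented end to end
ping -s filer1 8192 5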

Your server appears to have room for many (relatively slow, but dedicated) PCI slots, so I would tend to spread the network load over several PCI slots to reach the required bandwidth.

Just some thoughts.. good luck.

S.

You posted that this is Gigabit Ethernet. The fibre connection from your old solution would be better.

First check that all ethernet cards on the server are set to NOT autonegotiate and that none of the ports on the hub/switch are set to autonegotiate and that no port involving your storage device is set to autonegotiate. If you have to change anything, schedule a cold start afterwards.

A 66 MHz, 64-bit PCI bus can handle about 4 Gbit/s, so unless it's a badly engineered bus, that shouldn't be your problem.
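The back-of-the-envelope arithmetic behind that figure, for anyone checking:

64 bits x 66 MHz = 4,224 Mbit/s  (roughly 4.2 Gbit/s, or about 528 MByte/s)

Bear in mind that is the theoretical peak, shared by every card on that bus.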

Also, do NOT remove the autonegotiate feature from your network cards and switches if you're running gigE over copper.

  1. If your links are not running at 1000 Mbps, full-duplex, there's a problem. Papering over that underlying problem by forcing the link to 1000 Mbps full-duplex doesn't fix the problem.

  2. Disabling autonegotiation on copper gigE places you outside the specifications of IEEE 802.3: (Hardware | Oracle)

I'd look to be sure you do all the tuning that Oracle advises for running over NFS. Especially make sure you're using jumbo frames.

And you might very well need to look into replacing your old hardware and software. There have been a lot of hardware and software advances in networking performance since Solaris 9 was current.

achenlie is correct. If you have dropped to half-duplex it will be chronically slow. Just re-patching a cable can be enough to cause this issue when autonegotiation is in force. If you have the problem, it may well be fixed by a total cold start where you bring up the network first, then the storage, then the servers.
IMHO, whether jumbo frames will cause or cure a performance issue depends on the network hardware.
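Checking what speed and duplex the links actually negotiated only takes a second. The exact statistic names depend on the NIC driver, but for a Sun ce interface something like this (instance 0 assumed) does it:

# negotiated link speed and duplex for a ce interface (instance number is an assumption)
kstat -p ce:0 | egrep 'link_speed|link_duplex'
# interface-level errors and collisions
netstat -i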

To get an idea of scale, can your DBA post the contents of the following Oracle 9i system table along with how long the Oracle server had been up at the time:

v$sysstat

This table includes i/o stats and other useful pointers such as counts of performance killers like disc sorts.
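If it helps, here is a quick way to pull the relevant counters out of it (a sketch only; run it as a suitably privileged user, and note the name filters are just examples, not an exhaustive list):

# dump instance uptime plus the I/O and sort counters from v$sysstat
sqlplus -s "/ as sysdba" <<'EOF'
select startup_time from v$instance;
select name, value
  from v$sysstat
 where name like 'physical read%'
    or name like 'physical write%'
    or name like 'sorts%';
EOF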

What version of NFS are you using?

A 1 Gb Ethernet connection can never approach 1 Gb/s because of the collision algorithm used by the Ethernet MAC protocol specification.

Sorry, I don't mean to sound like I am finger pointing, but you should have measured network performance, including throughput and latency, on both channels (old and new) before cutting over.

Ethernet does not perform well under heavy loads because of the way Ethernet works (ALOHA-style contention, collisions, backoff), and when you add another protocol on top, the performance is worse.

A directly attached fiber channel should be far superior to ethernet, in this case. The only way to get past "finger pointing" is to build a baseline of the system before production. You have to know the maximum throughput and latency of the fiber channel and the same for the ethernet channel.

Then, you move into the next phase of testing (for commercial applications). Without baselining, the team is always asking for trouble because you cannot know the system constraints and bottlenecks.

Normally, the network communications channel is the bottleneck. Then, the next problem is the I/O at the network interface level. These tend to perform worse than directly attached disk IO, etc.

I once worked in NYC on a TCP/IP throughput problem where people were about to get fired over the problems with production. There was finger pointing between everyone (network, system, and DB admins). Finally, I forced them to let me run a TCP spray test with the system shut down (or it was a parallel system, I can't recall), and then everyone said "Ah! It IS the network!"

Start at the network layer and work up, just like the TCP/IP protocol stack (or OSI stack, if you prefer). Without baselining, you are simply shooting in the dark and guessing. The fastest path to a solution is to take the time to baseline the various critical systems; in this case, the network would be the best place to start.
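For what it's worth, even a crude packet-drop check between the database server and the filer can be a useful first data point (the hostname is a placeholder; spray measures RPC-level drops rather than real TCP throughput, so treat it only as a sanity check):

# crude RPC-level drop check from the database server to the filer
spray -c 10000 -l 8192 filer1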

Cheers.

How close can it get? I regularly get about 90 Mbit/s with 100baseT, but haven't seen higher than about 300 Mbit/s on our gigabit lines.

Well, it has been a long time since I had to do these calculations. The limits are published by IEEE (assuming point-to-point in this discussion) - I would have to Google for the numbers.


Also, I forgot to mention that the theoretical maximum for point-to-point Ethernet (assuming no other network devices talking on the channel) is different, of course, from the practical maximum, which depends on things like length of cable run, crimps in the cable, connector losses, etc.

I was once on a site where the entire performance of the network management system was terrible, and the problem was a crimped cable (I think someone rolled their chair across it in the data center, LOL).

That is why I advise to focus on the network channel(s) when you are debugging performance issues on distributed applications.

If you really want to get into the weeds, the theoretical max bandwidth utilization of an unswitched Ethernet network is 32% of the nominal rate. The key word is "unswitched".

Once you throw switching into the equation, it's a lot easier to get higher rates. I can sustain 90+ megabytes/sec on a gigE point-to-point link, as long as it's dedicated traffic. Now, it takes newer hardware to do that as even a not-too-old IBM x305 that I've used as a WAN emulator (WANEM : The Wide Area Network Emulator) starts falling behind at about 30 or 40 megabytes/sec. And that's a bit newer albeit smaller box than the Fujitsu PrimePower that's the subject of this discussion.

Also, what are the NFS settings? The NFS version? What are the TCP send and receive hiwat settings? Jumbo frames?

Is direct IO enabled?

What is the exact version of Solaris 9? I'd suspect it needs to be as recent as possible.

Also, was the IO utilization of the older fiber channel configuration ever measured? That'd be nice to know in order to solve this problem.

Knowing the IO utilization now would be good, too. If it's moving 3.8 gbps NOW over NFS, it'd be hard to go much faster than that.
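For the OP, most of that can be answered straight off the box with something like this (standard Solaris commands; the ndd parameters are the TCP high-water marks mentioned above):

# exact OS release and kernel patch level
uname -a
cat /etc/release
# NFS version, rsize/wsize and other options actually in effect on each mount
nfsstat -m
# current TCP send and receive high-water marks
ndd /dev/tcp tcp_xmit_hiwat
ndd /dev/tcp tcp_recv_hiwat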

I have worked a lot with NetApp filers and Oracle databases on Solaris.

With NFS and a 1 Gbit Ethernet connection you should be able to reach around 50-90 MB/sec in sequential reads/writes.

Check how your NFS/filer setup is working: create a large file on the NFS share with the mkfile or dd command, and check your throughput with iostat -xnpr while the file is being created.
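For example, something along these lines (path and sizes are placeholders; run the iostat in a second window while the write is in flight):

# 2 GB sequential write onto the NFS-mounted volume (path and size are placeholders)
mkfile 2g /u02/oradata/iotest.dat
# or, roughly equivalent:
dd if=/dev/zero of=/u02/oradata/iotest.dat bs=1024k count=2048
# in another window: per-device throughput and service times every 5 seconds
iostat -xn 5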

Also check your network card with netstat -i; do you see errors?

Direct I/O is a good option, but it's not game-breaking. Also change the rsize and wsize for NFS (64k), but that's just fine tuning.

But as the above poster stated, it would be great if you could post some numbers on how much I/O the EMC did over FC.

Thanks for the feedback everyone, there is lots to digest and compile, I will update again.