Power6 vs. Power7 hardware performance

mrmurdock · May 17, 2012, 1:09pm

I know I will probably get blasted for posting here, but I did not know where else to post this. I have a performance question about the two platforms.

I don't know the exact processors in each box, but the Power6 platform's clock speed is 4400 MHz (?? at least that is what I was told), and the Power7's clock speed is 3500 MHz. My application performs almost 2x better on the 4400-clocked Power6 system than on the 3500-clocked Power7 system.
Would that difference in clock speed really account for that kind of performance gap?
Of course, the versions of DB2 are different, and our application version is different as well.

bakunin · May 17, 2012, 5:44pm

Clock speed for a processor is like the number of revolutions per minute of an engine. If you compare a Ferrari to, say, the marine diesel of the largest container ship in the world, which one do you think will rev faster? And which one will have more power?

The only time when clock speed will enter the picture is if you compare two absolutely identical processors. In this case (but only in this case) the processing power of the processors (not the system!) will almost linearly reflect the proportion of their clock speeds.
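To put a number on that: a minimal sketch, using the two clock figures quoted in the question (presumably MHz, and unverified) and assuming, purely for the sake of argument, two otherwise identical processors. Even under that generous assumption the clock ratio stays well below the reported "almost 2x". The names `clock_ratio`, `P6_MHZ` and `P7_MHZ` are only illustrative.

```python
# Clock figures as quoted in the question (presumably MHz); not verified.
P6_MHZ = 4400
P7_MHZ = 3500


def clock_ratio(fast_mhz: float, slow_mhz: float) -> float:
    """Best-case speedup that clock speed alone could explain.

    Assumes two otherwise identical processors, which a Power6 and a
    Power7 are not, so this is only an upper bound for the argument.
    """
    return fast_mhz / slow_mhz


ratio = clock_ratio(P6_MHZ, P7_MHZ)
observed = 2.0  # "almost 2x", as reported in the question

print(f"clock ratio:            {ratio:.2f}x")             # 1.26x
print(f"left unexplained:       {observed / ratio:.2f}x")  # 1.59x
```

So roughly a factor of 1.6 would still have to come from somewhere other than the clock.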

Further, "performance" is a synonym for "fitness for a defined purpose", not "being fast". If you compare a Ferrari, an SUV and a 40-ton truck, which one is "performing best"? That depends on the purpose, the roads to be used, etc. If the road is only a gully in the forest, the speed of the truck and the Ferrari is probably reduced to zero, while the SUV can still go at 20 mph. If you have to transport 100 tons of cargo, the truck will probably be the fastest, because it has to go only 3 times, while the SUV has to go 200 times and the Ferrari probably 2000 times. If the road is an 8-lane highway and there is no cargo to transport, then the Ferrari is probably the fastest, etc.

As you see, as long as you don't define your purpose you can't compare any system. You could - instead of clock speed - as well compare weight, number of screws used to mount it in the rack or similar numbers. They are all equally meaningless.

Finally, even if you have a defined purpose to base your comparison on, computer systems are highly complex, interdependent systems. To expect a change in one aspect of such a system (the clock speed) to have a linear effect is naive at best. Suppose you are the new trainer of a world-class soccer team. On your first day you see that the team trains every day for 3 hours, and last year they scored 40 goals over the season. If you double their training time to 6 hours a day, would you expect them to score 80 goals in the next season? Probably not.

I hope this helps.

bakunin

gito · May 18, 2012, 1:25am

A lot of people have noticed that, if you calculate performance per core, Power7 is slower than Power6. You can see it especially in single-threaded applications.
Instead of putting more performance into each core, IBM's P7 offers more cores and more threads per core.

I hope you have already done some analysis of where the bottleneck is (I/O, CPU, memory).

The first thing you need to do is check for new microcode and OS updates; there are a few APARs
related to I/O and memory performance.

The second thing is SAN performance. A lot of the projects I am working on are not only moving
from p6 to p7 but are also changing to a more VIO-based environment.
From that come problems with proper configuration of the VIO servers and system tuning (like queue depth).

Third, you need to take a look at your application: is it possible to reconfigure it for more threads?
If not, maybe you should decrease from SMT4 (4 threads) to SMT2 (2 threads), or even to 1 thread per core.
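Why the number of usable threads matters can be sketched with Amdahl's law, which is a standard textbook model and not something measured on these machines. The sketch assumes every thread is a full, independent execution unit; SMT threads share one core, so real numbers would be lower. The function name `amdahl_speedup` and the sample fractions are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup for a program whose given fraction can run in parallel.

    Assumption: each thread is a full, independent execution unit.
    SMT threads share one core, so this is an optimistic upper bound.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical applications: fully serial, half parallel, mostly parallel.
for fraction in (0.0, 0.5, 0.95):
    for threads in (1, 2, 4):
        speedup = amdahl_speedup(fraction, threads)
        print(f"parallel={fraction:.2f} threads={threads} speedup={speedup:.2f}x")
```

A fully serial application gains nothing from extra threads in this model, which is the case where per-thread speed is all that counts.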

bakunin · May 18, 2012, 5:56am

Sorry to say this, but you should get your wording right: you are confusing speed with performance. This is the reason why even synthetic measurements of CPU performance come as several different numbers instead of one "grand total": there is SPECint, SPECfp, etc. Even then this is not the whole picture when you try to determine how fast the work you want done actually gets done: there are L1, L2 and L3 caches with certain I/O bandwidths and cache hit/miss ratios, there is memory interface bandwidth, there are (a certain number of) pipelines, speculative execution, out-of-order execution, etc. All of these affect how fast a program is executed, depending on how well that program makes use of them. And this is only the processor - not to mention the various other devices which affect the working of a system.

To say "processor A is slower than processor B" is like saying "green is better than yellow": without a frame of reference stating in which regard, it means nothing. It might be that green is better suited for your purpose than yellow, but without stating this purpose in detail you haven't said anything at all.

To come back to the thread O/P's problem: without detailed information about the two systems and some way of making them comparable there is no way to say anything meaningful. You said that the two systems have different OS versions and different application versions, and (so I suppose) they differ in some other respects too. It might be that the different processors are the reason; it might as well be something else, or a mixture of many factors. There is simply not enough data to base any assumption on.

I hope this helps.

bakunin

gito · May 18, 2012, 6:45am

I do not know how many servers you have in your environment, but I recently migrated over 100 LPARs from Power6 520, 570 and 595 to Power7 770 frames. I know what I am talking about.

zxmaus · May 18, 2012, 9:35pm

hmmm

Having more than 300 LPARs, about 130 of them migrated from p6 to p7, I cannot see where p7 is even remotely slower than p6, so I go with bakunin's opinion. There are plenty of reasons other than CPU clock speed that could cause your performance degradation.
In my experience, Oracle performs much better on p7 (benefiting from the return of out-of-order execution), and for Sybase it's about the same if you stick with one virtual processor per engine.
Anyway, to be able to help you, I would suggest that you simply post some data from your system under load, like

vmstat -Iwt 2 10, vmstat -s, vmstat -v, iostat -Dl 2 10

and the like?
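If it helps to gather all of that in one go, here is a minimal sketch that runs the commands listed above and writes their output to a single file for posting. The helper name `collect` and the file name `perfdata.txt` are made up for illustration, the flags are exactly the AIX ones quoted above (they may not work on other systems), and any command not found on the PATH is skipped.

```python
import shutil
import subprocess

# Commands exactly as listed in the post above (AIX syntax).
COMMANDS = (
    ("vmstat", "-Iwt", "2", "10"),
    ("vmstat", "-s"),
    ("vmstat", "-v"),
    ("iostat", "-Dl", "2", "10"),
)


def collect(commands, outfile):
    """Run each command, write its output to outfile, return skipped commands."""
    skipped = []
    with open(outfile, "w") as out:
        for cmd in commands:
            # Skip anything that is not installed instead of failing.
            if shutil.which(cmd[0]) is None:
                skipped.append(cmd)
                continue
            result = subprocess.run(list(cmd), capture_output=True, text=True)
            out.write("### " + " ".join(cmd) + "\n")
            out.write(result.stdout)
            out.write(result.stderr)
            out.write("\n")
    return skipped
```

Calling `collect(COMMANDS, "perfdata.txt")` while the system is under load would leave everything in one file; the returned list shows which commands were not available.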
It would also help if you could tell us something about the amount of virtualization, whether AIXTHREAD_SCOPE is set to S, which OS version and TL / ML you are running, and similar things.
If your app really performs a lot better on p6 than on p7 in a different version, then I would probably check whether there are issues with the application code that connects to the DB. When we upgraded from Sybase 12.5.4 to 15, we had a lot of performance degradation because the entire behaviour of the DB changed and our developers did not bother to adapt the stored procedures to the new DB version.
What we are seeing as well when migrating to p7 is a lot more locking in our DBs, which makes the DB appear to be slower though it isn't. It just seems to be because the queries have not yet been cleaned up.
Last but not least: how did the data get onto your new box? If it was replicated via tools like rman, repserver or goldengate, or even if it was simply sftp'ed in multiple streams into the new filesystems, then you might just be suffering badly from fragmentation within the filesystems, which would be very easy to diagnose with the fileplace command.

Regards
zxmaus