We have 2 LPARs on a P6 blade. One of the LPARs has 3 CPU cores and 5 GB of memory, running Sybase as the database. An EOD (end-of-day) process takes 25 min to complete.
Now we have an LPAR on a P7 server with an entitled CPU capacity of 2 and 16 GB of memory, also running Sybase as the database. The EOD process that takes 25 min on the P6 takes 50 min on the P7. I have disabled SMT and checked: it drops to 40 min, but that is still higher than on the P6.
Can anybody help me?
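For reference, SMT on AIX can be checked and toggled with smtctl; a minimal sketch, assuming root and that a dynamic (non-boot) change is acceptable:

# show current SMT mode and threads per core
smtctl

# disable SMT immediately (does not persist across reboots)
smtctl -m off -w now

# re-enable SMT, or limit POWER7 to 2 threads per core
smtctl -m on -w now
smtctl -t 2 -w now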
Are the databases running on the same storage? You might want to collect nmon data on both systems during the process and compare them to determine the bottleneck (see the sketch below).
Also check the recommended AIX tunable settings in the Sybase documentation.
For more help you need to provide more information.
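A minimal sketch of collecting comparable nmon data on both LPARs during the EOD run, assuming an nmon recording binary is available on both AIX levels; the interval and count are only examples:

# record to a file: 30-second snapshots, 120 samples (~1 hour), include top processes
nmon -f -t -s 30 -c 120

# the resulting <hostname>_<date>_<time>.nmon file can be fed to the nmon analyser
# spreadsheet to compare CPU, memory, disk and adapter utilisation side by side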
What's different from the old P6 environment to the P7? Which modes are you running, i.e. dedicated or shared LPARs; if shared, then capped or uncapped?
Different OS versions?
As funksen said, different disk/storage layout?
Absolutely the same EOD process? I guess EOD means End Of Day? What type of process is that, what is involved on the old hardware and what on the new? Same environment? If SQL is involved, are the queries the same as before? Is the database indexed etc. like the one before, to avoid table scans?
Has the old environment been tuned already? vmo? ioo? AIO? ...
Hi,
The P7 is using XIV while the old P6 is using DS8K with SVC. The other difference is that the P7 runs AIX 6100-06-06-1140 and the P6 runs AIX 5300-07-01-0748.
What is the CPU speed of your Power6 and your Power7 frame?
I had problems when migrating from a P595 Power6 at 5 GHz to a P770 Power7 at 3 GHz.
Power7 offers more cores, but performance per core is lower than Power6.
Multithreaded software got a real speedup, but single-threaded applications suffered because of the per-core performance.
What is your microcode level on the Power7? There was recently an update that solved memory performance issues.
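In case it helps, a quick sketch of checking the firmware/microcode level and processor clock on the P7, assuming lsmcode and prtconf are available:

# concise system firmware level
lsmcode -c

# firmware plus adapter/device microcode levels
lsmcode -A

# processor type and clock speed are listed near the top
prtconf | head -30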
Your I/O subsystem seems to be causing your issues. Did you set up your logical volumes (filesystems, raw devices) with maximum or minimum distribution? How big are your disks? What is the output of vmstat -v and vmstat -s? What is the queue depth on your disks, and if this is VIO storage, on the VIO servers?
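To answer the queue depth question, a sketch of how it can be checked and raised; hdisk2 and the value 64 are placeholders, the right value depends on the storage vendor's recommendation:

# current queue depth of a disk
lsattr -El hdisk2 -a queue_depth

# raise it; -P defers the change if the disk is busy, so it takes effect after reboot/varyon
chdev -l hdisk2 -a queue_depth=64 -P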
What do you mean by logical volumes (filesystems, raw devices) with maximum or minimum distribution? The disks are 2 x 100 GB. The queue_depth is 40, and this is XIV storage.
#vmstat -v
4194304 memory pages
3986502 lruable pages
574924 free pages
1 memory pools
413979 pinned pages
95.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
64.2 numperm percentage
2561112 file pages
0.0 compressed percentage
0 compressed pages
64.2 numclient percentage
90.0 maxclient percentage
2561112 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2228 filesystem I/Os blocked with no fsbuf
8 client filesystem I/Os blocked with no fsbuf
38788 external pager filesystem I/Os blocked with no fsbuf
25.2 percentage of memory used for computational pages
#vmstat -s
2911826 total address trans. faults
857620 page ins
6318673 page outs
0 paging space page ins
0 paging space page outs
0 total reclaims
1720594 zero filled pages faults
35814 executable filled pages faults
0 pages examined by clock
0 revolutions of the clock hand
0 pages freed by the clock
248183 backtracks
0 free frame waits
0 extend XPT waits
110954 pending I/O waits
7175994 start I/Os
2749907 iodones
22266453 cpu context switches
2734773 device interrupts
289681 software interrupts
2108993 decrementer interrupts
371 mpc-sent interrupts
371 mpc-receive interrupts
35496 phantom interrupts
0 traps
92113814 syscalls
I mean the inter-policy which you usually define during creation of your logical volumes / raw devices. If it's set to minimum, and you have only a few huge disks, you speak to your data in a serial fashion, which is usually a quite bad idea for databases, with the only exceptions of Sybase IQ and Oracle ASM, which handle the data distribution internally.
You still did not answer whether you run Sybase on filesystems or raw devices, generally, and specifically for tempdb, as this is the most heavily used part of your Sybase DB. From the amount of numperm you are using, you are utilizing most of your memory for file caching, so I would guess it's filesystems, or you otherwise have plenty of non-raw I/O which is buffered though it probably doesn't have to be. You might want to consider moving your tempdbs into a RAM disk after having remediated the reason for hogging so much non-computational memory.
You might also want to consider setting j2_dynamicBufferPreallocation=128 or 256. And what is the network packet size setting in your Sybase, and which Sybase version do you actually run? Is it the same between P6 and P7, or did you maybe upgrade from 12 to 15, in which case any stored procedures you have could cause your issues.
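A sketch of what the above could look like on the command line; the LV, VG and size names are placeholders, and the ioo value and mkramdisk size should be validated for your AIX level before making anything permanent:

# check the inter-disk allocation policy of a database LV
lslv sybdatalv | grep INTER-POLICY

# spread future allocations across both disks, then redistribute existing ones
chlv -e x sybdatalv
reorgvg sybvg sybdatalv

# JFS2 buffer preallocation, as suggested above (-p makes it persistent)
ioo -p -o j2_dynamicBufferPreallocation=128

# a RAM disk that could back tempdb once non-computational memory usage is under control
mkramdisk 2G          # creates /dev/ramdiskN and the raw device /dev/rramdiskN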
A few more questions:
Are the disks assigned directly (FC adapters) or through VIO?
Are those 2 disks in a mirror?
What is the PP and block size on your VG/filesystems?
But I agree with zxmaus, it does not look like a memory or CPU issue, so the only thing left is storage.
Maybe you just got unlucky and got disks from a pool shared with other heavily used systems.
Have you tried comparing DB I/O stats, especially storage I/O response time?
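On the response time question, a sketch of how per-disk service times could be compared on both boxes during the EOD run; the hdisk names and intervals are placeholders:

# extended per-disk statistics: read/write service times, queue wait, % busy
iostat -D 10 6

# same, but only the database hdisks, with timestamps
iostat -DT hdisk2 hdisk3 10 6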
The inter-policy is minimum. The Sybase db and tempdb are on filesystems.
The Sybase version is 12.5.2 on both the live and the new environment.
On the P7 the disks are from XIV, directly attached to the server and not through VIO. On the old P6 the disks are from DS8K via SVC.
To zaxxon
The P7 LPAR has dedicated processors, same as the P6; the only difference is that the P6 is 4 GHz and the P7 is 3.5 GHz. The EOD process is absolutely the same. As per the DBA, the database is indexed. The old environment is tuned and is on 5.3. As per the IBM documentation, AIX 6.1 is already tuned out of the box.
Yesterday I disabled multithreading and the time taken was reduced to 37 min, but it is still much more than on live, which is 25 min.
As zxmaus has suggested, I need to check by creating a raw device for tempdb. I will check and revert.
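For the raw tempdb test, roughly what that could look like; the names, sizes and device numbers are placeholders, and the disk init syntax should be confirmed against the ASE 12.5.2 docs:

# raw logical volume spread across both disks (64 PPs here is just an example)
mklv -y tempdb_lv -t raw -e x sybvg 64
chown sybase:sybase /dev/rtempdb_lv

# then inside isql, point a new device at the raw LV and extend tempdb onto it:
# 1> disk init name = "tempdb_dev", physname = "/dev/rtempdb_lv", size = "2G"
# 2> go
# 1> alter database tempdb on tempdb_dev = 2048
# 2> go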
Below are the Sybase settings.
[Named Cache:abwslive_data_cache]
cache size = 750M
cache status = mixed cache
cache replacement policy = DEFAULT
local cache partition number = DEFAULT
[16K I/O Buffer Pool]
pool size = 100.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[4K I/O Buffer Pool]
pool size = 360.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[Named Cache:default data cache]
cache size = 800M
cache status = default data cache
cache replacement policy = DEFAULT
local cache partition number = DEFAULT
[16K I/O Buffer Pool]
pool size = 100.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[Meta-Data Caches]
number of open databases = DEFAULT
number of open objects = 4000
open object spinlock ratio = DEFAULT
number of open indexes = 700
open index hash spinlock ratio = DEFAULT
open index spinlock ratio = DEFAULT
partition groups = DEFAULT
partition spinlock ratio = DEFAULT
[Disk I/O]
disk i/o structures = 600
number of large i/o buffers = DEFAULT
page utilization percent = DEFAULT
number of devices = 30
disable disk mirroring = DEFAULT
allow sql server async i/o = DEFAULT
[SQL Server Administration]
procedure cache size = 107520
runnable process search count = 100
number of aux scan descriptors = 1000
[User Environment]
number of user connections = 500
stack size = DEFAULT
stack guard size = DEFAULT
permission cache entries = 40
user log cache size = 2560
Have you been able to get some relevant data from the database, like DB I/O response time?
Some time ago we were deciding what to choose, Storwize or XIV.
I am really curious what your storage guy / IBM support has to say about why XIV is slower.
I am neither a storage guy nor a DBA. While searching for this issue I read somewhere that XIV is a bit slower, and my live setup is on DS8K while this new one is on XIV. So I took a LUN from my storage guy and did the testing. On XIV my database disk was at 100% busy while writing. So that's it.
It would be interesting if you could run ndisk64 with similar parameters against both disks: developerWorks: Wikis - Systems - nstress
It can give you and us some numbers showing the difference in IOPS/throughput between the two technologies...
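In case it helps, a rough sketch of an ndisk64 run; the flags are from memory of the nstress readme, so please verify them with ndisk64 -h, and the file path, size, block size and duration are placeholders:

# 8 processes, 70% reads, random 4k I/O against a 2 GB test file on the XIV-backed filesystem
ndisk64 -C -f /sybdata/ndisk.tmp -s 2G -R -r 70 -b 4k -t 300 -M 8

# repeat with the same parameters on the DS8K/SVC LPAR and compare the reported IOPS and MB/s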