The fr (pages freed by page replacement) and sr (pages scanned by the page-replacement algorithm) values in the vmstat output below are very high. I usually see values this high during the Oracle database backup. The page scan, page steal, and page fault values are also very high.
Does this mean that the server's memory is maxed out?
Is there any tuning we should do?
Any advice is greatly appreciated. Thanks!
kthr memory        page                  faults         cpu
---- ------------- --------------------- -------------- -----------
r  b     avm   fre re pi po   fr   sr cy  in    sy   cs us sy id wa
3  0 3448445  3898  0  0  0 1927 1928  0 298  3070 3700 68  3 19 10
4  0 3448448  4429  0  0  0 3081 3087  0 316  4751 3797 72  4 18  6
3  0 3448447  4184  0  0  0 1542 1543  0 319  3189 3707 70  3 20  7
5  0 3449110  4690  0  0  0 3484 3484  0 350 13162 3832 69  5 20  6
3  0 3449188  4121  0  0  0 1824 1824  0 302  3945 3684 66  3 25  7
4  0 3449178  4003  0  0  0 1933 1934  0 324  3851 3784 72  3 18  8
to File System 0.0 2558.5
Page Scans 1251.0
Page Steals 1219.
Page Faults 756.0
It doesn't matter how high the sr and fr numbers are by themselves. The only important thing here is the sr:fr ratio. It should generally not be higher than 4:1, as that is about where performance issues start. If it goes to 10:1 or higher, then your system is spending more time freeing up memory than doing real work. In your case it is pretty much 1:1, which is just fine.
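To make the ratio check concrete, here is a minimal Python sketch that pulls fr and sr out of the vmstat rows posted above and applies the 4:1 and 10:1 rules of thumb from this reply. The column positions are assumed from the header line of that particular output, and the names scan_to_free_ratios and classify are made up for this illustration.

```python
# Rows copied from the vmstat output posted in the question.
# Assumed column order (from its header line):
# r b avm fre re pi po fr sr cy in sy cs us sy id wa
ROWS = """\
3 0 3448445 3898 0 0 0 1927 1928 0 298 3070 3700 68 3 19 10
4 0 3448448 4429 0 0 0 3081 3087 0 316 4751 3797 72 4 18 6
3 0 3448447 4184 0 0 0 1542 1543 0 319 3189 3707 70 3 20 7
5 0 3449110 4690 0 0 0 3484 3484 0 350 13162 3832 69 5 20 6
3 0 3449188 4121 0 0 0 1824 1824 0 302 3945 3684 66 3 25 7
4 0 3449178 4003 0 0 0 1933 1934 0 324 3851 3784 72 3 18 8
"""

FR_COL = 7  # pages freed per interval
SR_COL = 8  # pages scanned per interval


def scan_to_free_ratios(text):
    """Return the sr/fr ratio for every data row that freed at least one page."""
    ratios = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators, and anything too short to be a data row.
        if len(fields) <= SR_COL or not fields[0].isdigit():
            continue
        fr = int(fields[FR_COL])
        sr = int(fields[SR_COL])
        if fr > 0:
            ratios.append(sr / fr)
    return ratios


def classify(ratio):
    """Apply the rule-of-thumb thresholds quoted in the reply."""
    if ratio >= 10:
        return "severe"
    if ratio > 4:
        return "warning"
    return "ok"


ratios = scan_to_free_ratios(ROWS)
worst = max(ratios)
print(f"worst sr:fr ratio = {worst:.2f}:1 -> {classify(worst)}")
```

For a different vmstat invocation (for example vmstat -Iwt) the columns shift, so FR_COL and SR_COL would have to be adjusted to match that header.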
Your output above would be more helpful if we knew how much physical memory you have. Ideally avm (in 4k pages) should not exceed 75% of physical memory on an Oracle box for best performance, since Oracle is a process-based DB and every process/connection needs some memory on top of the SGA that is set within Oracle. If it goes above 85%, or if your free list drops toward 0, you are in serious trouble as well, and likewise if you start seeing pi/po values ...
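As a rough illustration of that rule, the sketch below converts avm from 4k pages to MB and compares it with physical memory. The avm value is the first sample from the vmstat output in the question, and the 15424 MB figure is the mem= value from the "System configuration" line posted later in this thread. The function names are hypothetical, and the 75%/85% limits are simply the rules of thumb stated above.

```python
PAGE_SIZE_KB = 4  # avm is reported in 4k pages


def avm_percent(avm_pages, mem_mb):
    """Active virtual memory as a percentage of physical memory."""
    avm_mb = avm_pages * PAGE_SIZE_KB / 1024
    return 100.0 * avm_mb / mem_mb


def assess(pct):
    """Map the percentage onto the 75% / 85% rules of thumb."""
    if pct > 85:
        return "serious trouble"
    if pct > 75:
        return "above the 75% target"
    return "within target"


# avm from the first vmstat sample, mem from the system configuration line.
pct = avm_percent(3448445, 15424)
print(f"avm is {pct:.1f}% of physical memory: {assess(pct)}")
```

With these inputs the result comes out a little above 87%, so this box is already past the 85% mark before any backup starts.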
Page faults are no reason for concern after all - they only mean that your box is doing work (and using memory pages).
What I would consider a reason for concern is your very low free list. If this is NOT an ASM system, then I would recommend mounting your /dumps filesystem with the noatime and rbrw options and all other Oracle-related filesystems with the noatime option as well, and having your DBAs switch Oracle to SETALL. This will give your system a lot of desperately needed memory back.
You should also consider exporting AIXTHREAD_SCOPE=S, either as a system-wide variable in /etc/environment or at least in the oracle .profile.
If you want more tips please post the vmstat -Iwt output again including your resources, vmstat -s and vmstat -v outputs.
It's an Oracle ASM environment. It has 15.5GB of RAM, of which the SGA takes 9GB. The Oracle DBA suggests setting lock_sga=TRUE, and there are additional settings needed on the AIX side to make lock_sga=TRUE work, but I'm not sure what they are.
You're so right. The free list is very low, and that caused a lot of performance problems.
Also, in Oracle the SGA_TARGET parameter manages memory inside the database. Do you know which AIX parameter automatically manages memory for the I/O buffer cache and application cache?
I'm a beginner to AIX. Thanks for your insight!
Sam -
System configuration: lcpu=8 mem=15424MB
kthr    memory        page                faults         cpu         time
------- ------------- ------------------- -------------- ----------- --------
r  b  p     avm   fre fi fo pi po  fr  sr  in    sy   cs us sy id wa hr mi se
1  1  0 3394322 67820 33 77  0  0 103 197 347 16926 3702  7  2 89  3 00:36:52
---------- Post updated at 01:48 AM ---------- Previous update was at 01:44 AM ----------
By the way, the system is a little quiet right now. It is extremely busy, with a very small free list, when the RMAN database backup is running.
---------- Post updated at 01:49 AM ---------- Previous update was at 01:48 AM ----------
vmstat -s
2371477560 total address trans. faults
32145399 page ins
72931135 page outs
1897 paging space page ins
2707 paging space page outs
0 total reclaims
905517288 zero filled pages faults
62779256 executable filled pages faults
186955953 pages examined by clock
110 revolutions of the clock hand
97559480 pages freed by the clock
23360510 backtracks
0 free frame waits
0 extend XPT waits
2245184 pending I/O waits
105074996 start I/Os
6878603 iodones
3508959323 cpu context switches
329345963 device interrupts
55512985 software interrupts
1691341351 decrementer interrupts
46873 mpc-sent interrupts
46873 mpc-receive interrupts
181933 phantom interrupts
0 traps
16044009781 syscalls
vmstat -v
3948544 memory pages
3743595 lruable pages
66387 free pages
4 memory pools
607209 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
8.2 numperm percentage
307386 file pages
0.0 compressed percentage
0 compressed pages
8.2 numclient percentage
90.0 maxclient percentage
307386 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
72 paging space I/Os blocked with no psbuf
2228 filesystem I/Os blocked with no fsbuf
3602 client filesystem I/Os blocked with no fsbuf
1801 external pager filesystem I/Os blocked with no fsbuf
0 Virtualized Partition Memory Page Faults
0.00 Time resolving virtualized partition memory page faults
System configuration: lcpu=8 mem=15424MB
kthr    memory        page                faults         cpu         time
------- ------------- ------------------- -------------- ----------- --------
r  b  p     avm   fre fi fo pi po  fr  sr  in    sy   cs us sy id wa hr mi se
1  1  0 3393363 66250 33 76  0  0 102 197 347 16928 3702  7  2 89  3 00:49:16
First of all, if your Oracle SGA is 9 GB, then your system will hardly ever be happy with less than 18 GB of memory. You are paging even though your tuning is fine. That means you should physically have more memory to satisfy the needs of the box ... a DB server should never have to page.
I am not a fan of locking the SGA just because you are too low on memory. If it's a single-instance database and you are not going to use huge pages, then the better option is to add the memory the system needs and leave the memory unpinned. Pinning memory on a memory-constrained system will cause more paging of your user processes, which makes queries take longer and batches overrun. It will not benefit your backups either. And if the amount of memory you are going to pin is large relative to the total physical memory, then you additionally run the risk of a system crash when your system reaches the magical 83% threshold. AIX cannot pin more than a little over 80% in total, and the kernel itself pins a significant amount of memory over time depending on the workload, as it's a dynamic (learning) kernel. If your system is doing a lot of different things, this can easily be 25% after a week. I have never seen a kernel pin more than 35% in total no matter how long it's been up, but that still might lead to problems when you pin more than 50% to Oracle from the start.
If you still insist on doing it ...
I am not sure what you mean by that. Basically, the VMM is responsible for managing all memory on AIX except what is taken away by the SGA and therefore made inaccessible to the system. It is well known that backups are big memory consumers, as each I/O obviously needs to be buffered. The command vmo -r -o v_pinshm=1 would allow Oracle to do the lock_sga, but as said before, it is a lot better and safer for the system to add the memory it needs and leave the SGA unlocked.
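To put numbers on the pinning risk, here is a back-of-the-envelope Python sketch using the vmstat -v figures posted earlier in the thread (memory pages, pinned pages, maxpin percentage) and the 9 GB SGA. It assumes that none of the SGA is pinned today, so locking it would add the full 9 GB on top of the current pinned pages. The 25% kernel figure is the one quoted above, not something measured on this box, and all variable names are made up for the illustration.

```python
PAGE_KB = 4  # vmstat -v counts 4k pages

memory_pages = 3948544  # "memory pages" from vmstat -v
pinned_pages = 607209   # "pinned pages" from vmstat -v
maxpin_pct = 80.0       # "maxpin percentage" from vmstat -v
sga_gb = 9              # SGA size stated in the thread


def pct(pages, total):
    """Share of total pages, as a percentage."""
    return 100.0 * pages / total


# Assumption: the whole SGA would be newly pinned by lock_sga.
sga_pages = sga_gb * 1024 * 1024 // PAGE_KB
sga_pct = pct(sga_pages, memory_pages)

pinned_now_pct = pct(pinned_pages, memory_pages)
pinned_with_sga_pct = pct(pinned_pages + sga_pages, memory_pages)
headroom_pct = maxpin_pct - pinned_with_sga_pct

# Scenario from the post: kernel pins grow to about 25% over time.
kernel_pin_scenario_pct = 25.0
scenario_total_pct = kernel_pin_scenario_pct + sga_pct

print(f"pinned now:            {pinned_now_pct:.1f}%")
print(f"pinned with SGA:       {pinned_with_sga_pct:.1f}%")
print(f"headroom below maxpin: {headroom_pct:.1f}%")
print(f"kernel at 25% + SGA:   {scenario_total_pct:.1f}% (maxpin {maxpin_pct:.0f}%)")
```

Under these assumptions the locked SGA alone would take close to 60% of memory, leaving only a few percent of headroom below maxpin today and none at all if the kernel's own pinned share grows toward 25%.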
Now some good news - from the above I can see that your free list NEVER dropped to 0 - that means that lrud is doing its job scanning and freeing properly. If we now can get the paging under control by adding more memory you should be good.
I can also see that your system would only start paging out Oracle-related processes when your computational memory (avm x 4k) exceeds 97%, which doesn't seem to be the case on your box (at least in the outputs you have pasted). But I am quite sure that as soon as RMAN kicks in, it pushes you over the edge.
Since you are running asm, do you still use a /dumps filesystem for the backups or does the DB do it directly to tape ?
I still would love to see a vmstat -Iwt 2 10 output from a timeframe when your system is really busy with normal work - and one from when rman runs ...
BTW, are you running AIX 5.3 or 6.1, and which Oracle version?
For ASM and Sybase, which both use raw devices of some kind, I usually set ioo -p -o lvm_bufcnt=16
Apart from that, you surely could do with a few more gigs of memory, as your computational usage is really high for an Oracle DB. I also think your CPU waits are very high. This usually points to problems with the disk subsystem, which could have all kinds of reasons. Maybe your async I/O settings are too low, which is usually the case ... set the maxreqs number to 65536 (smitty aio). Check with iostat -Dl whether your disks' wait queues are running full, and check your disk response times. Is the above output from while you are running RMAN?
That is a fairly normal picture during backups, as you naturally have lots of I/O (that is what backups do).
So far the only thing I would be concerned about is the page in, but not the fi/fo. These are just your reads and writes, and there is no ratio to watch for them.
The ratio that matters is sr to fr, and that is pretty much 1:1 in your output, which is OK, as RMAN is a very I/O-intensive process.