I am new to AIX. I have a few AIX 5.3 servers, and I see a significant difference in paging space utilization between them even though they are running the same applications.
The server below is working fine and shows 2-5% paging space usage throughout the day.
Off the top of my head, I would try setting minperm% to 5 and maxperm% to 90 (maxclient% to 90 as well) and check whether the behaviour changes. lru_file_repage is already set to 0, so values like the ones above are what you usually go with.
What type of application(s) is running on the box, a DB, ...?
It would also be interesting to see which processes are using the most paging space:
svmon -P -O sortentity=pgsp
Can you also please post the outputs of:
vmstat -w -t 1 10 # When you notice paging
vmstat -vs # This one anytime you want
And the most important of all:
Use code tags for the output
Lots of free memory on a well warmed up server is just an indication of lots of exiting processes as part of the app. Exit frees the RAM footprint of the local parts of the process (heap and stack -- code is usually shared pages), bumping up free RAM until paging in eats it away. Sometimes this indicates too much shell programming, or bad shell programming, as processes are executed and discarded over and over. Sometimes these are spun off from server functions. The Java webserver process is probably not really freeing RAM to the OS, as free() just puts it back in the free pool of the memory allocation arena, awaiting the next malloc(), or the Java GC equivalent.
The svmon did not show much; it seems this sort option only exists on releases newer than AIX 5.3. Try this one and check which process or processes have by far the highest value in the column "Pgsp":
I would set parameters as recommended in my former post and see if the paging stops, hopefully.
This may take some time since there is still a lot out on the disk that needs at least to be paged in when it's needed. To speed this up, reboot the box after setting the following command...
You can set them online with the following command:
I've seen this type of thing before, although it was quite a number of years ago. It actually turned out to be a memory leak from badly written code. It was hard to track down: we had four P695's all running the exact same application suite, and it just turned out that the one used for reporting had the problem.
So possibly looking at the usage on each of the boxes may provide a pointer.
Thanks. Is there any calculation possible to find the optimum value for the tunable parameters? And if I tweak these values, how much is that going to help with my current high paging utilization issue?
$ svmon -G
               size       inuse        free         pin     virtual
memory      9437184     9424496       12688     1796740     8576772
pg space    7864320     2000371
               work        pers        clnt       other
pin         1490348           0           0      306392
in use      7671239           0     1753257
PageSize   PoolSize       inuse        pgsp         pin     virtual
s    4 KB         -     3481600      346035     1655556     2022452
m   64 KB         -      371431      103396        8824      409645
$ lsps -s
Total Paging Space Percent Used
30720MB 26%
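To make the svmon -G numbers above easier to read, here is a small sketch that converts them to GB and percentages. It assumes svmon -G reports its counts in 4 KB pages (which fits, since the "pg space" size then works out to exactly the 30720MB that lsps -s shows). The function and variable names are made up for illustration.

```python
PAGE_KB = 4  # assumption: svmon -G counts are in 4 KB pages

def pages_to_mb(pages: int) -> float:
    """Convert a 4 KB page count to megabytes."""
    return pages * PAGE_KB / 1024

# Values copied from the svmon -G output posted above
memory_size = 9437184
pgsp_size = 7864320
pgsp_inuse = 2000371
work_inuse = 7671239   # computational (working storage) pages in use
clnt_inuse = 1753257   # client (file cache) pages in use

print(f"RAM:              {pages_to_mb(memory_size) / 1024:.1f} GB")
print(f"Paging space:     {pages_to_mb(pgsp_size):.0f} MB")
print(f"Paging space use: {pages_to_mb(pgsp_inuse):.0f} MB "
      f"({100 * pgsp_inuse / pgsp_size:.1f}%)")
print(f"Computational:    {pages_to_mb(work_inuse) / 1024:.1f} GB "
      f"({100 * work_inuse / memory_size:.1f}% of RAM)")
print(f"File cache:       {pages_to_mb(clnt_inuse) / 1024:.1f} GB "
      f"({100 * clnt_inuse / memory_size:.1f}% of RAM)")
```

The svmon and lsps snapshots were not necessarily taken at the same moment, so the paging space percentage from this calculation can differ slightly from the lsps -s figure.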
$ vmstat -vs
26994260366 total address trans. faults
4361829552 page ins
3820265318 page outs
84045532 paging space page ins
77865859 paging space page outs
0 total reclaims
8236127852 zero filled pages faults
11712277 executable filled pages faults
15174415073 pages examined by clock
25517 revolutions of the clock hand
5331955580 pages freed by the clock
65710515 backtracks
175898 free frame waits
0 extend XPT waits
284609239 pending I/O waits
8157163408 start I/Os
855241712 iodones
43057036717 cpu context switches
16995406850 device interrupts
1964838815 software interrupts
20009814065 decrementer interrupts
165822097 mpc-sent interrupts
165822048 mpc-received interrupts
1353839635 phantom interrupts
0 traps
192398322217 syscalls
9437184 memory pages
9048488 lruable pages
11106 free pages
5 memory pools
1796740 pinned pages
80.0 maxpin percentage
20.0 minperm percentage
50.0 maxperm percentage
19.3 numperm percentage
1750906 file pages
0.0 compressed percentage
0 compressed pages
19.3 numclient percentage
50.0 maxclient percentage
1750906 client pages
0 remote pageouts scheduled
4760 pending disk I/Os blocked with no pbuf
15264944 paging space I/Os blocked with no psbuf
1972 filesystem I/Os blocked with no fsbuf
27374 client filesystem I/Os blocked with no fsbuf
411174 external pager filesystem I/Os blocked with no fsbuf
0 Virtualized Partition Memory Page Faults
0.00 Time resolving virtualized partition memory page faults
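If you want to compare these counters between the good and the bad server, a rough sketch like the following can turn the text into numbers. It only assumes the "value label" line layout seen in the output above; the SAMPLE string is a short excerpt of that output, and parse_vmstat is a hypothetical helper name, not an AIX tool.

```python
import re

# Excerpt of the vmstat -vs output posted above
SAMPLE = """\
 84045532 paging space page ins
 77865859 paging space page outs
  9437184 memory pages
  9048488 lruable pages
     20.0 minperm percentage
     50.0 maxperm percentage
     19.3 numperm percentage
  1750906 file pages
"""

def parse_vmstat(text: str) -> dict:
    """Map each label to its numeric value (int, or float if it has a dot)."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\d.]+)\s+(.+?)\s*$", line)
        if m:
            value, label = m.group(1), m.group(2)
            stats[label] = float(value) if "." in value else int(value)
    return stats

stats = parse_vmstat(SAMPLE)

# Assumption: numperm% is file pages as a share of lruable pages
numperm_calc = 100 * stats["file pages"] / stats["lruable pages"]
print(f"file pages / lruable pages = {numperm_calc:.2f}%")
print("numperm below minperm:",
      stats["numperm percentage"] < stats["minperm percentage"])
```

Run against the excerpt, the calculated ratio lands close to the reported numperm of 19.3%, which is just under the minperm setting of 20%.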
I have had a look back here several times, and as this is beginning to intrigue me, I'd like to investigate it a little further. It has been some time since I worked on AIX, but I did do a fair bit with 5.3 and 6.1 back in the day.
There are some things that you should be aware of, as far as the page and swap goes along with the run queue and the CPU I/O wait. These parameters are heavily interdependent in AIX, quite often giving rise to apparent performance issues where there aren't any.
So having read over a report that I prepared about 30 months ago, for a group of AIX systems with around 15K users - I'd like to clarify a couple of things. These are just simple info gathering steps!
Is there a perceivable degradation in performance when the swapping happens?
We haven't noticed any performance impact, but the paging utilization keeps on increasing, and eventually we have to restart the application every day to bring it back to normal.
We have been monitoring this issue every day for a couple of weeks now, but another AIX 5.3 server in the environment running the same application, even with less physical memory, is using only 2-3%.
Yes, restarting the application or restarting the system always clears this issue.
There looks to be a memory leak in the application you restart. If the OS, patches and hardware are the same on the affected server as on the ones where the issue doesn't show up, it is unlikely there is anything you can fix with kernel tuning.
Please post statistics about the leaking process, not just global ones.
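One way to read per-process statistics once they are collected: sample the "Pgsp" value of each suspect process at intervals and look at the trend. The sketch below fits a least-squares slope to such samples. The process names and numbers are entirely made up for illustration; real values would come from repeated svmon runs on the affected box.

```python
def growth_per_hour(samples):
    """Least-squares slope of (hours_since_start, pgsp_pages) samples."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    num = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# Hypothetical hourly samples: (hour, paging space pages used by the process)
samples_by_process = {
    "app_server": [(0, 1000), (1, 1500), (2, 2000), (3, 2500)],
    "reporting":  [(0, 800), (1, 790), (2, 810), (3, 800)],
}

for name, samples in samples_by_process.items():
    slope = growth_per_hour(samples)
    print(f"{name}: {slope:+.0f} pages/hour")
```

A process whose paging space footprint climbs steadily between restarts while its workload stays flat is the natural leak suspect; one that hovers around a constant value is not.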
As Jllagre says, this is almost certainly application related. What I think you'll have to do here is check that the applications are at the same versions. If that is the case, you'll probably have to investigate why one is holding leaked memory and the other isn't. Most likely it is down to the way the application is being used, if they are both the same.
If these systems are at the same MU level then I think you'll need application support to resolve.
Well, basically, no matter how much 'free' memory the system has, with minperm set to 20% the host will inevitably page when it uses 34GB of computational memory out of 36GB available, especially as long as numperm is at 20%.
As zaxxon said, fix the minperm setting (to 3% or 5%) and figure out whether you really need to keep cached content in memory even though the corresponding I/O has been completed. If not, mount your filesystems with the rbrw option. Not sure if you mentioned whether this is an app or a DB server, but if it is, e.g., Oracle, you may want to consider cio and/or filesystem_io_options=SETALL
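The reasoning here can be sketched as a very simplified model of how the page stealer picks its victims. This is an approximation of the documented behaviour of minperm%, maxperm% and lru_file_repage, not the actual VMM logic, and lru_targets is a made-up name.

```python
def lru_targets(numperm, minperm, maxperm, lru_file_repage=0):
    """Simplified model: which kinds of pages the page stealer may take."""
    if numperm < minperm:
        # File cache is below its protected floor, so computational
        # pages get stolen too and end up on paging space.
        return "file and computational"
    if numperm > maxperm or lru_file_repage == 0:
        return "file only"
    return "depends on repage rates"

# Current settings from the vmstat -v output: numperm 19.3, minperm 20, maxperm 50
print(lru_targets(19.3, 20, 50))   # file and computational

# Suggested settings: minperm 5, maxperm 90, same file cache share
print(lru_targets(19.3, 5, 90))    # file only
```

With numperm sitting just under a minperm of 20%, computational pages are fair game. Lowering minperm moves that floor well below the current file cache share, so in this model the stealer goes after file pages only.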
Regards
zxmaus