Hi, I have 2 identical web servers running AIX. I use nmon analyser to check their performance.
Server A uses roughly 20% of its memory for system, 5% for cache and the remaining 75% for processes. It also uses 4% of paging space.
Server B uses roughly 20% for system, 45% for cache and 35% for processes.
I tried using svmon, ps and topas to investigate the memory usage, and I finally found which processes on server A are in paging space and how much (MB) they use.
However, I cannot find which processes account for the extra 40% of memory on server A (75% on A versus 35% on B) and cause this issue.
Could you please help me?
If you need any output etc, just text me or pm... thank you in advance.
I regret that I am unable to hack into your servers and my crystal ball is in for servicing. Can you post some meaningful output illustrating why you think you have a problem?
It is usual for Unix/Linux servers of all flavours to keep their memory full, i.e. caching anything they have used, just in case it can be re-used without having to wait for disk IO. Are you seeing paging space used? The output from lsps -a would show you this sort of thing. It is extremely unlikely that the two servers will have done exactly the same thing.
The output from vmstat 5 should also have a few columns about paging in & out counts.
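To read those counts without eyeballing the table, a rough sketch like the one below could total them. It assumes the paging columns are labelled pi and po in the header row (as in AIX vmstat output) and finds them by name rather than by position; paging_activity is just a name made up for this example.

```shell
# Sum the page-in (pi) and page-out (po) columns of vmstat output read on stdin.
# Assumption: the column header row starts with "r" and labels the paging
# columns "pi" and "po". If those labels are absent, the result is "0 0".
paging_activity() {
    awk '
        # Header row: remember which columns are labelled pi and po.
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "pi") pi = i
                if ($i == "po") po = i
            }
            next
        }
        # Data rows: first field is numeric once the header has been seen.
        pi && po && $1 ~ /^[0-9]+$/ { in_sum += $pi; out_sum += $po }
        END { print in_sum + 0, out_sum + 0 }
    '
}

# Possible usage (12 samples, 5 seconds apart):
#   vmstat 5 12 | paging_activity
```

Two zeros over a busy period would suggest the box is not actively paging, whatever the paging space percentage says.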
If your server is not using paging space, then you don't generally have a memory problem. What is making you concerned?
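If you want to keep an eye on paging space from a script, something along these lines might do. It is only a sketch: it assumes lsps -a prints a header row containing a %Used label, and it locates the matching data column by counting from the right, because the header contains multi-word labels such as "Page Space". The function name pgsp_over and the default threshold of 10 are invented for the example.

```shell
# Read `lsps -a` output on stdin and print "name used" for every paging
# space whose %Used value is at or above the threshold in $1 (default 10).
# Assumption: the first line is a header containing the label "%Used".
pgsp_over() {
    awk -v limit="${1:-10}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                # Distance of %Used from the last field; same in data rows.
                if ($i == "%Used") { back = NF - i; found = 1 }
            }
            next
        }
        found && NF > back {
            used = $(NF - back)
            if (used + 0 >= limit + 0) print $1, used
        }
    '
}

# Possible usage:
#   lsps -a | pgsp_over 20
```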
Ok, first of all you have to get your crystal ball back!
Thanks for your interest and your help in advance! As I wrote in the first post, my concern is that server A uses almost double the memory for processes. If you asked me which one I would expect to use more memory, it would be server B!
Small disclaimer: I have used AIX very little, most of the time migrating stuff off of it.
Looks like server B, in your case, has more filesystem caching activity going on.
Meaning, if this is some kind of cluster and you fail over a service which is using that filesystem, you are bound to see slower response until the same read patterns occur on server A's filesystem and are cached in memory.
So, basically, context is required in this case.
What services are running on that server ?
How are those two servers sharing load - are requests hitting both or failover scenarios etc. ?
I mean, what actually doesn't work, or works slower, on the server in question, so that you need to investigate the memory?
There is no functional problem yet, but this is why I check the performance daily using nmon analyser: to prevent any issue! I hope so..
Both servers run web services, http, java etc.
Server A shouldn't use that much memory for processes. Actually it should behave like server B.
This is why I am concerned...
I am new to AIX, but what I understood about memory usage is that the system uses some, then the processes use whatever they need, and the remaining memory is used for the filesystem cache. If more memory is needed, paging space will be used. Is this correct?
If what I understood is correct, then there is a problem on server A, as there are processes which use paging space!
Most systems will give up cache before they start eating into swap space, so you are not at 0% RAM.
I think what might have happened is, at some point in the past, a process on system A grew very large, large enough to start using swap, then finished or died to give it up again.
Corona666,
thanks for your answer. Yes, I tried ps aux. What you wrote might make sense. If I understand correctly, when a process grows very large, swap will be used. When this process is done, will new processes of this program keep starting in swap because the filesystem cache occupies real memory? Why??
As I understand it, a process can grow until it uses all real memory and squeezes the filesystem cache out of real memory, right? If this process has higher priority and keeps growing, other processes will be moved to swap. So when this very big process ends, the processes in swap will move back to real memory, right? And if there is still free memory, it will be used by the filesystem cache. Am I correct?
Even so, as I mentioned at the beginning, on 2 identical servers the first uses memory as 20% system, 30% processes and 50% filesystem, and the second as 20% system, 75% processes and 5% filesystem, plus 4% swap!
How can I identify the processes using that 75% on the second server?
Thanks for your help too, but your solution didn't help either... I think.
I ran your command. The sum of the second (sorted) column, multiplied by 4 and divided by 1024, is not the number I expected. I mean it does not match what nmon shows (75%).
In addition, does your command (actually ps) list all processes (user and system) or only user processes?
It'd typically show all processes running on your system.
Are you having a performance problem or just interested in finding processes that are consuming ~75% of system memory?
No. At one point that big process forced some memory into swap, when it completed or died most of the swap came back out, except for that 2% which nobody's actually needed to use yet. That's my theory on where the 2% swap came from and why it's just sitting there.
More or less.
Less as in, memory won't come out of swap until something tries to use it. This prevents some waste since that memory would be unused anyway and I think accounts for your 2% swap.
That's normal unless configured otherwise, yes.
Something big is hanging around, which used to be even bigger, I think.
Have you tried ps aux? Or as madeingermany suggested, ps -e -o pid= -o rss= -o vsz= -o args= | sort -k2,2n ? What does it look like?
Great! Reading your post, it seems I now have a clear picture of how memory behaves. That's exactly how I thought it works!
Now let's finish with the processes which use that much memory and push others into swap...
As I wrote in my previous post, I ran the command as "madeingermany" proposed, and the sum of the second column (RSS) over every process, multiplied by 4 and divided by 1024, comes to almost 1GB, while my installed memory is 4GB! So, if this command only lists user processes (not system), then based on nmon (75%) it should be 3GB! Why is it 1GB?
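For reference, the tally described above can be scripted roughly as follows. This is a sketch under assumptions: the multiplier (4 here, i.e. RSS counted in 4 KB pages) is the figure used in this thread and should be checked against the ps man page on your system, since if ps already reports RSS in KB the multiplier should be 1. The names sum_rss_mb and pct_of_mb are made up for the example.

```shell
# Sum the RSS column (2nd field) of
#   ps -e -o pid= -o rss= -o vsz= -o args=
# read on stdin and print the total in MB.
# $1 = KB per RSS unit (assumption: 4 for 4 KB pages, 1 if RSS is already in KB).
sum_rss_mb() {
    awk -v unit="${1:-1}" '
        { total += $2 }
        END { printf "%.1f\n", total * unit / 1024 }
    '
}

# What a given percentage of installed memory amounts to, in MB.
# $1 = installed memory in MB, $2 = percentage reported by the monitoring tool.
pct_of_mb() {
    echo $(( $1 * $2 / 100 ))
}

# Possible usage:
#   ps -e -o pid= -o rss= -o vsz= -o args= | sum_rss_mb 4
#   pct_of_mb 4096 75
```

Comparing the two numbers side by side makes the gap explicit: 75% of 4096 MB is 3072 MB, so a ps total near 1 GB leaves about 2 GB that the per-process RSS figures do not account for.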
-e ought to show every process. If it's not in "RSS", it might be in shared pages, which are notoriously hard to tally since they overlap between processes with no particular owner. Oracle in particular likes to use a lot of shared memory, and manages its own memory internally.
I get this...
ps -e -o size -o args= | sort -k 1,1
ps: 0509-048 Flag -o was used with invalid list.
Usage: ps [-AMNZaedfklm] [-n namelist] [-F Format] [-o specifier[=header],...]
[-p proclist][-G|-g grouplist] [-t termlist] [-U|-u userlist] [-c classlist] [ -T pid] [ -L pidlist ]
[-@ [wparname] ]
Usage: ps [aceglnsuvwxX] [t tty] [processnumber]
Then I ran the command Madeingermany proposed:
ps -e -o pid= -o rss= -o vsz= -o args= | sort -k2,2n
While server B confuses me, on server A the sum of RSS (*4/1024) is almost the same as what nmon shows for processes and system!!
So the command seems to give the expected result, just not on server B...
I just ran the command on a third similar server and got the expected results again!!!
First off: you could have looked it up by searching the forum, for example here, post #6, if you didn't want to read the man page, where the various options to ps are also explained. No problem, I will write it down once again:
ps -Alo pid,vsz,args
"pid" gives the process number, "vsz" the virtual memory used by the process, "args" the commandline used to start the process. Add/rearrange columns for your needs. You will also find some other useful commands to gather information about memory usage in this thread. Especially the output of vmo -a would be interesting.
Neither of your systems is swapping (although one uses a very small fraction of the swap), but you can anticipate swap activity even before it begins: issue vmstat -vs and have a look at the number of "revolutions of the clock hand" (or similar, I can't access an AIX system right now). This is the number of times the free memory page scanner has already searched the whole memory for free pages. The faster this number grows, the nearer the system gets to the point where actual swapping starts.
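To watch how fast that counter grows, a small sketch like this could sample it twice and print the difference. It assumes the line in the vmstat -s output reads "revolutions of the clock hand" with the count as the first field, so adjust the pattern if your system words it differently. The function names are made up for the example.

```shell
# Extract the clock-hand revolution count from `vmstat -s` output on stdin.
# Assumption: the line looks like "   12345 revolutions of the clock hand".
clock_hand_revs() {
    awk '/revolutions of the clock hand/ { print $1 }'
}

# Difference between two samples: $1 = earlier count, $2 = later count.
revs_delta() {
    echo $(( $2 - $1 ))
}

# Possible usage:
#   before=$(vmstat -s | clock_hand_revs)
#   sleep 60
#   after=$(vmstat -s | clock_hand_revs)
#   echo "$(revs_delta "$before" "$after") revolutions in 60 s"
```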
And a last tip: when issuing vmstat you should use the -w option always. You will get a neatly formatted table again this way.
Thanks Bacunin (a phrase I never expected to say :P),
To be honest, I am really tired of this issue... that's why I didn't even check the man page.
I have looked at many threads on this forum, and on other forums as well, but I just don't get it.
I used the commands you proposed, but the output doesn't make any sense to me, neither ps -Alo nor vmstat.
vmo -a gives almost the same output on both servers.
Could you please explain to me how you figured out that there is no memory problem?
If that is so, why are the processes at 35% on Server A and at 70% on Server B, with 4% pgsp in use?
Yes, shared memory is often not counted, because it can be used by many processes, or else it is counted several times.
A special type of shared memory is the SysV IPC (inter process communication), listed with
ipcs -p
E.g. for "Shared Memory" the listed PIDs are those of the "Creator" and the "Last attacher".
Even the normal fork/exec used to create a new process produces shared memory.
Because parent and child are initially identical, the pages are not actually copied; the new process reuses the same memory. They are only really copied when the new process modifies them. This is called COW (copy on write).
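To illustrate why shared pages make per-process totals misleading, here is a toy tally. The input format (pid, private KB, shared segment id, shared KB) is purely hypothetical and is not the output of any real command; it only shows the arithmetic of counting a shared segment once instead of once per attached process.

```shell
# Input on stdin (hypothetical format, one process per line):
#   pid private_kb shared_segment_id shared_kb
# Prints two totals in KB: the naive sum (shared memory counted for every
# process attached to it) and the sum with each segment counted only once.
tally_shared() {
    awk '
        { naive += $2 + $4; priv += $2; seg[$3] = $4 }
        END {
            for (s in seg) shared += seg[s]
            print naive + 0, priv + shared
        }
    '
}

# Possible usage with made-up numbers:
#   printf '101 100 A 1000\n102 100 A 1000\n103 100 A 1000\n' | tally_shared
```

With three processes of 100 KB private memory each, all attached to the same 1000 KB segment, the naive sum is 3300 KB while the real footprint is 1300 KB. Tools that leave shared memory out of RSS err in the other direction, which is one way a ps total can end up far below what nmon reports.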