Where is my memory? How to fix it?

Hello,

I have a few Solaris 11.2 servers. For some time now, we have been noticing high memory utilization on them.
Here is the current state:

root@ser22-zonemgr:~# /tmp/memory.sh
        Physical memory size:            36864
        Memory usage in MB:              32938
        Memory usage in %:               89%
root@ser22-zonemgr:~#
root@ser22-zonemgr:~# echo ::memstat -v | mdb -k
Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      721833              5.5G   15%
Guest                            0                 0    0%
ZFS Metadata                 74346            580.8M    2%
ZFS File Data               853646              6.5G   18%
Anon                       2351933             17.9G   50%
Exec and libs                 2369             18.5M    0%
Page cache                  222555              1.6G    5%
Free (cachelist)              4922             38.4M    0%
Free (freelist)             486988              3.7G   10%
Total                      4718592               36G
root@ser22-zonemgr:~#

From prstat -->
 NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
   115 timesten   16G   15G    43%  76:49:22 1.6%
   106 root     2887M 2698M   7.3% 383:50:51 2.2%
     8 daemon     41M   56M   0.2%   1:15:31 0.0%
     8 pkg5srv    37M   38M   0.1%   1:07:26 0.0%
     8 john       12M   34M   0.1%   0:00:00 0.0%
     2 smmsp    3936K   13M   0.0%   0:03:25 0.0%
     2 netadm   3808K 9032K   0.0%   0:07:34 0.0%
     2 netcfg   2560K 6664K   0.0%   0:07:39 0.0%
     2 hspokes  2408K   15M   0.0%   0:00:00 0.0%
     2 noaccess 2088K 8096K   0.0%   0:00:02 0.0%

And here is from another server

ser42-zonemgr:~# /tmp/memory.sh
        Physical memory size:            16384
        Memory usage in MB:              14653
        Memory usage in %:               89%
ser42-zonemgr:~# echo ::memstat -v | mdb -k
Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      674735              5.1G   32%
Defdump prealloc            119423            932.9M    6%
ZFS Metadata                111996            874.9M    5%
ZFS File Data               168163              1.2G    8%
Anon                        877423              6.6G   42%
Exec and libs                 2496             19.5M    0%
Page cache                   17788            138.9M    1%
Free (cachelist)                 7               56k    0%
Free (freelist)             216864              1.6G   10%
Total                      2097152               16G
ser42-zonemgr:~#

From what I have read on a few forums, I understand that the memory is filled with unmapped pages of data read from disk. The data is kept in memory because those files may be read again, and keeping it there saves a disk read. That is where the OS is holding memory, which leaves less memory that can be called 'free'. Whenever an application or DB needs that memory, the OS will release it. Our problem is the monitoring system: when total utilization hits 90% (right now it is 89%), it will page the SA.
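To put numbers on that, here is a minimal sketch in Python that parses the `::memstat` output from the first server and reports utilization twice: once counting everything that is not on the free lists, and once with the ZFS file data cache left out. The function names, and the choice to treat only 'ZFS File Data' as reclaimable, are assumptions made for illustration; they are not taken from /tmp/memory.sh or from Oracle.

```python
import re

# Output of `echo ::memstat -v | mdb -k` from the 36G server in this thread.
SAMPLE = """\
Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      721833              5.5G   15%
Guest                            0                 0    0%
ZFS Metadata                 74346            580.8M    2%
ZFS File Data               853646              6.5G   18%
Anon                       2351933             17.9G   50%
Exec and libs                 2369             18.5M    0%
Page cache                  222555              1.6G    5%
Free (cachelist)              4922             38.4M    0%
Free (freelist)             486988              3.7G   10%
Total                      4718592               36G
"""

# A row is a category name, at least two spaces, then an integer page count.
ROW = re.compile(r"^(\S.*?)\s{2,}(\d+)\s")


def parse_memstat(text):
    """Return {category name: page count} for every data row in the output."""
    pages = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            pages[match.group(1)] = int(match.group(2))
    return pages


def used_pct(pages, reclaimable=("ZFS File Data",)):
    """Return (raw %, % with the reclaimable categories excluded).

    Assumption: only the categories named in `reclaimable` are given back
    on demand. Adjust the tuple if your site counts it differently.
    """
    total = pages["Total"]
    free = pages.get("Free (cachelist)", 0) + pages.get("Free (freelist)", 0)
    used = total - free
    cache = sum(pages.get(name, 0) for name in reclaimable)
    return 100.0 * used / total, 100.0 * (used - cache) / total


if __name__ == "__main__":
    raw, adjusted = used_pct(parse_memstat(SAMPLE))
    print("raw: %.1f%%  excluding ZFS file data: %.1f%%" % (raw, adjusted))
```

For the 36G server this works out to roughly 89.6% raw and about 71.5% with ZFS file data excluded, so around 18 points of the reported figure is cache that can normally shrink on demand.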

What could be a solution for this issue?
Any recommendations?

Thanks

Hi,

There is an excellent Oracle document covering this on the Oracle site, Document ID 1663862.1. It gives in-depth information covering your monitoring issue.

This should at least point you in the right direction.

Due to the way that the MMU and ZFS behave, this can be an issue for monitoring tools.

I've also just noticed that you say you've only recently noticed the increase in memory utilisation. Another possible cause could be a memory leak, where application code does not fully release memory. Do you have any trending information? Does your monitoring tool produce such information? A leak is generally easy to see over a longer term.
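If the monitoring tool can export usage samples, a rough sketch of the trend check might look like the following. It fits a least-squares line through (day, used MB) pairs and returns the slope; a steadily positive slope over weeks would point towards a leak rather than cache. The function name and the sample values are hypothetical, made up purely for illustration.

```python
def growth_per_day(samples):
    """Least-squares slope of used memory, in MB per day.

    `samples` is a list of (day, used_mb) pairs, e.g. one reading per day
    exported from the monitoring tool.
    """
    n = float(len(samples))
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("all samples share the same day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical weekly readings, NOT real data from these servers.
    hypothetical = [(0, 30000), (7, 30700), (14, 31400), (21, 32100)]
    print("growth: %.1f MB/day" % growth_per_day(hypothetical))
```

A slope near zero with occasional spikes would fit the cache explanation better than a leak.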

Regards

Gull04

Thanks for the document. It is a really good one and gave valid information, including the benefits of 'user_reserve_hint_pct' over 'zfs_arc_max'.

However, I am a little confused about what I should set initially, given my servers' load and memory utilization. There is no previous or desired zfs_arc_max value, so I can't use the formula mentioned in the document. Should I start with 60? These are production servers, so even a small performance issue will raise some noise.
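One rough way to sanity-check 60 as a starting point, offered only as a sketch and not as the formula from the Oracle document, is to look at how much of physical memory the application-side categories in `::memstat` already occupy and add some headroom. Which categories count as application memory, the 5-point headroom, the rounding up to a multiple of 5, and the cap at 99 are all assumptions made here for illustration.

```python
import math

# Page counts copied from the ::memstat output earlier in this thread.
SER22 = {"Anon": 2351933, "Exec and libs": 2369,
         "Page cache": 222555, "Total": 4718592}
SER42 = {"Anon": 877423, "Exec and libs": 2496,
         "Page cache": 17788, "Total": 2097152}

# Assumption: these categories approximate memory wanted by applications.
USER_CATEGORIES = ("Anon", "Exec and libs", "Page cache")


def suggest_reserve_pct(pages, headroom=5, step=5):
    """Current application share of memory plus headroom, rounded up.

    This is a heuristic for a first test value of user_reserve_hint_pct,
    not Oracle's formula.
    """
    user_pages = sum(pages[name] for name in USER_CATEGORIES)
    user_pct = 100.0 * user_pages / pages["Total"]
    rounded = int(math.ceil((user_pct + headroom) / step) * step)
    return min(rounded, 99)


if __name__ == "__main__":
    print("ser22:", suggest_reserve_pct(SER22))
    print("ser42:", suggest_reserve_pct(SER42))
```

With the numbers above the sketch lands on 60 for the 36G server and 50 for the 16G one, so 60 looks like a plausible first step for the larger machine.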

Tomorrow morning I will request a report from the monitoring team, if they can provide one; that should help me understand the trend.

Hi,

I'm not really a great fan of trying things out on production servers; it always seems to come back and bite you from another direction. If your environment is anything like most of the ones I have worked in over the years, test and development machines are scarce, and where they exist, getting testing resource is like finding a winning lottery ticket.

It's a difficult call for me. I don't know anything about your environment other than that you are using Solaris 11.2, one server has 36GB of memory, the other has 16GB, and they could be whole root zones. And, of course, both servers are reporting almost 90% memory utilisation, which will trigger an alert.

Much of this is going to come down to a few things:

  1. Can you do this on a Test/Dev server?
  2. Does going over 90% memory utilisation cause a problem with the application?
  3. Can you live with the alert generated at the 90% threshold? Can it be ignored, or the threshold changed?
  4. Do you have to try this change in a production environment? If there are problems, what's the regression plan?
  5. How critical is the production server? If it's critical, could you test on the DR server?
  6. Will your boss back you up if you take the iterative approach and set "user_reserve_hint_pct" to 60% as Oracle suggests?

I know the above isn't overly helpful, but asking someone with no experience of the environment in which you work just opens up many more questions that can't be easily answered in a forum such as this.

My preferred approach would be to run an explorer over the system and raise a support ticket (case) with Oracle Support, citing your concerns in the ticket. Please be aware that Oracle may well come back and suggest things like bringing the patching up to date on the systems.

Regards

Gull04

Thanks. I will follow that.