How can I find a memory leak in Solaris?

Hi,

How can I find a memory leak in Solaris?

One of my servers has 64 GB of memory and nothing is running right now; there were 2 zones on it and we halted them, but 51 GB is still in use.

How can I find what is using the memory?

Regards,
Ben

Try: echo "::memstat" | mdb -kw

I would suspect the ZFS ARC cache has used most of the memory.

If that is the case, you can limit the ZFS ARC cache in /etc/system (it will require a reboot).

For instance, adding set zfs:zfs_arc_max=4294967296 as the last line in /etc/system will limit the ARC cache to 4 GB.

You should consider this limit if you are running databases on ASM (keep the ARC cache really small).
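As an illustration (not from the original post), the byte value for a chosen ARC limit can be computed and the /etc/system line printed out; the 4 GiB figure below is just an example:

```shell
# Sketch: compute the zfs_arc_max byte value for a limit given in GiB
# and print the /etc/system line to append. 4 GiB is an example value;
# size it to leave enough ARC for your own workload.
arc_gib=4
arc_bytes=$((arc_gib * 1024 * 1024 * 1024))
echo "set zfs:zfs_arc_max=${arc_bytes}"
```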

Hope that helps
Regards
Peasant.


Thanks for your help.

This is what I got:

root@hrms-zones #echo "::memstat" | mdb "-kw"
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     635135              4961    8%
ZFS File Data             5772613             45098   70%
Anon                       141513              1105    2%
Exec and libs               11482                89    0%
Page cache                  21836               170    0%
Free (cachelist)            74375               581    1%
Free (freelist)           1551004             12117   19%

Total                     8207958             64124
Physical                  8190943             63991

As I suspected, it's the ZFS ARC, which is obvious from this line:
ZFS File Data 5772613 45098 70%

Depending on your workload you might want to reduce that. Or not. It depends on the application in question and its memory needs.
Since you have 12 GB free, everything seems to be in order.
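The ARC can also be watched directly through its kstats; a minimal sketch (on a live Solaris box the input line would come from `kstat -p zfs:0:arcstats:size` — a sample line stands in for it here):

```shell
# Sketch: convert the ARC size kstat (bytes) to MB.
# On a live Solaris system, replace the printf with:
#   kstat -p zfs:0:arcstats:size
printf 'zfs:0:arcstats:size\t4294967296\n' |
    awk '{ printf "ARC size: %d MB\n", $2 / 1048576 }'
```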

We have added so many local disks.

How can I set the memory limit for ZFS? How can I calculate my workload? Are there any standards, or is it based on filesystem size...

root@hrms-zones #zpool status
  pool: hrms-pool
 state: ONLINE
 scan: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        hrms-pool    ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            c1t1d0   ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
            c1t4d0   ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
        spares
          c1t13d0    FAULTED   corrupted data
          c1t13d0    AVAIL

errors: No known data errors
root@hrms-zones #zfs list
NAME                            USED  AVAIL  REFER  MOUNTPOINT
hrms-pool                      1.76T   913G  62.3K  /hrms-pool
hrms-pool/devapr12             49.5G  75.5G  49.5G  /data
hrms-pool/devdbr12              634G   166G   634G  /data
hrms-pool/uatapr12             92.4G   158G  92.4G  /data
hrms-pool/uatdbr12              703G   913G   703G  /data
hrms-pool/zone-roots            328G   913G  66.4K  /hrms-pool/zone-roots
hrms-pool/zone-roots/devapr12  7.56G   913G  7.56G  /hrms-pool/zone-roots/devapr12
hrms-pool/zone-roots/devdbr12   320G   913G   320G  /hrms-pool/zone-roots/devdbr12
root@hrms-zones #

You can limit it, as I said in my previous post, by adding a line to /etc/system and rebooting the machine.

If you are not experiencing problems, and you are not running databases on ASM or applications which take large segments of memory in one go, leave it.

12 GB of memory was free when the command was issued, so there is plenty of free memory.

This looks strange:

        spares
          c1t13d0    FAULTED   corrupted data
          c1t13d0    AVAIL

How did you achieve this?

12 GB is free, but all the application and database zones are down. When I start them, usage reaches 62 GB.

I have added the line below to /etc/system and instructed the team to reboot the system.

 set zfs:zfs_arc_max=4294967296

After I set the kernel parameter and rebooted the system, it's not growing:

root@hrms-zones #echo "::memstat" | mdb "-kw"
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     365368              2854    4%
ZFS File Data              447053              3492    5%
Anon                       992441              7753   12%
Exec and libs               49195               384    1%
Page cache                  32038               250    0%
Free (cachelist)            35052               273    0%
Free (freelist)           6286811             49115   77%

Total                     8207958             64124
Physical                  8190943             63991
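A single figure can be pulled out of output like that with awk; a sketch, using one line of the output above as sample input:

```shell
# Sketch: extract the freelist size (MB) from ::memstat output.
# On a live system the input would come from:  echo "::memstat" | mdb -k
printf 'Free (freelist)           6286811             49115   77%%\n' |
    awk '/Free \(freelist\)/ { print $4 " MB free" }'
```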

Thanks for your help.

From everything that you have shown us, you are now happy that access to ZFS filesystems will be slower than it was before. Before you limited ZFS's use of free memory, it was using lots of free memory as a disk cache. Now you have lots of free memory that is free, but can't be used by ZFS.

Before you made this change and when the databases were running, you had less free memory. Why is that a problem? Was it causing a lot of swapping?

Why buy 64GB of memory if your goal is to keep 49GB free (i.e., unused)?

I am getting some errors from the DBA team.

Currently the Oracle database thread size is set to 4 GB, and they tried to increase it after this ZFS ARC setting. They are getting some errors.

Please find attached.

Is it related to this ZFS setting?

If Oracle is unable to allocate 32 bytes, the problem is likely to be that the "oracle database thread size was set to 4 Gb"; not that the ZFS buffer cache size has been restricted.

What was mdb showing when Oracle failed?

This machine has only 2 zones, application and database. When the memory issue was there, application pages were not loading and they kept complaining about the system; the team used to reboot the whole server every week to resolve the issue.

I am not sure about the issues faced before this setting. The only thing I can see from mdb is that around 45 GB of memory was used by the ZFS cache.

---------- Post updated at 02:15 PM ---------- Previous update was at 02:09 PM ----------

This is the current mdb status:

root@hrms-zones #echo "::memstat" | mdb "-kw"
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     360511              2816    4%
ZFS File Data              457370              3573    6%
Anon                       132190              1032    2%
Exec and libs                7571                59    0%
Page cache                  27998               218    0%
Free (cachelist)            70049               547    1%
Free (freelist)           7152269             55877   87%

Total                     8207958             64124
Physical                  8190943             63991

Yes, the Oracle database thread size was set to 4 GB and they want to increase it. It is only while trying to increase the thread size that they get the error.

They cannot assign the memory to Oracle after adding the zfs arc line to /etc/system, even though there is a lot of free memory during that operation?
If so, take a look at how that user is set up, especially projects. Solaris uses projects to impose various limits on users (see projects -l).

What is your SGA?
Are you running your database as a user (oracle)?
Are you using ZFS filesystems for your database, or ASM?

Zones also have a limit system for memory and CPU, which is defined in the zone configuration file.

This is fairly well documented on the Oracle site regarding setup of the Solaris OS for the Oracle database (global and non-global zones).
The numbers in that documentation are subject to your environment's needs.

For instance, in the documentation the parameter project.max-shm-memory is 4294967295, which is equivalent to 4 GB. This parameter should probably be increased to a much higher value in a real-life scenario (again, depending on your workload, but it looks like the errors are coming from there...)
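As an illustration, the byte value for a larger limit can be computed the same way; the 16 GiB below is purely an example, pick a value that covers your SGA:

```shell
# Sketch: compute the byte value for a raised project.max-shm-memory
# resource control. 16 GiB is an example figure; size it to your SGA.
shm_gib=16
echo "project.max-shm-memory=(privileged,$((shm_gib * 1024 * 1024 * 1024)),deny)"
```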

As you might notice, I, as well as others, am guessing.
That is not nice on your part, and time-consuming on ours.

Hope that clears things out
Regards
Peasant.

Hi. There is no project created on the server:

system
        projid : 0
        comment: ""
        users  : (none)
        groups : (none)
        attribs:
user.root
        projid : 1
        comment: ""
        users  : (none)
        groups : (none)
        attribs:
noproject
        projid : 2
        comment: ""
        users  : (none)
        groups : (none)
        attribs:
default
        projid : 3
        comment: ""
        users  : (none)
        groups : (none)
        attribs:
group.staff
        projid : 10
        comment: ""
        users  : (none)
        groups : (none)
        attribs:

And there is no CPU/memory capping implemented at the zone level.
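If a project for the oracle user were to be created, the resulting /etc/project entry might look like the fragment below. Note this is only a sketch: the project name, the projid (100), and the 16 GiB limit are assumptions, and on Solaris this file is normally managed with projadd/projmod rather than edited directly.

```
user.oracle:100::oracle::project.max-shm-memory=(privileged,17179869184,deny)
```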