Hi
We have two identical T4-1s running Solaris 10 8/11, patched to 07/2012.
Both have 8G of swap allocated on the ZFS root pool; however, swap -s on one server shows 8G of swap available, while on the other it shows between 60G and 115G available.
Both servers have the same amount of memory, 128G. We are having some Oracle-related performance issues on the server showing 60-115G of available swap, and we are concerned that some type of corruption has occurred.
Thanks
Greg
8 GB of swap space for a server with 128 GB of RAM looks undersized. What makes you suspect that corruption has occurred?
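If you do decide to grow it, on a ZFS root the usual sequence is roughly the following (32G is only an example size, and the device must not be in use while you resize it):
# swap -d /dev/zvol/dsk/rpool/swap
# zfs set volsize=32G rpool/swap
# swap -a /dev/zvol/dsk/rpool/swap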
The issue only seems to have occurred in the last few days. Until that time both servers showed the same swap figures.
It looks as if someone has taken your point and allocated all available memory to swap. I'm just wondering:
a) whether that is possible
b) how it was done
If the system has not been altered, then why do two identical servers show such different swap figures, unless some corruption has occurred?
Please post the output of these commands on both servers:
swap -s
swap -l
vmstat 2 2
prstat -Z -n 1,10 1 1
echo ::memstat | mdb -k
df -n swap -h
Which of the two sets of swap figures were both servers showing before the event?
There are plenty of reasons why identical hardware could show different swap figures. In particular, swap -s on Solaris reports virtual swap, which counts unreserved physical memory as well as the disk swap devices, so the available figure rises and falls with free RAM.
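As to a) and b): yes, swap can be added to a live system without a reboot. A minimal sketch on a ZFS root (rpool/swap2 is just an illustrative name):
# zfs create -V 16g rpool/swap2
# swap -a /dev/zvol/dsk/rpool/swap2
Any device added that way would show up in the swap -l listing requested above, so the outputs should tell us.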
From the server showing the issue:
# swap -s
total: 10964072k bytes allocated + 185592k reserved = 11149664k used, 38735512k available
# swap -l
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 256,1 16 16777200 16777200
# vmstat 2 2
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s4 s5 s6 s7 in sy cs us sy id
0 0 0 81782944 96140160 21 82 0 0 0 0 3 -0 1 -0 1 1588 664 1387 0 0 100
0 0 0 38734832 53113984 1 11 0 0 0 0 0 0 0 0 0 1936 1629 2075 0 0 100
# prstat -Z -n 1,10 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
1822 oracle 6087M 6079M sleep 59 0 0:13:22 0.0% oracle/1
ZONEID NPROC SWAP RSS MEMORY TIME CPU ZONE
0 128 11G 11G 8.4% 0:34:03 0.0% global
Total: 128 processes, 3333 lwps, load averages: 0.04, 0.05, 0.04
# echo "::memstat" | mdb -k
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 319203 2493 2%
ZFS File Data 8056420 62940 49%
Anon 1372651 10723 8%
Exec and libs 26078 203 0%
Page cache 13682 106 0%
Free (cachelist) 9417 73 0%
Free (freelist) 6627062 51773 40%
Total 16424513 128316
Physical 16410049 128203
# df -h -n swap
Filesystem size used avail capacity Mounted on
swap 37G 40K 37G 1% /var/run
From the second server:
# swap -s
total: 3501648k bytes allocated + 283192k reserved = 3784840k used, 9495632k available
# swap -l
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 256,1 16 16777200 16777200
# vmstat 2 2
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s4 s5 s6 s7 in sy cs us sy id
0 0 0 16223304 24564312 17 77 0 0 0 0 0 -2 3 -4 3 2322 1816 2135 0 0 100
0 0 0 9494752 17840096 99 204 0 0 0 0 0 0 0 0 0 2356 2744 2699 0 0 100
# prstat -Z -n 1,10 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9613 oracle 891M 876M sleep 59 0 0:50:30 0.2% oracle/1
ZONEID NPROC SWAP RSS MEMORY TIME CPU ZONE
0 148 3652M 3684M 2.8% 7:01:36 0.2% global
Total: 148 processes, 3433 lwps, load averages: 0.18, 0.18, 0.18
# echo "::memstat" | mdb -k
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 743484 5808 5%
ZFS File Data 12976767 101380 79%
Anon 440277 3439 3%
Exec and libs 22564 176 0%
Page cache 14489 113 0%
Free (cachelist) 14968 116 0%
Free (freelist) 2211964 17280 13%
Total 16424513 128316
Physical 16410051 128203
# df -hn swap
Filesystem size used avail capacity Mounted on
swap 9.1G 56K 9.1G 1% /var/run
Both servers were showing the second set of output prior to Sunday last week.
Hope this helps
At first sight, the main difference is the memory used by the most active process (oracle), which is around seven times higher on server #1 than on server #2 (~6G vs ~900M). The ZFS file data usage in the two memstat outputs also differs markedly:
ZFS File Data 8056420 62940 49% (server #1)
ZFS File Data 12976767 101380 79% (server #2)
Unless your DB is stored on a ZFS pool, you should limit the amount of RAM used by ZFS to something more reasonable. On our production servers with Oracle 10 installed on the local zpool, I use a 4 GB ZFS ARC limit, because Oracle's SGA can make better use of that memory than the filesystem cache can.
If you are storing your DB on a ZFS pool, this advice will not apply, as long as your DB's SGA is set up correctly.
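For reference, a minimal sketch of how such a cap is set in /etc/system, assuming the 4 GB figure mentioned above (the value is in bytes and only takes effect after a reboot):
* Cap the ZFS ARC at 4 GB (0x100000000 bytes)
set zfs:zfs_arc_max = 0x100000000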