High CPU+Memory consumption

Hi All,

I have a Sun Fire V1280 running Solaris 9 with an uptime of 501 days. My Big Brother monitoring is showing 90%+ memory utilization on this box. Since this is a production box, I cannot reboot it. Is there a way to find out what is consuming so much? It is affecting my other environments on the box. Below are some stats that may help explain the problem.

From prstat -a
 NPROC USERNAME  SIZE   RSS MEMORY      TIME  CPU
    34 autopp1  8577M 7133M    45% 175:31:28  19%
    54 root      647M  317M   1.9%   8:00:28 3.2%
     6 bb       6336K 5304K   0.0%   0:13:48 0.0%
     1 smmsp    4520K 1304K   0.0%   0:00:03 0.0%
:/var/tmp# /usr/local/bin/top
last pid: 24465;  load averages:  2.34,  2.55,  2.60                                                                 05:45:34
95 processes:  91 sleeping, 4 on cpu
CPU states: 43.7% idle, 40.8% user, 11.3% kernel,  4.2% iowait,  0.0% swap
Memory: 16G real, 1912M free, 6058M swap in use, 26G swap free
   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 17546 autopp1   15   0    0 1103M 1015M cpu/2   11:49  7.71% content.exe
 13595 autopp1   15   0    0 1965M 1794M cpu/0   44.6H  7.24% content.exe
 16162 autopp1   15  43    0 1309M 1194M cpu/8   15:21  6.46% content.exe
 26233 root       3  60    0   53M   28M sleep   26:59  4.22% clBackup
 13624 autopp1   15  59    0 1712M 1540M sleep   44.4H  4.16% content.exe
 26239 root       1  58    0   45M 6536K sleep    4:00  1.05% nwp
 
:/var/tmp/prustat -t5 5 (Dtrace thing)
  PID   %CPU   %Mem  %Disk   %Net  COMM
13595   8.77  11.43   0.00   0.00  content.exe
16162  10.11   7.69   0.00   0.00  content.exe
17546  10.63   6.54   0.00   0.00  content.exe
13624   4.87   9.81   0.00   0.00  content.exe
26233   3.60   0.18   0.00   0.00  clBackup
  PID   %CPU   %Mem  %Disk   %Net  COMM
13595   8.76  11.43   0.00   0.00  content.exe
16162  10.05   7.69   0.00   0.00  content.exe
17546  10.61   6.54   0.00   0.00  content.exe
13624   4.84   9.81   0.00   0.00  content.exe
26233   3.61   0.18   0.00   0.00  clBackup
:/var/tmp# ps -ef | cut -c42-100 | sort -nr | head
24290:33 fsflush
4996:25 ./dih-t3ext.exe
2682:48 /t3/data/autonomy/IDOLServer/IDOL-t3ext3/content/co
2669:46 /t3/data/autonomy/IDOLServer/IDOL-t3ext4/content/co
283:57 /etc/init -
57:15 /t3/data/autonomy/IDOLServer/IDOL-t3ext5/content/cont
55:32 /opt/galaxy/Base/cvd
49:42 ./AutonomyIDOLServer-t3ext5.exe
29:31 /opt/galaxy/iDataAgent/clBackup -child 23666 -j 20676
24:53 ./AutonomyIDOLServer-t3ext3.exe

Although the application team has restarted content.exe, it still shows up as large. Adding up these stats, I cannot see what would account for 90%.

From the top output, the Memory line reads: Memory: 16G real, 1912M free... which works out to almost 90% used.
Since we don't know the configuration of your system, it is difficult to say more
(e.g. is /tmp on swap? ...)
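
The "almost 90%" figure can be reproduced from the top Memory line with a little arithmetic. This is only a sketch: used_pct is a made-up helper name, 16G is taken to mean 16384M, and on Solaris the awk call will likely need nawk or /usr/xpg4/bin/awk, since the default /usr/bin/awk does not accept -v.

```shell
#!/bin/sh
# used_pct TOTAL_MB FREE_MB
# Print the percentage of memory that is not reported as free.
# (used_pct is a hypothetical helper, not a Solaris command.)
used_pct() {
    awk -v total="$1" -v free="$2" \
        'BEGIN { printf "%.1f\n", (total - free) / total * 100 }'
}

# Figures from the top output: 16G real (assumed 16384M), 1912M free.
used_pct 16384 1912
```

With those two numbers it prints 88.3, so top alone already puts the box close to the 90% alarm level. Note that this counts everything that is not on the free list as used, whatever it is being used for.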

Thanks vbe for replying. Here is the configuration of my system

:/# prtdiag -v | head -20
System Configuration: Sun Microsystems  sun4u Sun Fire V1280
System clock frequency: 150 MHZ
Memory size: 16384 Megabytes
======================================= CPUs =======================================
                   E$          CPU                  CPU    Temperature
CPU      Freq      Size        Implementation       Mask    Die   Amb.  Status      Location
-------  --------  ----------  -------------------  -----   ----  ----  ------      --------
      0  1200 MHz  8MB         SUNW,UltraSPARC-III+  6.0     42C   24C  online      SB0/P0
      1  1200 MHz  8MB         SUNW,UltraSPARC-III+  6.0     43C   24C  online      SB0/P1
      2  1200 MHz  8MB         SUNW,UltraSPARC-III+  6.0     42C   23C  online      SB0/P2
      3  1200 MHz  8MB         SUNW,UltraSPARC-III+  6.0     41C   24C  online      SB0/P3
      8  1200 MHz  8MB         SUNW,UltraSPARC-III+  6.0     41C   24C  online      SB2/P0
      9  1200 MHz  8MB         SUNW,UltraSPARC-III+  6.0     41C   25C  online      SB2/P1
     10  1200 MHz  8MB         SUNW,UltraSPARC-III+  6.0     42C   24C  online      SB2/P2
     11  1200 MHz  8MB         SUNW,UltraSPARC-III+  6.0     41C   24C  online      SB2/P3
==================================== IO Devices ====================================
Bus   Freq      Slot +  Name +
Type  MHz       Status  Path                          Model
:/# swap -l
swapfile             dev  swaplo blocks   free
/dev/vx/dsk/bootdg/swapvol 273,96001     16 49160240 49152944
:/# df -h /tmp
Filesystem             size   used  avail capacity  Mounted on
swap                   5.0G     8K   5.0G     1%    /tmp
:/# vmstat 5 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s3 sd   in   sy   cs us sy id
 0 0 0 30301712 2887176 378 334 2658 1 1 0 0 4 4  0  0  331  392  260 11  2 87
 0 0 0 27608800 1589808 552 47 11535 0 0 0 0 0 0  0  0 1128 5540 1643 66  4 30
 0 0 0 27608784 1589616 1162 80 14855 0 0 0 0 0 0 0  0 1097 4885 1332 57  5 38
 0 0 0 27608816 1589440 718 45 16133 0 0 0 0 9 9  0  0 1183 3962 1330 63  5 32
 0 0 0 27608768 1589704 221 66 5230 2 2 0 0 0  0  0  0  641 5210 1028 41  2 57
:/# mpstat 5 5
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   74   6   37    82   77   87    7   14   18    0    63   20   3   4  74
  1   41   3   41    30   23    5    5   15   17    0    75    8   2   2  88
  2   26   2   79     7    1   42    4   14   15    0    77    7   1   1  90
  3   24   1   21    77   65    3    2   20   18    0    15    7   2   1  90
  8   82   3   88    68   59   41    7   20   22    0    31   14   5   2  79
  9   39   3   92    54   47   14    5   12   20    0    19   12   2   2  84
 10   23   2   10     7    1   60    3   10   18    0    22   10   1   1  88
 11   24   1   34     6    1    8    2   16   15    0    90    8   1   1  90

I don't know if it will show anything on Solaris 9, but check this:

echo "::memstat" | mdb -k

Bartus11, here is the output

:/# echo "::memstat" | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     573081              4477   29%
Anon                       709639              5544   35%
Exec and libs               17818               139    1%
Page cache                 536933              4194   27%
Free (cachelist)           170492              1331    8%
Free (freelist)               329                 2    0%
Total                     2008292             15689

As you can see in the output, about 4 GB of physical memory is allocated to page cache. You don't have to worry about your applications running out of memory. If available RAM runs short, Solaris will give back some of that page cache for applications to use.
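
To put numbers on that, the ::memstat page counts can be added up two ways: once without page cache and once with it. The sketch below is an illustration only. memstat_summary is a made-up helper name, the input is the output pasted above, and treating all of the page cache as reclaimable is a simplification. On Solaris, use nawk or /usr/xpg4/bin/awk if the default awk complains.

```shell
#!/bin/sh
# memstat_summary: read ::memstat output on stdin and print two percentages
# of the Total page count:
#   1) Kernel + Anon + Exec and libs
#   2) the same plus Page cache (everything except the two free lists)
# (memstat_summary is a hypothetical helper name.)
memstat_summary() {
    awk '
        # The page count is the third field from the end on each category line.
        /^Kernel/     { kernel = $(NF-2) }
        /^Anon/       { anon   = $(NF-2) }
        /^Exec/       { libs   = $(NF-2) }
        /^Page cache/ { cache  = $(NF-2) }
        /^Total/      { total  = $2 }
        END {
            if (total == 0) exit 1
            printf "without page cache: %.1f%%\n", (kernel + anon + libs) / total * 100
            printf "with page cache: %.1f%%\n", (kernel + anon + libs + cache) / total * 100
        }'
}

# Sample input: the ::memstat output from this thread.
memstat_summary <<'EOF'
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     573081              4477   29%
Anon                       709639              5544   35%
Exec and libs               17818               139    1%
Page cache                 536933              4194   27%
Free (cachelist)           170492              1331    8%
Free (freelist)               329                 2    0%
Total                     2008292             15689
EOF
```

For this output the two figures come out at 64.8% and 91.5%. The second one is in the same range as what Big Brother reports, which suggests (the thread does not confirm it) that the monitor counts page cache as used memory. On a live system the same function could be fed directly with echo "::memstat" | mdb -k | memstat_summary.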


Thanks Bartus.
But can we find out what is consuming so much memory? Even though Solaris will give back page cache, Big Brother keeps complaining about high physical memory usage, and once it crosses the 90% threshold it keeps generating tickets.
Regards

I don't know about Big Brother monitoring, but can't you modify the threshold a bit?

Hmm, I can.
But the client would still like to know which processes are consuming/holding so much memory if we are going to change the threshold.

As I've already said, a big part (27%) is held by the system for page cache. The kernel also takes quite a big chunk (29%). To check applications' memory consumption you can use prstat -s rss

Got it. But ideally, such a big percentage should NOT be held by the page cache and the kernel.
Can it be released back to normal levels without a reboot?

I've seen way higher page cache percentages :wink: And the server didn't mind. The system doesn't want to waste unused resources. So those stats ARE normal. Except maybe for the kernel size, but for now I can't access any Solaris 9 system for comparison.

Your stats are normal. Unused memory is wasted memory. If your Big Brother report settings confuse free memory (cache) and used memory, that's the piece to fix.

Understood. Thanks jlliagre and bartus11 for your comments.
I will change the threshold value in BB.

Hi,

Can you check the second field in the output of this command?

ps -eo pid,pmem,user,args | grep -v "PID" | sort -nr -k 2

This should show you which processes are eating memory (as a percentage). Is that what you are looking for?

Good command, it shows me the amount of memory being eaten up by each process. But it seems the applications are not the culprit.

/# ps -eo pid,pmem,user,args | grep -v "PID" | sort -nr -k 2
13595 12.4  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext3/content/content.exe -idolcomponent -co
13624 10.0  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext4/content/content.exe -idolcomponent -co
22401  9.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext1/content/content.exe -idolcomponent -co
23749  8.4  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext2/content/content.exe -idolcomponent -co
 9394  1.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext5/content/content.exe -idolcomponent -co
23751  0.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext2/agentstore/agentstore.exe -idolcompone
22403  0.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext1/agentstore/agentstore.exe -idolcompone
13626  0.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext4/agentstore/agentstore.exe -idolcompone
13597  0.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext3/agentstore/agentstore.exe -idolcompone
17958  0.6     root /opt/galaxy/Base/cvd
:/# echo "::memstat" | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     563470              4402   28%
Anon                       721123              5633   36%
Exec and libs               17858               139    1%
Page cache                 552821              4318   28%
Free (cachelist)           152220              1189    8%
Free (freelist)               800                 6    0%
Total                     2008292             15689

As bartus11 and jlliagre said, it is just the page cache that is high, which is good for the server in a way. I just need to think about how to convince Big Brother :slight_smile: as it is still saying 92% of memory is consumed.
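
Summing the pmem column per user makes the same point more directly. The sketch below is an illustration only: pmem_by_user is a made-up helper name, and the sample input is the ps output pasted above. Because pmem is based on each process's resident set, pages shared between processes can be counted more than once, so the totals are an upper bound rather than an exact figure. On Solaris, use nawk or /usr/xpg4/bin/awk if the default awk complains.

```shell
#!/bin/sh
# pmem_by_user: read "pid pmem user args" lines on stdin and print the
# total pmem per user, largest first. (Hypothetical helper name.)
pmem_by_user() {
    awk '$1 != "PID" { sum[$3] += $2 }
         END { for (u in sum) printf "%s %.1f\n", u, sum[u] }' |
    sort -k2,2nr
}

# Sample input: the ps output from this thread (args truncated as posted).
pmem_by_user <<'EOF'
13595 12.4  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext3/content/content.exe -idolcomponent -co
13624 10.0  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext4/content/content.exe -idolcomponent -co
22401  9.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext1/content/content.exe -idolcomponent -co
23749  8.4  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext2/content/content.exe -idolcomponent -co
 9394  1.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext5/content/content.exe -idolcomponent -co
23751  0.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext2/agentstore/agentstore.exe -idolcompone
22403  0.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext1/agentstore/agentstore.exe -idolcompone
13626  0.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext4/agentstore/agentstore.exe -idolcompone
13597  0.8  autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext3/agentstore/agentstore.exe -idolcompone
17958  0.6     root /opt/galaxy/Base/cvd
EOF
```

For the ten processes listed, this gives 45.6 for autopp1 and 0.6 for root, in line with the 45% that prstat -a showed for autopp1 and well short of the 92% Big Brother reports. On a live system the function could be fed with ps -eo pid,pmem,user,args | pmem_by_user.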

lol :smiley: