Hi All,
I have a Sun Fire V1280 running Solaris 9 with an uptime of 501 days. My Big Brother monitoring is showing 90%+ memory utilization on this box. Since this is a production box, I cannot reboot it. Is there a way to find out what is consuming so much memory? It is affecting my other environment on the box. Below are some stats that may help explain my problem.
From prstat -a
NPROC USERNAME SIZE RSS MEMORY TIME CPU
34 autopp1 8577M 7133M 45% 175:31:28 19%
54 root 647M 317M 1.9% 8:00:28 3.2%
6 bb 6336K 5304K 0.0% 0:13:48 0.0%
1 smmsp 4520K 1304K 0.0% 0:00:03 0.0%
:/var/tmp# /usr/local/bin/top
last pid: 24465; load averages: 2.34, 2.55, 2.60 05:45:34
95 processes: 91 sleeping, 4 on cpu
CPU states: 43.7% idle, 40.8% user, 11.3% kernel, 4.2% iowait, 0.0% swap
Memory: 16G real, 1912M free, 6058M swap in use, 26G swap free
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
17546 autopp1 15 0 0 1103M 1015M cpu/2 11:49 7.71% content.exe
13595 autopp1 15 0 0 1965M 1794M cpu/0 44.6H 7.24% content.exe
16162 autopp1 15 43 0 1309M 1194M cpu/8 15:21 6.46% content.exe
26233 root 3 60 0 53M 28M sleep 26:59 4.22% clBackup
13624 autopp1 15 59 0 1712M 1540M sleep 44.4H 4.16% content.exe
26239 root 1 58 0 45M 6536K sleep 4:00 1.05% nwp
:/var/tmp/prustat -t5 5 (Dtrace thing)
PID %CPU %Mem %Disk %Net COMM
13595 8.77 11.43 0.00 0.00 content.exe
16162 10.11 7.69 0.00 0.00 content.exe
17546 10.63 6.54 0.00 0.00 content.exe
13624 4.87 9.81 0.00 0.00 content.exe
26233 3.60 0.18 0.00 0.00 clBackup
PID %CPU %Mem %Disk %Net COMM
13595 8.76 11.43 0.00 0.00 content.exe
16162 10.05 7.69 0.00 0.00 content.exe
17546 10.61 6.54 0.00 0.00 content.exe
13624 4.84 9.81 0.00 0.00 content.exe
26233 3.61 0.18 0.00 0.00 clBackup
:/var/tmp# ps -ef | cut -c42-100 | sort -nr | head
24290:33 fsflush
4996:25 ./dih-t3ext.exe
2682:48 /t3/data/autonomy/IDOLServer/IDOL-t3ext3/content/co
2669:46 /t3/data/autonomy/IDOLServer/IDOL-t3ext4/content/co
283:57 /etc/init -
57:15 /t3/data/autonomy/IDOLServer/IDOL-t3ext5/content/cont
55:32 /opt/galaxy/Base/cvd
49:42 ./AutonomyIDOLServer-t3ext5.exe
29:31 /opt/galaxy/iDataAgent/clBackup -child 23666 -j 20676
24:53 ./AutonomyIDOLServer-t3ext3.exe
Though the application team has restarted content.exe, it still shows as big. When I add up these stats, I do not see where 90% is being consumed.
vbe, September 22, 2011, 10:17am, post 2:
From the top output you can see: Memory: 16G real, 1912M free... which works out to almost 90% used.
Since we don't know the configuration of your system, it is difficult to say more...
(e.g. is /tmp on swap? ...)
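To make that arithmetic explicit, here is a minimal sketch that turns the two figures from the top output into a used percentage. The function name used_pct is made up for illustration, and it assumes "used" simply means total minus free, which is presumably how a monitor arrives at a number close to 90%.

```shell
# Percent of physical memory that is not free, given total and free in MB.
# Assumption: "used" is simply total minus free; cache is not considered.
used_pct() {
    echo "$1 $2" | awk '{ printf "%.1f\n", ($1 - $2) * 100 / $1 }'
}

# Values from the top output: 16G real (16384 MB), 1912M free
used_pct 16384 1912   # prints 88.3
```

A figure computed this way says nothing about how much of the non-free memory is cache that the system can hand back.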
Thanks for replying, vbe. Here is the configuration of my system:
:/# prtdiag -v | head -20
System Configuration: Sun Microsystems sun4u Sun Fire V1280
System clock frequency: 150 MHZ
Memory size: 16384 Megabytes
======================================= CPUs =======================================
E$ CPU CPU Temperature
CPU Freq Size Implementation Mask Die Amb. Status Location
------- -------- ---------- ------------------- ----- ---- ---- ------ --------
0 1200 MHz 8MB SUNW,UltraSPARC-III+ 6.0 42C 24C online SB0/P0
1 1200 MHz 8MB SUNW,UltraSPARC-III+ 6.0 43C 24C online SB0/P1
2 1200 MHz 8MB SUNW,UltraSPARC-III+ 6.0 42C 23C online SB0/P2
3 1200 MHz 8MB SUNW,UltraSPARC-III+ 6.0 41C 24C online SB0/P3
8 1200 MHz 8MB SUNW,UltraSPARC-III+ 6.0 41C 24C online SB2/P0
9 1200 MHz 8MB SUNW,UltraSPARC-III+ 6.0 41C 25C online SB2/P1
10 1200 MHz 8MB SUNW,UltraSPARC-III+ 6.0 42C 24C online SB2/P2
11 1200 MHz 8MB SUNW,UltraSPARC-III+ 6.0 41C 24C online SB2/P3
==================================== IO Devices ====================================
Bus Freq Slot + Name +
Type MHz Status Path Model
:/# swap -l
swapfile dev swaplo blocks free
/dev/vx/dsk/bootdg/swapvol 273,96001 16 49160240 49152944
:/# df -h /tmp
Filesystem size used avail capacity Mounted on
swap 5.0G 8K 5.0G 1% /tmp
:/# vmstat 5 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s3 sd in sy cs us sy id
0 0 0 30301712 2887176 378 334 2658 1 1 0 0 4 4 0 0 331 392 260 11 2 87
0 0 0 27608800 1589808 552 47 11535 0 0 0 0 0 0 0 0 1128 5540 1643 66 4 30
0 0 0 27608784 1589616 1162 80 14855 0 0 0 0 0 0 0 0 1097 4885 1332 57 5 38
0 0 0 27608816 1589440 718 45 16133 0 0 0 0 9 9 0 0 1183 3962 1330 63 5 32
0 0 0 27608768 1589704 221 66 5230 2 2 0 0 0 0 0 0 641 5210 1028 41 2 57
:/# mpstat 5 5
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 74 6 37 82 77 87 7 14 18 0 63 20 3 4 74
1 41 3 41 30 23 5 5 15 17 0 75 8 2 2 88
2 26 2 79 7 1 42 4 14 15 0 77 7 1 1 90
3 24 1 21 77 65 3 2 20 18 0 15 7 2 1 90
8 82 3 88 68 59 41 7 20 22 0 31 14 5 2 79
9 39 3 92 54 47 14 5 12 20 0 19 12 2 2 84
10 23 2 10 7 1 60 3 10 18 0 22 10 1 1 88
11 24 1 34 6 1 8 2 16 15 0 90 8 1 1 90
I don't know if it will show anything on Solaris 9, but check this:
echo "::memstat" | mdb -k
Bartus11, here is the output
:/# echo "::memstat" | mdb -k
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 573081 4477 29%
Anon 709639 5544 35%
Exec and libs 17818 139 1%
Page cache 536933 4194 27%
Free (cachelist) 170492 1331 8%
Free (freelist) 329 2 0%
Total 2008292 15689
As you can see in the output, about 4GB of physical memory is allocated to the page cache. You don't have to worry about your applications running out of memory. If available RAM runs short, Solaris will give back some of that page cache for use by applications.
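A rough way to see this split is to group the MB column of the ::memstat output into three buckets. This is only a sketch: the function name memstat_buckets and the choice of buckets (Kernel + Anon + Exec as "held", Page cache as "cache", the two Free lines as "free") are my own assumptions, not anything Solaris defines. On Solaris you may need nawk or /usr/xpg4/bin/awk instead of awk.

```shell
# Summarize ::memstat output (read from stdin) into three percentages.
# Bucket choice is an assumption made for illustration:
#   held  = Kernel + Anon + Exec and libs
#   cache = Page cache
#   free  = Free (cachelist) + Free (freelist)
# The MB value is the second-to-last field on each category line.
memstat_buckets() {
    awk '
        $1 == "Kernel" || $1 == "Anon" || $1 == "Exec" { held  += $(NF-1) }
        $1 == "Page" && $2 == "cache"                  { cache += $(NF-1) }
        $1 == "Free"                                   { free  += $(NF-1) }
        $1 == "Total"                                  { total  = $NF }
        END {
            if (total > 0)
                printf "held=%.1f%% cache=%.1f%% free=%.1f%%\n",
                       held * 100 / total, cache * 100 / total, free * 100 / total
        }'
}

# Usage on the live box:
#   echo "::memstat" | mdb -k | memstat_buckets
```

With the output above this gives held=64.8% cache=26.7% free=8.5%. A tool that counts everything except free as used would report about 91.5%, which may be where the 90%+ figure comes from, though how Big Brother actually computes it is a guess on my part.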
Thanks Bartus.
But can we find out what is consuming so much memory? Even though Solaris will give back page cache, Big Brother keeps complaining about high physical memory usage, and once it crosses the threshold value of 90%, it keeps generating tickets.
Regards
I don't know about Big Brother monitoring, but can't you modify the threshold a bit?
Hmm, I can.
But the client would still like to know what processes are consuming/holding so much memory if we need to change the threshold.
As I've already said, a big part (27%) is held by the system for page cache. The kernel also takes quite a big chunk (29%). To check applications' memory consumption you can use prstat -s rss
Got it. But ideally, such a big percentage should NOT be held by the page cache or the kernel.
Without a reboot, can it be released back to normal levels?
I've seen way higher page cache percentages, and the server didn't mind. The system doesn't want to waste unused resources, so those stats ARE normal. Maybe except for the kernel size, but for now I can't access any Solaris 9 system for comparison.
Your stats are normal. Unused memory is wasted memory. If your Big Brother report settings confuse free memory (cache) and used memory, that's the piece to fix.
Understood. Thanks jlliagre and bartus11 for your comments.
I will change the threshold value in BB.
Hi,
Can you check the second field in the output of this command?
ps -eo pid,pmem,user,args | grep -v "PID" | sort -nr -k 2
This should show which processes are eating memory (as a percentage). Is that what you are looking for?
Good command, it shows me the amount of memory being eaten up by each process. But it seems the applications are not the culprit.
/# ps -eo pid,pmem,user,args | grep -v "PID" | sort -nr -k 2
13595 12.4 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext3/content/content.exe -idolcomponent -co
13624 10.0 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext4/content/content.exe -idolcomponent -co
22401 9.8 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext1/content/content.exe -idolcomponent -co
23749 8.4 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext2/content/content.exe -idolcomponent -co
9394 1.8 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext5/content/content.exe -idolcomponent -co
23751 0.8 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext2/agentstore/agentstore.exe -idolcompone
22403 0.8 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext1/agentstore/agentstore.exe -idolcompone
13626 0.8 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext4/agentstore/agentstore.exe -idolcompone
13597 0.8 autopp1 /t3/data/autonomy/IDOLServer/IDOL-t3ext3/agentstore/agentstore.exe -idolcompone
17958 0.6 root /opt/galaxy/Base/cvd
:/# echo "::memstat" | mdb -k
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 563470 4402 28%
Anon 721123 5633 36%
Exec and libs 17858 139 1%
Page cache 552821 4318 28%
Free (cachelist) 152220 1189 8%
Free (freelist) 800 6 0%
Total 2008292 15689
As bartus11 and jlliagre said, it is just the page cache that is high, which is good for the server in a way. I just need to figure out how to convince Big Brother, as it is still saying 92% of memory is consumed.
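For anyone who wants a per-user total rather than a per-process list, the pmem column from the ps command above can be summed per user. A minimal sketch, with pmem_by_user being a made-up helper name; on Solaris you may need nawk in place of awk.

```shell
# Sum the pmem column (field 2) per user (field 3) from the output of
# "ps -eo pid,pmem,user,args" read on stdin, largest total first.
# The header line (first field "PID") is skipped.
pmem_by_user() {
    awk '$1 != "PID" { sum[$3] += $2 }
         END { for (u in sum) printf "%s %.1f\n", u, sum[u] }' |
    sort -nr -k 2
}

# Usage on the live box:
#   ps -eo pid,pmem,user,args | pmem_by_user
```

For the ten ps lines posted above, this gives autopp1 45.6 and root 0.6, which is in line with the 45% that prstat -a shows for autopp1. That leaves the kernel and page cache to account for most of the rest.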