High Page-Ins and Executable Page-Ins

Hi,

Currently I'm experiencing very high page-ins on my system running Solaris 10.

From vmstat, the page-in figure is very high; further drill-down shows the page-ins are from the file system, with occasional spikes in executable page-ins.

Details as follows:

oracle@perch:/files>> vmstat 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap     free   re   mf     pi      po fr de sr m1 m1 m2 m2   in   sy   cs us sy id
 3 0 0 7268376 5110432 1041 2719 729590752414 753 753 0 0 6 0 6 0 7766 100603 12925 56 19 25
 5 0 0 11818608 9292016 451 918 13573290886586 3 3 0 0 2 0 1 0 10952 161811 14018 80 20 0
 2 0 0 11818432 9292960 385 792 13459633370179 2 2 0 0 1 0 2 0 10053 165300 13082 81 19 1
 5 0 0 11819720 9294472 322 623 12844506325972 0 0 0 0 1 0 1 0 11025 162384 14302 81 19 0
 3 0 0 11819400 9294664 468 1132 12864149119550 0 0 0 0 5 0 5 0 10555 163452 13193 78 21 1
 5 0 0 11818912 9295784 430 926 11488048016240 3 3 0 0 1 0 1 0 10935 158379 14086 80 20 0
 2 0 0 11819560 9296824 381 906 12037450017951 0 0 0 0 1 0 1 0 10671 165815 13422 81 19 0
 5 0 0 11819968 9298064 440 863 13209752927129 0 0 0 0 1 0 1 0 11219 158202 14557 80 20 0
 6 0 0 11819776 9297168 500 953 12490715053962 2 2 0 0 2 0 1 0 11777 157736 15521 80 20 0
oracle@perch:/files>> vmstat -p 5
     memory           page          executable      anonymous      filesystem
   swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 7268296 5110360 1041 2719 753 0 0 11811407681 0 0  0    0    0 717270832025 753 753
 11820496 9292224 2055 2415 0 0 0    0    0    0    0    0    0 52087519922399 0 0
 11822032 9293648 2120 2163 5 0 0    0    0    0    0    0    0 55982485259240 5 5
 11824408 9296288 2414 2562 2 0 0    0    0    0    0    0    0 54414466418190 2 2
 11824328 9296576 2698 2719 0 0 0    0    0    0    0    0    0 52983811694559 0 0
 11822288 9293912 2332 1951 3 0 0    0    0    0    0    0    0 49271214261636 3 3
 11820448 9291312 2509 1115 0 0 0   18    0    0    0    0    0 48591187881736 0 0
 11817528 9294472 2480 871 2 0  0    0    0    0    0    0    0 54421309283685 2 2
 11820304 9304032 2429 860 8837 0 0 164187898623 0 0 0   0    0 47628172925439 8837 8837
 11814208 9413840 2067 860 29007 0 0 0    0    0    0    0    0 43272384005459 29009 29007
 11815744 9486424 2333 860 835 0 0   0    0    0    0    0    0 38585601084343 837 835

What area should we be looking into? Our swap space is being consumed steadily, shrinking by the day. Not sure if it's because of a runaway process.
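
In case it helps, this is roughly how I've been tracking the swap numbers with the stock swap(1M) command (the log path is just an example):

  # Summary of allocated/reserved/available swap
  swap -s

  # Per-device swap allocation
  swap -l

  # Append a timestamped sample so the day-to-day trend is visible
  echo "`date`: `swap -s`" >> /var/tmp/swap_trend.log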

Any opinion is appreciated.

Thanks in advance
ET.

So, for example, you have 13,209,752,927,129 page-ins in a 5-second period? That's a little more powerful than the systems I work with. Please tell us about your system. It's gotta be ccNUMA, but how many CPUs?
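
If you're not sure of the configuration offhand, stock Solaris will tell you (prtdiag's path varies by platform):

  # Processor count and details
  psrinfo -v

  # Number of physical processors
  psrinfo -p

  # Platform hardware summary
  /usr/platform/`uname -i`/sbin/prtdiag | head -30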

Well, since you have enough disks to supply some 2.6 quadrillion bytes a second, you can afford some more swap space. So just add a few TB more swap.

OS: Solaris 10

I'm trying to work out why the swap space is being consumed consistently and never released back to the system. At this rate, it's just a matter of time before swap runs out again. The bomb already went off just last week: it brought our Oracle instance down because /tmp ran out of space, forcing a restart of the instance, which released the memory.
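
For now I'm keeping an eye on tmpfs with something like this (nothing fancy, just stock df/du):

  # /tmp is tmpfs, so whatever lands there is carved out of the same pool as swap
  df -k /tmp

  # Biggest offenders under /tmp
  du -ak /tmp | sort -n | tail -20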

Are you using ZFS?

Actually I was trying to be facetious. Numbers like that make me think that vmstat must be broken. If I ignore the pi and assume that the rest of the numbers are valid, I don't really see any problem. You have swap and free physical memory. Page-outs are low and so is the scan rate. So pi is impossible and everything else looks good. I guess I would look for a vmstat patch though.

But if swap is disappearing, do a "df -k". /tmp uses swap for sure and I think maybe /var/run or something like that does as well. Could /tmp be eating your swap area? If it's a program, the size as reported by ps would be growing over time. Also I often see people over-allocate shared memory to Oracle. So when Oracle is running I do a "ipcs -mb" to look at that.
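
To make those checks concrete, a rough sketch (column numbers per Solaris 10's ps -efly layout; adjust the sort key if yours differs):

  # Is swap-backed tmpfs filling up?
  df -k /tmp /var/run

  # Shared memory segments and their sizes -- compare with Oracle's configured SGA
  ipcs -mb

  # Biggest processes by SZ (in KB thanks to -y); SZ is the 9th column here
  ps -efly | sort -n -k 9 | tail -20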

Yes, these pi and fpi numbers can't be anything but bogus. 20,000 terabytes per second is simply unrealistic.

Is your system up to date with patches?

Especially the one that fixes a Veritas bug:

http://www.unix.com/sun-solaris/76511-huge-pi-vmstat.html#post302224843
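
If VxFS is in the picture, the installed version and patch level are quick to check. A minimal sketch, assuming the stock Veritas package name:

  # VxFS package version (package name can differ by install)
  pkginfo -l VRTSvxfs | grep -i version

  # Version of the loaded vxfs kernel module
  modinfo | grep -i vxfs

  # Any VxFS-related patches applied
  showrev -p | grep -i vxfs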

Yeah, that's what I'm looking at currently too. The VxFS 4.1 bug.

@reborg, it's running on VxFS, which may explain the bogus high pi numbers.

Another concern I have is the disappearing swap space. I can see clearly that free swap is dropping by the day, but I'm unable to nail down the culprit. Any suggestions on how I should approach it?

Thanks for the views, all.

What are "df -k", "ps -efly" and "ipcs -mb" output ?

Output as follows. As for the ps -efly output, the listing is very long; is there a particular process, like Oracle's, to grep for?

You're looking for large processes. And you want to save the output. Run it each day, compare all of the outputs, and see if anything is growing over time. You probably have a classic memory leak.
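
A minimal sketch of that routine, assuming a writable /var/tmp (the paths are just examples; run it daily from cron or by hand):

  #!/bin/sh
  # Snapshot process sizes, shared memory and filesystem usage once a day
  DIR=/var/tmp/ps_snaps
  TODAY=`date +%Y%m%d`
  mkdir -p $DIR
  ps -efly > $DIR/ps.$TODAY
  ipcs -mb > $DIR/ipcs.$TODAY
  df -k    > $DIR/df.$TODAY

  # Diff against the most recent earlier snapshot, if there is one
  PREV=`ls $DIR/ps.* 2>/dev/null | grep -v $TODAY | tail -1`
  [ -n "$PREV" ] && diff $PREV $DIR/ps.$TODAY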