Solaris 10 VMs hang after about a week

Hi all,

We've been having issues with quite a few Solaris 10 VMs hanging after about a week of uptime. These VMs run on VMware ESXi 4.1 U1 hosts, and the issue is not tied to any specific host. We also run CentOS VMs on the same hosts and are not seeing any issues with those. The Solaris VMs that are affected are at a few different patch levels; I've seen it occur on 147441-15, 144489-17, and 142910-17.

These VMs run Tomcat (5.5.33, and some run 6.0.18) and PostgreSQL 9.0.3. The Tomcat apps use JDK 1.6.0_26. When the hang occurs, all network services stop responding, and the console echoes what I type but never responds or gives me a prompt.

I booted one of the affected VMs with the -k parameter to enable the kernel debugger. When the hang occurred, I followed the instructions in "x86: How to Force a Crash Dump and Reboot of the System" (System Administration Guide: Basic Administration) to invoke a system dump.
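
For reference, the sequence I followed was roughly this (these are x86 VMs, so the console break sequence is F1-A):

# load kmdb at boot by appending -k to the kernel line in GRUB
# (on a running system it can be loaded with: mdb -K, then :c to continue)

# when the hang occurs, break into the debugger from the console with F1-A
# and force a panic plus crash dump:
$<systemdump

# after the reboot, savecore writes the dump into the directory configured
# with dumpadm (running dumpadm with no arguments shows the current settings)
dumpadm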

I analyzed the dump with the Solaris Crash Analysis Tool (SCAT) and this was the output:

I'm thinking the hangs are memory related, based on the output from SCAT. These VMs have 2 GB of memory. Would a lack of memory cause Solaris to completely hang? Shouldn't it be reserving some for the kernel? There is no useful information in /var/adm/messages when the hang occurs.
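
For what it's worth, on the VMs that are still up I plan to keep an eye on memory with something like the following, to see whether the kernel side is eating into the 2 GB before a hang:

# overall memory breakdown (kernel, anon, page cache, free), run as root
echo ::memstat | mdb -k

# virtual swap accounting
swap -s

# watch the free column and the sr (page scan rate) column over time
vmstat 5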

Thanks for any help you can provide.

Derek

Hello, your assumption is potentially a good one:

  freemem 1754 7184384 (6.85M)

Also, your system is doing a lot of swapping, which is not good for performance.

WARNING: needfree is 80 pages
WARNING: freemem_wait is 80 (threads)
WARNING: page_create() throttled (freemem < throttlefree)
WARNING: hard swapping (avefree < minfree)
NOTE: nscan is 44505
NOTE: push_list_size is 256
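
If you want to compare those numbers against the paging thresholds on one of the systems that is still running, something like this should show them (values are in pages):

# free memory and the paging thresholds, in pages
kstat -p unix:0:system_pages:freemem
kstat -p unix:0:system_pages:lotsfree
kstat -p unix:0:system_pages:desfree
kstat -p unix:0:system_pages:minfree

# throttlefree can be read from the kernel directly
echo "throttlefree/D" | mdb -k

# page size in bytes (4096 on x86), to convert pages to bytes
pagesize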

Can you provide output from:

SCAT> thread summary 
SCAT> dev busy

Hi vmcore,

Here is the information you requested. Thanks for your help with this!

CAT(/var/crash/unknown/vmcore.3/10X)> thread summary
        reference clock = panic_lbolt: 0x4b7e438, panic_hrtime: 0x2d11bb0cecb6b
   27   threads ran since 1 second before current tick (19 user, 8 kernel)
  104   threads ran since 1 minute before current tick (91 user, 13 kernel)

   12   TS_RUN threads (7 user, 5 kernel)
    0   TS_STOPPED threads
    8   TS_FREE threads (0 user, 8 kernel)
   54*  !TS_LOAD (swapped) threads (54 user, 0 kernel)
    3*  !TS_LOAD (swapped) but TS_RUN threads (3 user, 0 kernel)

    4*  threads trying to get a mutex (3 user, 1 kernel)
          longest sleeping 7 minutes 15.56 seconds earlier
    0   threads trying to get an rwlock
  505   threads waiting for a condition variable (320 user, 185 kernel)
    1   threads sleeping on a semaphore (0 user, 1 kernel)
          longest sleeping 9 days 3 hours 53 minutes 23.76 seconds earlier
   53   threads sleeping on a user-level sobj (53 user, 0 kernel)
   39   threads sleeping on a shuttle (door) (39 user, 0 kernel)

    0   threads in biowait()
    1*  threads in zio_wait() (0 user, 1 kernel)

    9   threads in dispatch queues (4 user, 5 kernel)
    1*  interrupt threads running (0 user, 1 kernel)

  631   total threads in allthreads list (428 user, 203 kernel)
    1   thread_reapcnt
    1   lwp_reapcnt
  633   nthread

CAT(/var/crash/unknown/vmcore.3/10X)> dev busy

Scanning for busy devices:
No busy/hanging devices found
Scanning for threads in biowait:

   no threads in biowait() found.

Scanning for procs with aio:

Derek

It is for sure an issue with the amount of memory this system has.
The bottom line is a lack of memory, but if you want to dig in further, a few more outputs will help.

CAT> tlist findcall zio_wait
CAT> tlist sobj mutex
CAT> swapinfo
CAT> tlist findcall pageout

The reason for these last few outputs is to confirm or rule out the following bug: 6898318 "ZFS root system can hang swapping to zvol".


Thanks for your help! Here are the outputs of those commands:

CAT(/var/crash/unknown/vmcore.3/10X)> tlist findcall zio_wait
==== kernel thread: 0xfffffe80007f9c60  PID: 0 ====
cmd: sched
t_wchan: 0xffffffffb4959fc8  sobj: condition var (from zfs:zio_wait+0x53)  
t_procp: 0xfffffffffbc276e0(proc_sched)
  p_as: 0xfffffffffbc293c0(kas)
  zone: global
t_stk: 0xfffffe80007f9c60  sp: 0xfffffe80007f99e0  t_stkbase: 0xfffffe80007f2000
t_pri: 60(SYS)  pctcpu: 0.000000
t_lwp: 0x0  psrset: 0  last CPU: 0  
idle: 43557 ticks (7 minutes 15.57 seconds)
start: Sat Apr 21 16:11:11 2012
age: 792212 seconds (9 days 4 hours 3 minutes 32 seconds)
tstate: TS_SLEEP - awaiting an event
tflg:   T_TALLOCSTK - thread structure allocated from stk
tpflg:  none set
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag:  SSYS - system resident process

pc:      unix:_resume_from_idle+0xfb resume_return:  addq   $0x8,%rsp
startpc: zfs:txg_sync_thread+0x0:  pushq  %rbp

unix:_resume_from_idle+0xfb resume_return()
unix:swtch+0x135()
genunix:cv_wait+0x68()
zfs:zio_wait+0x53()
zfs:dsl_pool_sync+0xba()
zfs:spa_sync+0x2d7()
zfs:txg_sync_thread+0x1d2()
unix:thread_start+0x8()
-- end of kernel thread's stack --


   1 thread with that call found.

CAT(/var/crash/unknown/vmcore.3/10X)> tlist sobj mutex
  thread             pri pctcpu           idle   PID              wchan command
  0xfffffe800054ac60  99  0.044       7m15.56s     5 0xffffffff800a2250 zpool-rpool
  0xfffffe8000544c60  99  0.014       7m15.56s     5 0xffffffff800a2250 zpool-rpool
  0xfffffe8000532c60  99  0.062       7m15.56s     5 0xffffffff800a2250 zpool-rpool
  0xfffffe8000339c60  60  0.237       7m15.34s     3 0xffffffff800a2250 fsflush

   4 threads with that sobj found.

top mutex/rwlock owners:
count   thread
    4   0xfffffe8000538c60  state: run   wchan: 0x0                 sobj: undefined

CAT(/var/crash/unknown/vmcore.3/10X)> swapinfo
swap device: /dev/zvol/dsk/rpool/swap
  vp: 0xffffffff849849c0 (181(zfs),1)  si_soff: 0x1000  si_eoff: 0x80000000  si_allocs: 122
  flags: 0x0  pages: 524287 (1.99G)  free pages: 458362 (1.74G)
  map size: 65536  si_swapslots: 0xffffffff88b3a000
CAT(/var/crash/unknown/vmcore.3/10X)> tlist findcall pageout
==== kernel thread: 0xfffffe8000351c60  PID: 2  on CPU: 0 ====
cmd: pageout
t_procp: 0xffffffff802a3998(proc_pageout)
  p_as: 0xfffffffffbc293c0(kas)
  zone: global
t_stk: 0xfffffe8000351b70  sp: 0xfffffe80003519d0  t_stkbase: 0xfffffe800034d000
t_pri: 97(SYS)  t_tid: 2  pctcpu: 25.949932
t_lwp: 0xffffffff800b1080  lwp_regs: 0xfffffe8000351b70
  mstate: LMS_SYSTEM  ms_prev: LMS_SYSTEM
  ms_state_start: 17.263364083 seconds earlier
  ms_start: 9 days 4 hours 4 minutes 35.349767316 seconds earlier
psrset: 0  last CPU: 0  
idle: 28 ticks (0.28 seconds)
start: Sat Apr 21 16:11:14 2012
age: 792209 seconds (9 days 4 hours 3 minutes 29 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_TALLOCSTK - thread structure allocated from stk
tpflg:  TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
        TS_SIGNALLED - thread was awakened by cv_signal()
pflag:  SSYS - system resident process
        SNOWAIT - children never become zombies

pc:      unix:_resume_from_idle+0xfb resume_return:  addq   $0x8,%rsp

unix:_resume_from_idle+0xfb resume_return()
genunix:pageout_scanner+0x26d()
unix:thread_start+0x8()
-- end of kernel thread's stack --

==== kernel thread: 0xfffffe8000333c60  PID: 2 ====
cmd: pageout
t_wchan: 0xffffffff82e2fb8e  sobj: condition var (from zfs:txg_wait_open+0x73)  
t_procp: 0xffffffff802a3998(proc_pageout)
  p_as: 0xfffffffffbc293c0(kas)
  zone: global
t_stk: 0xfffffe8000333b70  sp: 0xfffffe8000333790  t_stkbase: 0xfffffe800032f000
t_pri: 98(SYS)  t_tid: 1  pctcpu: 1.389232
t_lwp: 0xffffffff800b1e00  lwp_regs: 0xfffffe8000333b70
  mstate: LMS_SLEEP  ms_prev: LMS_SYSTEM
  ms_state_start: 7 minutes 31.321261393 seconds earlier
  ms_start: 9 days 4 hours 4 minutes 35.350472035 seconds earlier
psrset: 0  last CPU: 0  
idle: 43414 ticks (7 minutes 14.14 seconds)
start: Sat Apr 21 16:11:14 2012
age: 792209 seconds (9 days 4 hours 3 minutes 29 seconds)
tstate: TS_SLEEP - awaiting an event
tflg:   T_TALLOCSTK - thread structure allocated from stk
tpflg:  TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag:  SSYS - system resident process
        SNOWAIT - children never become zombies

pc:      unix:_resume_from_idle+0xfb resume_return:  addq   $0x8,%rsp

unix:_resume_from_idle+0xfb resume_return()
unix:swtch+0x135()
genunix:cv_wait+0x68()
zfs:txg_wait_open+0x73()
zfs:dmu_tx_wait+0xc4()
zfs:dmu_tx_assign+0x38()
zfs:zvol_strategy+0x267()
genunix:bdev_strategy+0x54()
specfs:spec_startio+0x81()
specfs:spec_pageio+0x29()
genunix:fop_pageio+0x28()
genunix:swap_putapage+0x1ed()
genunix:swap_putpage+0x26c()
genunix:fop_putpage+0x28()
genunix:pageout+0x281()
unix:thread_start+0x8()
-- end of kernel thread's stack --


   2 threads with that call found.

This specific VM is on patch level 147441-15, so I would assume it is not affected by that bug, unless a regression was introduced.
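
For anyone checking their own systems, the patch level can be confirmed on the VM itself with something like:

# kernel patch level of the running VM, e.g. Generic_147441-15
uname -v

# confirm whether a given patch is installed
showrev -p | grep 147441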

Thanks,
Derek

---------- Post updated at 01:30 PM ---------- Previous update was at 12:53 PM ----------

I see that they also suggest setting the primarycache ZFS property for the swap dataset to metadata. I just checked and mine is already set to that :(.

Actually, you should still try the workaround for the above CR, as it has not been fixed yet (in your kernel version).
Official:

There is no resolution to this issue at this time. The Solaris 10 official fix for CR# 6898318 "ZFS root system can hang swapping to zvol" went into patch 147440-04 and was later backed out due to CR# 7108029 "with 147440-04 installed on vxvm root system panics in swapify()". CR# 6898318 is now being tracked by CR# 7106883 "swap zvol preallocation does not work on Solaris 10" for resolution through a patch.

Workaround:

  • Use a raw swap partition instead of swapping to a zvol, or
  • zfs set primarycache=metadata {poolname}/{swapvol}
    This workaround prevents the issue from occurring. To make sure it is properly applied:
    1. Execute swap -l to get the names of the zpools and volumes involved.
    2. View the primarycache property with:
    zfs get primarycache {poolname}/{swapvol}
    and change it with:
    zfs set primarycache=metadata {poolname}/{swapvol}
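
For this particular VM, with the rpool/swap zvol shown in the swapinfo output above, the same steps would look roughly like this:

# confirm what is actually being used for swap
swap -l

# view the current setting on the swap zvol
zfs get primarycache rpool/swap

# and set it to metadata if it is not already
zfs set primarycache=metadata rpool/swap

# alternative: move swap off the zvol onto a raw slice
# (the slice name below is only an example -- use whatever slice is free on your system)
swap -d /dev/zvol/dsk/rpool/swap
swap -a /dev/dsk/c1t0d0s1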

Also, this is only from experience, but I would not run ZFS on a system with such limited resources; 2 GB of RAM is simply not enough for acceptable performance.
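
If these VMs have to stay at 2 GB, one general mitigation worth considering (this is not part of the official workaround for the CR, just common practice on small-memory ZFS systems) is to cap the ARC so it cannot squeeze out the applications and the swap path. In /etc/system, for example:

* limit the ZFS ARC to 512 MB (0x20000000 bytes); size it to suit the workload
set zfs:zfs_arc_max = 0x20000000

A reboot is required for /etc/system changes to take effect.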
