ps hanging on Solaris 10

Hi,
I have Solaris 10 running on an HP x86 machine, and I am seeing an issue with this server. When I run "ps -ef", it prints the lines below and then stops, doing nothing further:

# ps -ef
     UID   PID  PPID   C    STIME TTY         TIME CMD
    root     0     0   0   Nov 04 ?           4:46 sched
    root     5     0   0   Nov 04 ?         383:55 zpool-rpool
    root     6     0   0   Nov 04 ?           1:23 kmem_task
    root     1     0   0   Nov 04 ?          20:30 /sbin/init
    root     2     0   0   Nov 04 ?         314:33 pageout
    root     3     0   0   Nov 04 ?        3551:43 fsflush
    root     7     0   0   Nov 04 ?           0:12 vmtasks
    root   410     1   0   Nov 04 ?         111:58 /usr/sbin/in.routed
    root    11     1   0   Nov 04 ?           3:05 /lib/svc/bin/svc.startd
    root    13     1   0   Nov 04 ?           5:13 /lib/svc/bin/svc.configd


I am not able to figure out what is causing it, or whether there is a process causing this that I can kill.
I ran truss as shown below (if this is the correct way):

#  /usr/bin/truss -aef -o /tmp/truss23.out ps -ef
     UID   PID  PPID   C    STIME TTY         TIME CMD
    root     0     0   0   Nov 04 ?           4:46 sched
    root     5     0   0   Nov 04 ?         383:58 zpool-rpool
    root     6     0   0   Nov 04 ?           1:23 kmem_task
    root     1     0   0   Nov 04 ?          20:30 /sbin/init
    root     2     0   0   Nov 04 ?         314:36 pageout
    root     3     0   0   Nov 04 ?        3552:11 fsflush
    root     7     0   0   Nov 04 ?           0:12 vmtasks
    root   410     1   0   Nov 04 ?         111:59 /usr/sbin/in.routed
    root    11     1   0   Nov 04 ?           3:05 /lib/svc/bin/svc.startd
    root    13     1   0   Nov 04 ?           5:13 /lib/svc/bin/svc.configd

After leaving it for around 5 minutes, the truss output still ends with this:

7136:   mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, 4294967295, 0) = 0xFFFFFD7FFEC30000
7136:   memcntl(0xFFFFFD7FFEA30000, 11392, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
7136:   stat("/lib/64/libm.so.2", 0xFFFFFD7FFFDFE200)   = 0
7136:   resolvepath("/lib/64/libm.so.2", "/lib/amd64/libm.so.2", 1023) = 20
7136:   open("/lib/64/libm.so.2", O_RDONLY)             = 5
7136:   mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 5, 0) = 0xFFFFFD7FFEA20000
7136:   mmap(0x00010000, 499712, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, 4294967295, 0) = 0xFFFFFD7FFE9A0000
7136:   mmap(0xFFFFFD7FFE9A0000, 414061, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 5, 0) = 0xFFFFFD7FFE9A0000
7136:   mmap(0xFFFFFD7FFEA15000, 18024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 5, 413696) = 0xFFFFFD7FFEA15000
7136:   munmap(0xFFFFFD7FFEA06000, 61440)               = 0
7136:   munmap(0xFFFFFD7FFEA20000, 32768)               = 0
7136:   close(5)                                        = 0
7136:   mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, 4294967295, 0) = 0xFFFFFD7FFECA0000
7136:   mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, 4294967295, 0) = 0xFFFFFD7FFEC10000
7136:   memcntl(0xFFFFFD7FFE9A0000, 61960, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
7136:   mmap(0x00000000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, 4294967295, 0) = 0xFFFFFD7FFEA50000
7136:   sigaction(SIGPIPE, 0xFFFFFD7FFFDFEB40, 0x00000000) = 0
7136:   sigaction(SIGPIPE, 0xFFFFFD7FFFDFEB70, 0x00000000) = 0
7136:   getpid()                                        = 7136 [7135]
7136:   getuid()                                        = 0 [0]
7136:   getpid()                                        = 7136 [7135]
7136:   open("/proc/7136/psinfo", O_RDONLY)             = 5
7136:   read(5, "\0\0\00201\0\0\0E01B\0\0".., 416)      = 416
7136:   close(5)                                        = 0
7136:   getpid()                                        = 7136 [7135]
7136:   getpid()                                        = 7136 [7135]
7136:   open("/proc/7136/psinfo", O_RDONLY)             = 5
7136:   read(5, "\0\0\00201\0\0\0E01B\0\0".., 416)      = 416
7136:   close(5)                                        = 0
7136:   getpid()                                        = 7136 [7135]
7136:   sysinfo(SI_SRPC_DOMAIN, "ng911.state.ia.us", 256) = 18
7136:   open("/var/run/ldap_cache_door", O_RDONLY)      = 5
7136:   fcntl(5, F_SETFD, 0x00000001)                   = 0
7136:   door_info(5, 0xFFFFFD7FFEFFE900)                = 0
7136:   door_call(5, 0xFFFFFD7FFFDFE260) (sleeping...)
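The interesting part of a long truss log is the call that never returns and what its file descriptor was opened as. Below is a minimal sketch that pulls this out automatically. `blocked_calls` is a hypothetical helper, not a Solaris command; it assumes truss marks the blocked call with `(sleeping...)` as in the log above and that the call's first argument is the file descriptor. On Solaris 10, set `AWK=nawk`, since the default /usr/bin/awk there is the old awk and may not support `sub()`.

```shell
#!/bin/sh
# blocked_calls [TRUSS_LOG...]   (hypothetical helper)
# For every call truss reports as "(sleeping...)", print the call name, its
# first argument (assumed to be an fd) and the path last open()ed as that fd.
# Reads stdin when no file is given. Set AWK=nawk on Solaris 10.
blocked_calls() {
    "${AWK:-awk}" '
        # Remember the path behind each fd: open("path", ...) = fd
        /[ \t]open\("/ {
            path = $0
            sub(/^.*open\("/, "", path)
            sub(/".*$/, "", path)
            opened[$NF] = path
        }
        # A blocked call, e.g. door_call(5, 0x...) (sleeping...)
        /\(sleeping\.\.\.\)/ {
            call = $0
            # Drop the "PID:" or "PID/LWP:" prefix that truss -f adds
            sub(/^[ \t]*[0-9]+(\/[0-9]+)?:[ \t]*/, "", call)
            name = call
            sub(/\(.*$/, "", name)
            fd = call
            sub(/^[^(]*\(/, "", fd)
            sub(/[,)].*$/, "", fd)
            if (fd in opened) path = opened[fd]; else path = "?"
            printf "%s on fd %s -> %s\n", name, fd, path
        }
    ' "$@"
}

# Example: blocked_calls /tmp/truss23.out
```

On the excerpt above it prints `door_call on fd 5 -> /var/run/ldap_cache_door`: fd 5 was reused several times, but the most recent `open()` before the hang was the LDAP cache door.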

Does this give any idea what it could be?
There are 3 database zones running on this server, so I am reluctant to reboot it unless I see no other option.
Please advise.
Thanks

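/var/run/ldap_cache_door is the door to the LDAP cache manager (ldap_cachemgr), so this is a call into LDAP.
ps -ef prints user names, so it has to map each process owner's UID to a name; I guess that LDAP lookup is hung.
Check whether the passwd and/or group lines in /etc/nsswitch.conf include ldap.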
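A minimal sketch of that check as a hypothetical helper, `ldap_in_nsswitch`. It only looks for `ldap` as a whitespace-separated source on the `passwd:` and `group:` lines and skips comment lines; it does not interpret `[NOTFOUND=return]`-style criteria. As above, set `AWK=nawk` on Solaris 10.

```shell
#!/bin/sh
# ldap_in_nsswitch [FILE]   (hypothetical helper)
# Prints "passwd" and/or "group" for each of those databases that lists
# ldap as a source. FILE defaults to /etc/nsswitch.conf.
ldap_in_nsswitch() {
    "${AWK:-awk}" '
        /^[ \t]*#/ { next }                  # skip comment lines
        /^(passwd|group):/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "ldap") {
                    db = $1
                    sub(/:.*$/, "", db)      # "passwd:" -> "passwd"
                    print db
                    break
                }
            }
        }
    ' "${1:-/etc/nsswitch.conf}"
}
```

Output such as `passwd` means user name lookups, like the ones ps -ef makes, may go through ldap_cachemgr; empty output means neither line mentions ldap.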

The root user is found in files, i.e. /etc/passwd. That lookup works, which is why the root-owned processes are displayed.
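To confirm that only the LDAP-backed lookups hang, you can put a time limit on a lookup instead of waiting on it. The sketch below does not rely on a `timeout` command being installed; `run_with_timeout` is a hypothetical helper, and `someuser` in the commented examples stands for an account that exists only in LDAP. `getent` is the standard Solaris name service lookup command.

```shell
#!/bin/sh
# run_with_timeout SECS CMD [ARGS...]   (hypothetical helper)
# Runs CMD in the background and kills it with SIGKILL if it is still
# running after SECS seconds. Returns CMD's exit status, or 124 if CMD was
# terminated by a signal (shells report that as a status above 128).
run_with_timeout() {
    secs=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watcher: sleep, then kill CMD if it has not finished by then.
    ( sleep "$secs"; kill -9 "$cmd_pid" ) >/dev/null 2>&1 &
    watcher_pid=$!
    wait "$cmd_pid" 2>/dev/null
    rc=$?
    # CMD is gone: stop the watcher so it cannot fire later.
    kill "$watcher_pid" >/dev/null 2>&1
    wait "$watcher_pid" 2>/dev/null
    if [ "$rc" -gt 128 ]; then
        return 124
    fi
    return "$rc"
}

# run_with_timeout 10 getent passwd root        # files lookup, returns at once
# run_with_timeout 10 getent passwd someuser    # LDAP lookup, 124 if it hangs
# run_with_timeout 30 ps -ef >/dev/null         # the original symptom
```

The 124 status mirrors the convention of GNU `timeout`. The watcher's output goes to /dev/null so a leftover `sleep` does not hold the caller's terminal or pipe open.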

Restart the LDAP service!
There might be a further caching service running, e.g. nscd (the name service cache daemon). Restart that one, too!
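A sketch of those restarts through SMF, assuming the Solaris 10 service names `svc:/network/ldap/client:default` (which runs ldap_cachemgr) and `svc:/system/name-service-cache:default` (nscd). `restart_name_services` is a hypothetical wrapper, not part of Solaris.

```shell
#!/bin/sh
# restart_name_services   (hypothetical helper)
# Asks SMF to restart the LDAP client and the name service cache, then
# lists each instance's state and the processes it owns.
restart_name_services() {
    for fmri in svc:/network/ldap/client:default \
                svc:/system/name-service-cache:default; do
        svcadm restart "$fmri" || echo "restart of $fmri failed" >&2
    done
    svcs -p svc:/network/ldap/client:default svc:/system/name-service-cache:default
}
```

`svcadm restart` only requests the restart, so the `svcs -p` listing may still show a transitional state; run `svcs -p` again after a few seconds. Run it in every zone that uses the LDAP client, not only in the global zone.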

The LDAP service (user name mapping) certainly runs in each zone. Check it in each zone!
In the global zone, display the zone name with -Z:

ps -Z -ef

or, in this case, omit the -f, which forces the hanging user name lookups:

ps -Z -e

or get a long listing without the user names:

ps -eo zone,uid,pid,ppid,stime,tty,args
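To check every zone without logging in to each one by hand, here is a minimal sketch with a hypothetical `in_each_zone` helper. It assumes it is run as root in the global zone, where `zoneadm list` prints the running zones (including `global`, which it skips) and `zlogin ZONE CMD` runs a command inside a zone.

```shell
#!/bin/sh
# in_each_zone CMD [ARGS...]   (hypothetical helper)
# Runs CMD inside every running non-global zone, printing a header per zone.
in_each_zone() {
    for zone in $(zoneadm list); do
        [ "$zone" = global ] && continue
        echo "== $zone"
        zlogin "$zone" "$@"
    done
}

# Example: LDAP client state and its ldap_cachemgr process in each zone
# in_each_zone svcs -p svc:/network/ldap/client:default
```

For the global zone itself, run the same `svcs -p` command directly.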

Wow, that did the job. Thank you.

# ps -Z -e | grep ldap
sms-dav- 17081 ?           0:00 ldap_cac
  global  6158 ?           7:28 ldap_cac
ngia-dav 17170 ?           0:00 ldap_cac
ngia-dav 17189 ?           0:00 ldap_cac
# kill -9 6158
# ps -Z -e | grep ldap
sms-dav- 17081 ?           0:00 ldap_cac
ngia-dav 17170 ?           0:00 ldap_cac
ngia-dav 17189 ?           0:00 ldap_cac
#
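The session above picks the global zone's ldap_cachemgr out of the listing by eye. A minimal sketch that does the same with awk; `global_ldap_pids` is hypothetical, and it assumes the `ps -Z -e` column layout shown above (zone, PID, TTY, TIME, CMD), where the command name is truncated to `ldap_cac`.

```shell
#!/bin/sh
# global_ldap_pids   (hypothetical helper)
# Reads `ps -Z -e` output on stdin and prints the PID of every ldap_cachemgr
# process (shown truncated as "ldap_cac") that runs in the global zone.
global_ldap_pids() {
    awk '$1 == "global" && $NF ~ /^ldap_cac/ { print $2 }'
}

# Example: ps -Z -e | global_ldap_pids
```

On the listing above it would print 6158, the PID that was killed. On Solaris 10, `pgrep -z global ldap_cachemgr` should find the same process without parsing ps output.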

Then I restarted LDAP and it worked fine. Thanks again, I was almost giving up on this.

