Possible to link an IPC semaphore to a process ID?

Hi,

Does anyone know whether it is possible to link/relate an IPC semaphore to a particular process ID?

e.g.
# ipcs -as
IPC status from <running system> as of Wednesday July 2 14:10:39 EST 2008
T      ID         KEY  MODE         OWNER    GROUP    CREATOR  CGROUP   NSEMS     OTIME     CTIME
Semaphores:
s       0  0x71003323  --ra-ra-ra-  root     root     root     root         1  13:58:59  19:15:20
s  131073  0x49000027  --ra-ra-ra-  root     root     root     root         3  14:10:38  15:31:33
s  262146  0x45000ef8  --ra-ra-ra-  root     root     root     root         1  14:10:36  15:31:44
s  196611  0x450005c9  --ra-ra-ra-  root     root     root     root         1   8:03:07  15:32:53
s  262148      0xcace  --ra-ra-ra-  root     root     root     root         1  14:10:36  15:33:19
s   65541  0xd20049a3  --ra-ra-ra-  root     root     root     root         1  23:30:06  23:28:01

The reason is that I'm trying to find out what process is spawning these semaphore IDs. The system has been set up with 10 semaphore identifiers (the Solaris 9 default), and after a week or two we always hit that limit. Instead of just adjusting /etc/system and raising the semaphore ID limit, I'd like to find out what is causing it. This box is used as a Tivoli Storage Manager server.

IPC information from sysdef -i:

*
* IPC Semaphores
*
       10      semaphore identifiers (SEMMNI)
       60      semaphores in system (SEMMNS)
       30      undo structures in system (SEMMNU)
       25      max semaphores per id (SEMMSL)
       10      max operations per semop call (SEMOPM)
       10      max undo entries per process (SEMUME)
    32767      semaphore maximum value (SEMVMX)
    16384      adjust on exit max value (SEMAEM)

The ipcs(1) utility may help you. man ipcs for details of the various options available.

10 semids is slim pickings. Why don't you cut your losses and set the max to something reasonable based on the system's behavior, once you've ascertained that the system is working within reasonable bounds?

Thx for your answers.

Ramen_noodle, the reason I want to keep the semaphore IDs at 10 is 1) that it is the Solaris 9 default, and 2) that we don't actually experience this problem on our production system. The production system has exactly the same OS/hardware/software setup as this development system, so it would be handy to find out which process is causing the grief, as it may lead us to something we can investigate.

It's great that you want to maintain continuity between current production and what I assume is dev, but in reality a codebase may require a lot of tweaking in dev to conform. This is all guesswork, of course.

If you aren't willing to bend the rules to develop a refined system in dev/test, the result will probably be entirely different from what you are aiming for.
"Premature optimization is the root of all evil"

I've changed /etc/system and set semaphore identifiers to 30. Currently I'm waiting on agreement for an outage window for a reboot.
I'll post the results.
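
(For reference, the tunable in question is semsys:seminfo_semmni, so the /etc/system entry is something along the lines of set semsys:seminfo_semmni=30.)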

After the reboot the system ran OK for a couple of days, but the SEMMNI (semaphore identifiers) limit was reached again. We will have to take this issue to the software vendor and see if they have any ideas.

You can map shared memory IDs to processes with pmap, but I don't believe you can map a semaphore to a process. You can with Solaris 10 and DTrace, but not on 9.

I was thinking lsof might help, but I just tried it out and it looks like a no-go.
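
One partial workaround that might still be worth a try on Solaris 9: semctl(2) has a GETPID command that returns the PID of the last process to perform an operation on a given semaphore. It won't tell you who created the set, but if whatever grabbed it is still touching it, it may point you in the right direction. A minimal sketch (lastpid.c is just a made-up name; the semid argument is whatever ipcs -s reports):

     /* lastpid.c - for a given SysV semaphore ID, print the PID of the last
      * process that performed a semop() on each semaphore in the set.
      * Usage: ./lastpid <semid>   (the ID column from ipcs -s)            */
     #include <sys/types.h>
     #include <sys/ipc.h>
     #include <sys/sem.h>
     #include <stdio.h>
     #include <stdlib.h>

     /* POSIX leaves it to the application to declare union semun. */
     union semun {
         int              val;
         struct semid_ds *buf;
         unsigned short  *array;
     };

     int main(int argc, char *argv[])
     {
         struct semid_ds ds;
         union semun     arg;
         int             semid, i, pid;

         if (argc != 2) {
             fprintf(stderr, "usage: %s semid\n", argv[0]);
             return 1;
         }
         semid = atoi(argv[1]);

         arg.buf = &ds;
         if (semctl(semid, 0, IPC_STAT, arg) == -1) {  /* how many sems in the set? */
             perror("semctl(IPC_STAT)");
             return 1;
         }

         for (i = 0; i < (int)ds.sem_nsems; i++) {
             pid = semctl(semid, i, GETPID);           /* PID of the last semop() */
             if (pid == -1) {
                 perror("semctl(GETPID)");
                 return 1;
             }
             printf("sem %d: last operated on by pid %d\n", i, pid);
         }
         return 0;
     }

Compile it, feed it the IDs from the ipcs listing, and if the PIDs it prints are still alive, ps -f -p <pid> should tell you who they belong to.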

Here's a quote from a Tivoli doc I did a quick lookup on:

For shared memory and semaphore usage make sure that the kernel parameters of your 
system have at least the following values (minimum requirements): 
     set semsys:seminfo_semmap=50
     set semsys:seminfo_semmni=50
     set semsys:seminfo_semmns=300
     set semsys:seminfo_semmnu=150
     set semsys:seminfo_semopm=50
     set semsys:seminfo_semume=50
     set semsys:seminfo_semmsl=125

And although this was listed for HP-UX, and is really old, here is how they recommend you calculate the needed value:

semaphores = 60 + (2 x maxSessions)
Where maxSessions is the maximum number of concurrent client sessions.
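
For example, with 120 concurrent client sessions that works out to 60 + (2 x 120) = 300 semaphores, which happens to match the seminfo_semmns=300 value quoted above.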

Thanks for the info, Diabolist, it might come in handy.

Some information on what caused my problem was relayed to me... It seems it was a network problem that caused the TSM sessions not to close properly and to constantly reopen new sessions.

Too bad I wasn't able to get a more detailed technical explanation.
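
For anyone who finds this thread later with the same symptoms: the pattern that seems to fit is each new session allocating its own semaphore set and the half-dead sessions never releasing theirs, so identifiers pile up until semget() starts failing with ENOSPC (the errno for "SEMMNI exceeded"). Roughly this pattern, purely illustrative and not actual TSM code:

     /* leak.c - illustrative only: the kind of leak that eats semaphore IDs.
      * Each run creates a semaphore set and (deliberately) never removes it,
      * so repeated runs use up identifiers until SEMMNI is exhausted and
      * semget() starts failing with ENOSPC.                                 */
     #include <sys/types.h>
     #include <sys/ipc.h>
     #include <sys/sem.h>
     #include <errno.h>
     #include <stdio.h>

     int main(void)
     {
         /* One private set per "session"; a real application would more
          * likely use a per-session key, as in the ipcs listing above. */
         int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

         if (semid == -1) {
             if (errno == ENOSPC)
                 fprintf(stderr, "semget: no semaphore identifiers left (SEMMNI)\n");
             else
                 perror("semget");
             return 1;
         }
         printf("created semaphore set %d\n", semid);

         /* A well-behaved program would eventually call
          *     semctl(semid, 0, IPC_RMID);
          * If the session dies before it gets there, the set stays behind,
          * shows up in ipcs -s, and has to be removed by hand (ipcrm -s). */
         return 0;
     }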