Losing signals even with sigqueue

Hi everyone,

I have a process that forks many times. At a random point in time the children must send a SIGUSR1 to the parent. To do that I'm using sigaction together with sigqueue. However, many signals are getting lost.
Here are some code segments:
a) sigaction

sigset_t mask_set;	/* used to set a signal masking set. */
	sigfillset(&mask_set);


	struct sigaction act;
	act.sa_sigaction=catch_usr1;
	act.sa_mask=mask_set;
	act.sa_flags=SA_SIGINFO | SA_RESTART;

	sigaction(SIGUSR1,&act,NULL);

b) signal handler

void catch_usr1(int sig,siginfo_t *a,void *b )
{
	sigcount++;
	if (readers>0)
		readers--;
	else
		writing=false;
	printf("[LOG] Signal received. Readers now: %d Signal counter: %d\n",readers,sigcount);
	fflush(stdout);
}

c) sigqueue

tmp=sigqueue(getppid(),SIGUSR1,vsig);

Can anybody find an error?

You didn't post the code surrounding your sigqueue call. The call should fail with EAGAIN if the queue is currently full.
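For illustration, here is a minimal sketch of a sender that checks for that EAGAIN and retries instead of ignoring the return value. The helper name sigqueue_retry, the retry limit and the 1 ms pause are my own assumptions, not anything from the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: queue `signo` to `pid`, retrying while the call
 * reports a full queue (EAGAIN). Returns 0 on success, -1 on any other
 * error or once `max_tries` attempts have been used up. */
int sigqueue_retry(pid_t pid, int signo, union sigval value, int max_tries)
{
    struct timespec pause = { 0, 1000000 };   /* 1 ms between attempts */

    for (int i = 0; i < max_tries; i++) {
        if (sigqueue(pid, signo, value) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;               /* EINVAL, EPERM, ESRCH: give up */
        nanosleep(&pause, NULL);     /* queue full: let the receiver drain it */
    }
    errno = EAGAIN;                  /* still full after max_tries attempts */
    return -1;
}
```

In the child this would replace the bare call, for example sigqueue_retry(getppid(), SIGUSR1, vsig, 100), with a log message whenever it returns -1. It only helps on systems that actually report the overflow.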

I just have these simple statements:

tmp=sigqueue(getppid(),SIGUSR1,vsig);
close(handler.connfd);
printf("SERVER CHILD for %d: Out.Sigqueue returned %d\n",handler.id>0?handler.id:-handler.id,tmp);
fflush(stdout);
return 0;

I redirect my output to a file. The tmp variable always has the value 0.

It can fail with an error status, but isn't required to. This has caused some rarely-seen and bothersome bugs on a few platforms.

Excuse me? If SA_SIGINFO is set, either the signal must be successfully queued or sigqueue() must return an error. And without SA_SIGINFO, sigqueue must behave at least like kill() and deliver the signal if it is not pending. This behavior is required by the Posix standard:

Apparently the Linux kernel maintainers were misinformed, then. An old design flaw in the LinuxThreads system is that, if the signal queue overflows, it is never informed of it. Thread termination signals then stop being delivered, resulting in something very odd: zombie threads. They couldn't fix it, as it was a design flaw rather than a bug, and the kernel maintainers insisted that an error status was not required. Fortunately, NPTL doesn't have the same issue.

So it isn't a fault of mine. I'm already working on a solution based on sockets using select. Should I reject signals as a solution to my problem? Reliability of delivery is important to me. What is your opinion?

It looks to me like your code should work. What version of Unix are you using? Have you searched for any kernel patches that might fix your kernel?

I do continue to have reservations about your use of sigqueue. Should your kernel start to work correctly, your program should then start to report the loss of signals, so you need to check for the queue being full. By putting printf and fflush calls in your signal handler you are greatly exacerbating any problems the parent will have in processing signals fast enough.
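As a rough sketch of what I mean, the handler could do nothing but bump a counter and leave all logging to the main loop. printf and fflush are not async-signal-safe in any case. The helper names install_usr1_handler and signals_seen are made up for this example, and the readers/writing bookkeeping from your handler is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written only by the handler, read by the main program. */
static volatile sig_atomic_t sigcount = 0;

/* Handler kept as short as possible: no stdio, no locks. */
static void catch_usr1(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    (void)ctx;
    sigcount++;
}

/* Hypothetical helper: install the handler for SIGUSR1.
 * Returns what sigaction() returns (0 on success, -1 on error). */
int install_usr1_handler(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);        /* no stray bits in unused fields */
    act.sa_sigaction = catch_usr1;
    sigfillset(&act.sa_mask);           /* block everything while handling */
    act.sa_flags = SA_SIGINFO | SA_RESTART;
    return sigaction(SIGUSR1, &act, NULL);
}

/* Hypothetical accessor so the main loop can log outside the handler. */
int signals_seen(void)
{
    return (int)sigcount;
}
```

The parent would then print signals_seen() from its normal control flow rather than from inside the handler.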

I put the printf and fflush statements in for debugging purposes. The problem still exists without these statements.

I have a Linux 2.4.31 kernel; I'm using SlackWare 10.2 under VMware Workstation 5.5. I also have telnet and ftp access to a SUSE LINUX Enterprise Server 9 (x86_64), kernel 2.6.5-7.276-smp. The problems are the same.

While experimenting I found that by simply using kill, fewer signals are lost.

Just for the heck of it, I would try SIGRTMIN rather than SIGUSR1, which should be easy to do. If it is failing on a 2.6 kernel with SMP support, there is not much point in trying another kernel. There is, however, a product called RTLinux which claims complete POSIX real-time compliance.
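The reason this is worth trying: POSIX only promises queueing for the real-time signals (SIGRTMIN to SIGRTMAX), while several pending instances of an ordinary signal such as SIGUSR1 may be merged into one. A small test along these lines shows what a given system does. The helper name count_delivered is my own, and the program is a sketch that assumes a single-threaded process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: block `signo`, queue it to ourselves `n` times,
 * then count how many instances can actually be dequeued. */
int count_delivered(int signo, int n)
{
    sigset_t set, old;
    struct timespec zero = { 0, 0 };    /* poll, never wait */
    siginfo_t info;
    union sigval v;
    int got = 0;

    sigemptyset(&set);
    sigaddset(&set, signo);
    sigprocmask(SIG_BLOCK, &set, &old); /* keep the signals pending */

    for (int i = 0; i < n; i++) {
        v.sival_int = i;
        if (sigqueue(getpid(), signo, v) != 0)
            break;                      /* e.g. EAGAIN: queue is full */
    }

    /* Drain whatever is pending; sigtimedwait fails once nothing is left. */
    while (sigtimedwait(&set, &info, &zero) == signo)
        got++;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return got;
}
```

On Linux I would expect count_delivered(SIGRTMIN, 5) to return 5 and count_delivered(SIGUSR1, 5) to return 1, but it is worth running on each of your machines rather than taking my word for it.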

Well, here is something unexpected:

SIGRTMIN worked on the Linux 2.4.31 kernel of SlackWare 10.2
but
failed on the SUSE LINUX Enterprise Server 9 (x86_64) - Kernel 2.6.5-7.276-smp.

Well, SuSE is actually a supported product. Maybe you can ask Novell for a patch.

Thank you for your help.

I'm finally dropping signals. I have a solution using sockets that works with no problems on different machines.
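For anyone who finds this thread later, the idea can be sketched roughly as follows. This is not my actual server code, just a minimal illustration that assumes a single socketpair shared by all children; the function name count_child_notifications is invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork `nchildren` children that each write one byte
 * to a shared socket instead of sending SIGUSR1. The parent counts the
 * bytes with select()/read(), so no notification can be merged or lost. */
int count_child_notifications(int nchildren)
{
    int sv[2];
    int seen = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                        /* fork failed: fewer children */
        if (pid == 0) {
            char byte = 1;
            close(sv[0]);
            /* This write takes the place of sigqueue(getppid(), ...). */
            ssize_t w = write(sv[1], &byte, 1);
            _exit(w == 1 ? 0 : 1);
        }
    }
    close(sv[1]);   /* parent only reads; EOF once every child has exited */

    for (;;) {
        fd_set rfds;
        char buf[64];

        FD_ZERO(&rfds);
        FD_SET(sv[0], &rfds);
        if (select(sv[0] + 1, &rfds, NULL, NULL, NULL) < 0)
            break;
        ssize_t n = read(sv[0], buf, sizeof buf);
        if (n <= 0)
            break;                        /* 0 means all writers are gone */
        seen += (int)n;                   /* one byte per notification */
    }

    close(sv[0]);
    while (wait(NULL) > 0) {
        /* reap the children */
    }
    return seen;
}
```

In a real server the parent would add this descriptor to the select set it already uses for its listening socket, so notifications are handled in the same loop as new connections.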