Greetings:
I am posting this because my searches for this problem turned up only two posts and no helpful suggestions. I have a "solution" (read work-around hack) but have not yet tried to find a root cause, and may never, because I am busy doing other things (read working to pay the bills).
However, I post this with two goals:
- To help the poor schmuck debugging this at 3am
- To document the problem in case someone really has a wild hair (hare?) up their butt and wants to chase the root cause
Simply put, msgget(2) will return 0 for some reason, which msgsnd(2) and msgrcv(2) do not like. My notes indicate msgsnd() was OK and msgrcv() complained, but this was 12 hours into a debugging session....
There are two threads I have found in the interwebs:
forums.codeguru (dot) com/showthread.php?403036-strange-problem-in-using-msgget%28%29-in-Linux
and
unix (dot) com/programming/3755-about-msgget-troble.html
Both of these threads are "old" and closed, otherwise I would have responded to one of them.
NOTE: The codeguru.com thread has the best code example. The unix.com code has what may be a fatal flaw: it passes IPC_EXCL in the flags alongside the permissions, so the second time it is run msgget() should fail unless he first removed the message queue. However, he should have gotten errno == EEXIST and it appears he did not - he does print errno.
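To illustrate the IPC_EXCL behaviour described above, here is a minimal sketch. The helper names create_queue_exclusive() and create_or_attach() and the 0666 permissions are my own assumptions, not taken from either thread.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: create the queue for `key`, failing if it already
 * exists. Returns the queue id, or -1 with errno set. errno == EEXIST
 * means a queue with this key is already present. */
int create_queue_exclusive(key_t key)
{
    return msgget(key, IPC_CREAT | IPC_EXCL | 0666);
}

/* Hypothetical helper: try an exclusive create first, and attach to the
 * existing queue if the key is already taken. */
int create_or_attach(key_t key)
{
    int id = create_queue_exclusive(key);
    if (id == -1 && errno == EEXIST)
        id = msgget(key, 0);    /* no IPC_CREAT: look up the existing queue */
    return id;
}
```

With this, a second run against a leftover queue shows up as EEXIST from the first helper instead of being silently ignored.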
The Linux distro is Ubuntu 8, not patched. Because the other posts are from 2006 and 2005, the CPU does not seem to be an issue.
The interesting thing is:
Running ipcs gives (in addition to various semaphores and shared memory):
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x000000f0 163840     gfi        666        0            0
0x0000007b 32769      gfi        666        0            0
The original key was 0xF0 which returned 0x8000 when it was working. The hex for the decimal 163840 = 0x28000. I arbitrarily tried a key of 0x7B (well, decimal 123) and got a msgqid = 0x8001 (which == 32769 decimal).
I also see cases in my slime trail where, when msgget() was returning non-zero, for a while it returned 0x10001. In all cases I am using an int to hold the msgQ_id. The key = 0xF0 returns 0, not 0x8000, so truncation is not an issue. I have not tried switching back to a key = 0xF0. I will try looking on another system running the same code (i.e. using 0xF0) to see what ipcs shows.
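The ids above all look like a multiple of 0x8000 plus a small number. A minimal sketch for decoding them, assuming (this is an assumption about Linux kernels of this era, not something I have verified on this box) that the kernel builds an id as seq * 32768 + index. The names ASSUMED_IPCMNI, msqid_seq(), msqid_index() and print_msqid() are made up for illustration.

```c
#include <stdio.h>

/* Assumption: Linux builds an IPC id as seq * IPCMNI + index, with
 * IPCMNI = 32768 (0x8000). Other kernels and systems may differ. */
#define ASSUMED_IPCMNI 32768

/* Slot index within the kernel's table of queues (under the assumption). */
int msqid_index(int msqid) { return msqid % ASSUMED_IPCMNI; }

/* Sequence counter part of the id (under the assumption). */
int msqid_seq(int msqid) { return msqid / ASSUMED_IPCMNI; }

/* Print an id in decimal and hex, plus the assumed seq/index split. */
void print_msqid(int msqid)
{
    printf("%6d = 0x%05X -> seq %d, index %d\n",
           msqid, (unsigned)msqid, msqid_seq(msqid), msqid_index(msqid));
}
```

Under that assumption 163840 (0x28000) is seq 5, index 0; 32769 (0x8001) is seq 1, index 1; 0x10001 is seq 2, index 1; and an id of 0 would simply be seq 0, index 0.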
Another thing: 0 is supposed to be a legal return. Per the man page, msgget() returns a non-negative queue identifier on success and -1 on error.
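Since only -1 signals failure, a check such as "if (id <= 0)" or "if (!id)" in calling code would wrongly reject a queue id of 0. A minimal sketch of the check, where msgq_id_is_valid() and open_queue() are hypothetical helper names and 0666 is an assumed permission mask:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Only -1 means failure; 0 is a valid queue id. */
int msgq_id_is_valid(int msgq_id)
{
    return msgq_id != -1;
}

/* Hypothetical helper: open (or create) the queue for `key` and report
 * errors based on -1, not on zero. */
int open_queue(key_t key)
{
    int msgq_id = msgget(key, IPC_CREAT | 0666);
    if (!msgq_id_is_valid(msgq_id)) {
        fprintf(stderr, "msgget(0x%lx) failed: %s\n",
                (unsigned long)key, strerror(errno));
    }
    return msgq_id;
}
```

This does not explain why msgsnd() or msgrcv() would complain about id 0, but it rules out one way to trip over it in your own code.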
So - I don't know why msgget() will start returning 0. Honestly, I had another bug which (for a while) masked what msgsnd() was doing - a "(u)" instead of a "(%lu)" in a printf was throwing SIGSEGV (sigh) and I fixed both at the same time (i.e. new key) - this is a non-trivial system to run a code build on && one wants to do as much as one can between runs.
The only suggestion I can make is to have the system come up with a unique key using ftok() every time, and to remove old message queues. A good start on a key would be the parent process PID.
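A minimal sketch of that suggestion, assuming a path and project id of your choosing. open_fresh_queue() is a hypothetical helper name. Note that ftok() derives the key from an existing file plus the low 8 bits of the project id, and it does not strictly guarantee a unique key.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: derive a key with ftok(), remove any stale queue
 * left under that key, then create a fresh one.
 * Returns the new queue id, or -1 on failure (errno is set). */
int open_fresh_queue(const char *path, int proj_id)
{
    key_t key = ftok(path, proj_id);
    if (key == (key_t)-1)
        return -1;                        /* path missing or not accessible */

    int stale = msgget(key, 0);           /* is there a leftover queue? */
    if (stale != -1)
        msgctl(stale, IPC_RMID, NULL);    /* remove it, dropping its messages */

    return msgget(key, IPC_CREAT | IPC_EXCL | 0666);
}
```

Something like open_fresh_queue("/tmp", 'Q') at startup would do it. Removing the old queue throws away any messages still in it, so this only suits a setup where one process owns the queue.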
(please forgive the chopped links - apparently I am not yet blessed to give raw links :^)