Msgget(2) returns 0 - a workaround fix

Greetings:

I am posting this because my searches for this problem only came up with two posts and no helpful suggestions. I have a "solution" (read work-around hack) and have not tried yet to find a root cause, and may never because I am busy doing other things (read working to pay the bills).

However, I post this with two goals:

  1. For the poor shmuck at 3am
  2. document in case someone really has a wild hair (hare?) up their butt

Simply put, msgget(2) will return 0 for some reason, which the msgsnd(2) and msgrcv(2) do not like. My notes indicate msgsnd() was OK, and msgrcv() complained, but this was 12 hours into a debugging session....

There are two threads I have found in the interwebs:
forums.codeguru (dot) com/showthread.php?403036-strange-problem-in-using-msgget%28%29-in-Linux
and
unix (dot) com/programming/3755-about-msgget-troble.html

Both of these threads are "old" and closed, otherwise I would have responded to one of them.

NOTE: The codeguru.com thread has the best code example. The unix.com code has what may be a fatal flaw: it uses IPC_EXCL as part of the flags, so the second time it is run it should complain, unless he first removed the message queue. However, he should have gotten errno == EEXIST and it appears he did not - he does print errno.

The Linux distro is Ubuntu 8, not patched. Because the other posts are from 2006 and 2005, the CPU does not seem to be an issue.

The interesting thing is:
Running ipcs gives (in addition to various semaphores and shared memory):

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x000000f0 163840     gfi        666        0            0
0x0000007b 32769      gfi        666        0            0

The original key was 0xF0 which returned 0x8000 when it was working. The hex for the decimal 163840 = 0x28000. I arbitrarily tried a key of 0x7B (well, decimal 123) and got a msgqid = 0x8001 (which == 32769 decimal).

I also see cases in my slime trail that when msgget() was returning non-zero, for a while it returned 0x10001. In all cases I am using an int to hold the msgQ_id. The key = 0xF0 returns 0, not 0x8000, so truncation is not an issue. I have not tried switching back to a key = 0xF0. I will try looking on another system running the same code (ie using 0xF0) to see what ipcs shows.
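
The ids quoted above look less random if they are split at 0x8000. The sketch below assumes (this is an assumption about the Linux kernel, not something verified in this thread) that a System V IPC id is built as seq * 32768 + index, where index is the slot in the kernel's table and seq is a sequence number that changes as queues are created and removed. The names ASSUMED_IPCMNI, ipc_id_seq, ipc_id_index and print_ipc_id are made up for illustration.

```c
#include <stdio.h>

/* Assumption: id = seq * 32768 + index (32768 == 0x8000). */
#define ASSUMED_IPCMNI 32768

/* Sequence-number part of an id, under the assumption above. */
int ipc_id_seq(int id)
{
    return id / ASSUMED_IPCMNI;
}

/* Table-slot part of an id, under the assumption above. */
int ipc_id_index(int id)
{
    return id % ASSUMED_IPCMNI;
}

/* Print one id split into its two assumed parts. */
void print_ipc_id(int id)
{
    printf("id %d (0x%x): seq %d, index %d\n",
           id, (unsigned) id, ipc_id_seq(id), ipc_id_index(id));
}
```

Under that assumption 163840 (0x28000) is seq 5 in slot 0, 32769 (0x8001) is seq 1 in slot 1, 0x10001 is seq 2 in slot 1, and an id of 0 is simply seq 0 in slot 0, which would make it an ordinary id rather than a special value.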

Another thing: 0 is supposed to be a legal return value.

So - I don't know why msgget() will start returning 0. Honestly, I had another bug which (for a while) masked what msgsnd() was doing - a "(u)" instead of a "(%lu)" printf was throwing SIGSEGV (sigh) and I fixed both at the same time (i.e. new key) - this is a non-trivial system to run a code build on && one wants to do as much as one can between runs.

The only suggestion I can make is have the system come up with a unique key using ftok() every time, and remove old message queues. A good start on a key would be the parent process PID.
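
A minimal sketch of that suggestion, assuming the caller supplies a path to an existing file and a project id for ftok(). The helper name open_fresh_queue is hypothetical, and the 0666 permissions are copied from the examples in this thread. It removes any queue left over under the same key and then creates a new one.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: derive a key from an existing file with ftok(),
 * remove any stale queue left under that key, and create a fresh one.
 * Returns the queue id (0 is a valid id) or -1 on failure. */
int open_fresh_queue(const char *path, int proj_id)
{
    key_t key = ftok(path, proj_id);
    if (key == (key_t) -1) {
        perror("ftok");
        return -1;
    }

    /* Look for a leftover queue without creating one. */
    int old_id = msgget(key, 0);
    if (old_id != -1) {
        if (msgctl(old_id, IPC_RMID, NULL) == -1) {
            perror("msgctl(IPC_RMID)");
            return -1;
        }
    } else if (errno != ENOENT) {
        perror("msgget (lookup)");
        return -1;
    }

    /* IPC_EXCL makes creation fail with EEXIST instead of reusing a queue. */
    int id = msgget(key, IPC_CREAT | IPC_EXCL | 0666);
    if (id == -1)
        perror("msgget (create)");
    return id;
}
```

Only -1 is treated as failure here, so a returned id of 0 is passed back to the caller like any other id.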

(please forgive the chopped links - apparently I am not yet blessed to give raw links :^)

We can't tell why your code is breaking down either. Certainly not without seeing it.

I would gently point out I provided a link to a post WITH the code. Here it is. The only two differences between the codeguru example and mine are that I changed the key and added a check for < 0 (obviously, one should ALSO check for that).

    msgid = msgget((key_t) 0xF0, 0666 | IPC_CREAT);

    if (msgid < 0) {
      printf(" Error in creating queue!!, errno = %d\n", errno);
      exit(0);
    }

    if (msgid == 0) {
      printf(" Got msgid == 0!!, errno = %d\n", errno);
      exit(0);
    }

Also, I was able to look on another development machine. It uses the same key as above (0xF0). ipcs reports the queue Id = 0x8000

Also - please note I am mainly posting this so that some poor programmer in the future with this problem can find this post.

The only folks who could provide any real answers, no offense, are the maintainers of msgget() and family. A link or contact point with them would be most appreciated.

What OS?

And what's the value of errno after you get a zero back from msgget()? (And remember to set errno to zero before calling msgget()...)

msgid is NOT a message; it is a message queue id (a shared (IPC) memory object, not an individual message). A return of msgid == 0 means success. Any return value > -1 means success.

Since you really did not post much code, here is how you should be calling msgrcv (note that the infinite loop is NOT required):

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct my_msgbuf 
{
 long mtype;
 char mtext[200];
};

int main(void)
{
   struct my_msgbuf buf;
   int msqid=0;
   key_t key;
   if ((key = ftok("[your key code goes here]", 'B')) == (key_t) -1)  /* same key as other program */
   {
     perror("ftok");
     exit(1);
   }
   if ((msqid = msgget(key, 0666)) == -1)  /* connect to the queue */
   {
      perror("msgget");
      exit(1);
   }
   printf("Ready to receive messages\n");
   for(;;) 
   { 
      if (msgrcv(msqid, &buf, sizeof(buf.mtext), 0, 0) == -1)
      {
        perror("msgrcv");
        exit(1);
      }
      printf("%s\n", buf.mtext);
   }
   return 0;
}

The above snippet works correctly, I use it in other code....
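
For completeness, a sending side might look roughly like the sketch below. It reuses the my_msgbuf layout from the receiver above; the helper name send_text and the choice of mtype 1 are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct my_msgbuf
{
    long mtype;
    char mtext[200];
};

/* Hypothetical helper: queue one NUL-terminated string on msqid.
 * Returns 0 on success, -1 on failure. */
int send_text(int msqid, const char *text)
{
    struct my_msgbuf buf;
    size_t len = strlen(text) + 1;      /* include the terminating NUL */

    if (len > sizeof(buf.mtext)) {
        fprintf(stderr, "send_text: message too long\n");
        return -1;
    }
    buf.mtype = 1;                      /* must be > 0; a receiver asking for type 0 accepts any type */
    memcpy(buf.mtext, text, len);

    if (msgsnd(msqid, &buf, len, 0) == -1) {
        perror("msgsnd");
        return -1;
    }
    return 0;
}
```

Sending the terminating NUL along with the text means the receiver can print mtext with %s safely.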

Read this:

strange problem in using msgget() in Linux

OP is stating that he's seeing the same problem as posted at codeguru years ago: when msgget() returns 0, the message queue doesn't work.

Hence my asking about errno values for when the msgget() returns 0, and when msgsnd()/msgrcv() fail with the zero message queue ID.
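
One way to collect those errno values is a small probe that pushes a message through the queue and reports errno only when a call actually returns -1. This is a sketch, not code from either thread: probe_queue, struct probe_msg and PROBE_TYPE are hypothetical names, and the message type is an arbitrary value assumed not to be used by the application.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define PROBE_TYPE 424242L   /* arbitrary type, assumed unused by the application */

struct probe_msg
{
    long mtype;
    char mtext[16];
};

/* Hypothetical diagnostic: send one probe message and read it back.
 * Returns 0 if both calls succeed, -1 otherwise. */
int probe_queue(int msqid)
{
    struct probe_msg out = { PROBE_TYPE, "probe" };
    struct probe_msg in;

    /* IPC_NOWAIT: fail with EAGAIN instead of blocking on a full queue. */
    if (msgsnd(msqid, &out, sizeof(out.mtext), IPC_NOWAIT) == -1) {
        fprintf(stderr, "msgsnd(msqid=%d): errno %d (%s)\n",
                msqid, errno, strerror(errno));
        return -1;
    }
    /* Ask only for our own type so application messages are left alone. */
    if (msgrcv(msqid, &in, sizeof(in.mtext), PROBE_TYPE, IPC_NOWAIT) == -1) {
        fprintf(stderr, "msgrcv(msqid=%d): errno %d (%s)\n",
                msqid, errno, strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling probe_queue() with whatever msgget() returned, including 0, shows whether a zero id really behaves differently from any other id.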

To be honest, I did not look at errno after msgget() returned 0 (It was in the wee early hours and I had bigger fish..). I will try that and get back with the results. This will also tell me if the original key returns 0.

The interesting thing is we loaded a *way* earlier version of the code to test a completely different thing. I just ran ipcs and got

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x000000f0 0          gfi        666        0            0

where you can see msgqid == 0 and the system seems to be performing happily (at least this part of it). This would seem to reinforce the observation / theory that once the msgqid becomes 0, it stays 0. No complaints from msgsnd() or msgrcv().

So - this may be a brainfart on my part. After all, I did have another bug throwing a SIGSEGV at the same time. And it was very late/early. If it is a brainfart, my apologies.

Again, this is an unpatched Ubuntu 8.

@achenle has an interesting suggestion. I had not thought of setting errno to a value before making the call. The question then becomes: why not set it to (-1)? errno values are positive, at least on Linux. (I seem to remember them being negative numbers on BSD 4.1, but that was a *long* time ago...)

All of the code in the links this thread has pointed to writes an error message if msgget() returns a value <= 0, even though an error is indicated only if the return value is strictly less than 0.

Nothing was shown indicating that there was any error from msgrcv() or msgsnd() in cases where msgget() returned 0.

Unless the man page explicitly states otherwise, the value of errno after a call to a function that completes successfully is meaningless. On function calls where the value returned to indicate an error can also be returned in a successful completion case, you'll usually see something like:

There is no statement like this on the msgget() page because the value returned when msgget() fails ( -1 ) is never returned if msgget() succeeds.
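
A familiar case of the other kind, not taken from this thread, is strtol(): LONG_MAX and LONG_MIN are both legal results and the values returned on overflow, so the caller has to clear errno first and inspect it afterwards. The wrapper name parse_long below is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long.
 * Returns 0 on success (result stored in *out), -1 on failure. */
int parse_long(const char *text, long *out)
{
    char *end;

    errno = 0;                           /* clear first: the return value alone is ambiguous */
    long value = strtol(text, &end, 10);
    if (errno == ERANGE)                 /* overflow or underflow */
        return -1;
    if (end == text || *end != '\0')     /* no digits, or trailing junk */
        return -1;
    *out = value;
    return 0;
}
```

msgget() needs none of this, because -1 can never be a successful return; checking errno only after a -1 is enough.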

Less forcefully than Don stated it, but I think his idea of a success return code is the entire problem here, plus using msgid as a message and NOT as a message queue id. Which it is. They are not the same thing. Period.

I do not understand why you make this statement. I have consistently used msgget() to get the ID of the queue, never as a message itself. The msgq_id is a parameter to msgsnd() and msgrcv() that specifies the queue for the message.

Now, my confusion was over a msgq_id == 0 as a valid number. It is, in both practice and theory.

@Don is absolutely correct stating

Again, my confusion came from getting what seemed to be a successful return from msgget() (according to the man page) and yet not having success. I actually had two bugs that were subtle and misleading, but that is the life of a programmer. (Especially when dealing with *really* bad code.) My searches for information found scant help. The collected wisdom of the community has helped me many times (Thanks, Google!) in finding obscure information on forums. (If one programs Atmel chips, I recommend avrfreaks.net.)

AND - my purpose in this post was to try to help provide some more information to the next person who had this issue at 3am. The helpful replies have added to that goal, and I thank the posters for their time and attention.

I am not trying to flame, but *please* point out *one* place I used msgq_id as a message. You keep saying the same thing but will not give one example of where I did so.

In the meantime, I am going to see my family for a week, so I will check for a reply when I get back.