Machine-dependent problems when using sockets

I am trying to write code for a client-server scenario using AF_INET sockets.

As is usually the case, everything works fine on my machine, but on another machine it fails at runtime with the following error:

send: Socket operation on non-socket

The error is thrown by the server when it tries to send the next plan of action to the client. Note that this is neither the first send nor the first receive between the client and the server. Many previous exchanges were successful, and the failure occurs consistently at this point.

The snippet below is inside a while(1) select loop that accepts incoming connections from clients. no_of_clients is a fixed parameter giving the maximum client count.
argv[3] is the hop count, i.e. the number of hops a packet is to bounce through the network.

                        clientfds[client_count] = new_client;

                        if(client_count == no_of_clients)
                        {
                            //FINAL--
                            printf("Sending command\n");
                            if(atoi(argv[3])==0)
                            {
                                printf("Trace of packet:\n");
                                for(i=1; i<=no_of_clients; i++)
                                {
                                    len = send(clientfds[i], "Shutdown", 8, 0);
                                    if (len != 8)
                                    {
                                        fprintf(stderr,"Send sent partial string!\n");
                                        perror("send");
                                        exit(1);
                                    }
                                }

                                exit(0);
                            }
                            else
                            {
                                for(i=1; i<=no_of_clients; i++)
                                {
                                    len = send(clientfds[i], "Charge!!", 8, 0);
                                    if (len != 8)
                                    {
                                        fprintf(stderr,"Yes I Send sent partial string!\n");
                                        perror("send");
                                        exit(1);
                                    }
                                }
                            }
                            //FINAL--
                            printf("Command sent..waiting for listener ready setup\n");
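
A note on the check in the snippet above: when send() fails it returns -1, so the test len != 8 also fires on a plain error, and the "partial string" message is printed even though nothing was sent. That matches the output, where perror() then reports the real cause. Below is a minimal sketch of a helper that separates the two cases. The name send_all is hypothetical and not part of the original code; it assumes a blocking stream socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the original post): keep calling send()
 * until all len bytes have been written. Returns 0 on success, or -1 on
 * error with errno left as set by send(). */
int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(buf != NULL);
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real failure: errno says why */
        }
        sent += (size_t)n;  /* short write: loop sends the rest */
    }
    return 0;
}
```

With something like this, the caller would only print an error (via perror) when send_all() returns -1, and a short write is handled by the loop rather than treated as fatal.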
                         

Relevant output portion:
...
Sending command
Yes I Send sent partial string!
send: Socket operation on non-socket
...
The only root cause I found for this error on Google (apart from semantic errors) was exceeding the MTU size. The send() here is definitely not exceeding any MTU, as the message is very small.

The code runs fine on a 64-bit Ubuntu 11.04 install, but fails on a 64-bit RHEL 5 machine.

Any ideas guys?

---------- Post updated at 01:43 PM ---------- Previous update was at 04:24 AM ----------

Still haven't been able to resolve the issue.

It says 'socket operation on non-socket'. Somehow a non-socket got put in there...

You should print the FD of the socket you're sending to, and also call system("ls -l /proc/self/fd"); to see which descriptors are open.

I suspect a buffer got overrun somewhere and the FD list got corrupted with something unexpected. A buffer overrun would be very compiler dependent.
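
As a rough sketch of the kind of check suggested above: the error text "Socket operation on non-socket" corresponds to errno ENOTSOCK, and the same condition can be probed directly before calling send(). The helper names fd_is_socket and debug_fd are made up for illustration and do not come from the original program.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical debugging helper: returns 1 if fd refers to a socket,
 * 0 if it is open but not a socket (ENOTSOCK), and -1 for any other
 * failure, such as a descriptor that is not open at all. */
int fd_is_socket(int fd)
{
    int type;
    socklen_t len = sizeof type;

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == 0)
        return 1;
    return (errno == ENOTSOCK) ? 0 : -1;
}

/* Print the descriptor that is about to be used, e.g. just before send(). */
void debug_fd(const char *label, int fd)
{
    assert(label != NULL);
    fprintf(stderr, "%s: fd=%d is_socket=%d\n", label, fd, fd_is_socket(fd));
}
```

Calling debug_fd("before send", clientfds[i]) inside the send loop would show right away whether the array still holds the descriptor returned by accept() or something else, such as 0.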

Corona: you were spot on. The FD array was getting corrupted.
The array was dynamically allocated.

However, I tried statically allocating the array and it worked like a charm. The interesting bit is that I reverted to dynamic allocation so that I could show you the corruption, but now the array doesn't get corrupted. The corruption has magically disappeared. In such a scenario, would you recommend sticking with dynamic allocation?

Earlier, FD 4 was corrupted and the command was being sent to FD 0 instead.

It means that you still have a pointer or buffer overrunning somewhere. It is just not hitting the FD array at present.

Agreed. That's the trouble with debugging buffer overruns: occasionally they are harmless, but the instant you change anything, bang.

My initial corruption was due to not accounting for that extra index (the loops run from 1 to no_of_clients). Anyway, even after I accounted for it I was still getting the corruption, so I concluded that the missing index was not the root cause.

Static allocation has worked around the problem for now.
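
For reference, here is a sketch of what accounting for the extra index could look like with dynamic allocation. It assumes the array is indexed 1..no_of_clients as in the loops shown earlier; the original allocation code was not posted, and the name alloc_clientfds is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocation helper: returns an array whose valid indices
 * are 0..no_of_clients, so a loop running i = 1..no_of_clients stays in
 * bounds. Every slot starts at -1, which is never a valid descriptor,
 * so a slot that was never filled in is easy to spot. */
int *alloc_clientfds(int no_of_clients)
{
    int i;
    int *fds;

    assert(no_of_clients >= 0);
    fds = malloc(((size_t)no_of_clients + 1) * sizeof *fds);
    if (fds == NULL)
        return NULL;
    for (i = 0; i <= no_of_clients; i++)
        fds[i] = -1;
    return fds;
}
```

This only rules out one source of overrun. If something else is writing past its own buffer, as suggested above, the array can still be clobbered regardless of how it is allocated.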

Two other problems I am facing now:

1) What I am having trouble understanding is why getaddrinfo() is able to resolve a host if I give it the entire hostname, whereas leaving out the domain name gives me an error stating that the service name or host was not found.

2) Some combinations of hosts (on the same domain) are unable to interconnect.
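
For the getaddrinfo() question, it may help to log exactly which name is passed in and the resolver's own error text for each attempt. A minimal sketch follows; resolve_tcp is a hypothetical name, and the hints assume an IPv4 TCP connection, which seems to match the AF_INET setup described at the top.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: resolve host/port for a TCP connection and print
 * the resolver's error text on failure. Returns 0 on success or the
 * nonzero getaddrinfo() error code. On success the caller must release
 * *res with freeaddrinfo(). */
int resolve_tcp(const char *host, const char *port, struct addrinfo **res)
{
    struct addrinfo hints;
    int rc;

    assert(res != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* assumption: IPv4 only */
    hints.ai_socktype = SOCK_STREAM;  /* assumption: TCP */

    rc = getaddrinfo(host, port, &hints, res);
    if (rc != 0)
        fprintf(stderr, "getaddrinfo(%s, %s): %s\n",
                host ? host : "(null)", port ? port : "(null)",
                gai_strerror(rc));
    return rc;
}
```

Running this with both the short and the fully qualified name on each machine would show whether the difference lies in the resolver configuration of that host rather than in the program.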

The scenario is as follows.
Initially the master waits for incoming connections. When all clients have connected, it passes each client its right neighbour's details and instructs client 1 to go ahead and connect to its right neighbour, and so on down the line.
However, what I am seeing is that on some machines, where the master is on one machine and the clients are on another, client 1 receives a connection refused error (even though I traced the code of the other client and ensured it was listening).
So the master is on Host 1.
Client 1 and Client 2 are on Host 2.
Client 2 listens on port 5555 (assume).
Client 1 tries to connect to its neighbour on Host 2, port 5555.
It gets connect:: Connection refused.
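
A refused connection generally means nothing was accepting on that exact address and port when the connect() arrived. Things worth ruling out, none of which can be confirmed from the post alone, are a port number passed along in the wrong byte order, a listener bound only to the loopback address, and a connect() issued before the neighbour has called listen(). The sketch below shows one way a client could open its listener and obtain the port to report. The name listen_any_port is hypothetical, and the original client code was not posted.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: open a TCP listener on a kernel-chosen port.
 * Returns the listening descriptor, or -1 on error. *port_out receives
 * the port in host byte order, ready to print or send as text. */
int listen_any_port(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd;

    assert(port_out != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* all interfaces, not just loopback */
    addr.sin_port = htons(0);                 /* 0 = let the kernel pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);  /* convert before reporting it */
    return fd;
}
```

The point of the ordering is that listen() has already succeeded by the time the port is known, so the port is only sent to the master once connections to it can be queued.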

Output of an actual run

Server Output
-------------
./master 65450 2 10
packet Master on bn19****
clients = 2
Hops = 10
client 1 is on host nom2684*****
client 2 is on host nom2684*****
<Hangs at this point>

Client 1 Output
---------------

./client bn19****
Connected as client 1
Trying to connect to host nom2684**** on port 34437
connect:: Connection refused
<exits>

Client 2 Output
---------------

Connected as client 2
I am client 2 and I am listening on port 34437
I sent my port 34437 to master
<Hangs>
