TCP connection check

Shang · June 18, 2011, 7:44am

Hi.
I am writing a client-server application using TCP sockets.
I need some very basic functionality, namely: how do I check whether the other "participant" of the connection is still present?
I want to handle situations where the client is gone, the server breaks down, etc.

Loic_Domaigne · June 18, 2011, 9:17am

You must first be prepared for the server to close the connection unexpectedly (e.g. due to a crash), that is:

read() returns -1 and errno is set (e.g. ECONNRESET).
write() causes SIGPIPE to be sent to your process. The classical way to handle this is to ignore SIGPIPE, in which case write() fails and sets errno to EPIPE.
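
A minimal sketch of the "ignore SIGPIPE, then check for EPIPE" approach described above. It uses a local socketpair() instead of a real TCP connection so that it runs without a server; the helper name errno_after_write_to_closed_peer is made up for illustration.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte to a socket whose peer has already
 * closed, with SIGPIPE ignored. Returns the errno left by write(), 0 if the
 * write unexpectedly succeeded, or -1 if the setup failed. */
int errno_after_write_to_closed_peer(void) {
    int sv[2];
    ssize_t n;
    int saved;

    /* Ignore SIGPIPE so a failed write() reports EPIPE instead of
     * terminating the process. */
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    close(sv[1]);                     /* simulate the peer going away */

    n = write(sv[0], "x", 1);
    saved = (n < 0) ? errno : 0;      /* capture errno before close() */

    close(sv[0]);
    return saved;
}
```

In a real client the signal(SIGPIPE, SIG_IGN) call would normally be made once at startup, and every write() or send() result would then be checked for -1 with errno == EPIPE.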

This is, however, usually not enough to cover all scenarios (e.g. when the connection gets physically broken, which is well simulated by unplugging the network cable). Depending on your context, you may need a heartbeat mechanism, that is, a way to exchange messages between client and server to verify that the connection is still alive.

If applicable, you may want to check the keep-alive feature offered by TCP (there are, however, some pitfalls), or roll your own heartbeat mechanism in the protocol.
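
For the TCP keep-alive option mentioned above, a sketch of how it might be enabled on a socket. SO_KEEPALIVE is standard; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are assumed to be available, which is the case on Linux but not on every platform. The function name and its parameters are invented for this example.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turns on TCP keep-alive for fd.
 *   idle_s     seconds of silence before the first probe is sent
 *   interval_s seconds between probes
 *   probes     unanswered probes before the connection is declared dead
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes) {
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    /* The three options below are Linux-style per-socket tunables. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s,
                   sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

Once the probes go unanswered, a blocked recv() on the socket fails with an error (typically ETIMEDOUT) instead of waiting forever. One of the pitfalls is that the system-wide default idle time is usually very long (often two hours), which is why the sketch sets the timers explicitly.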

HTH, Loïc

Shang · June 22, 2011, 5:41pm

I have already written a function whose purpose is to send some data (a request) and then receive some data (a response) from the server.

int send_request(int socket, request_s *request) {
    char *buffer;

    if ((buffer = (char *) malloc(MSG_SIZE)) == NULL) {
        ERR("malloc");
        return -1;
    }

    request_to_string(request, buffer);

    if (TEMP_FAILURE_RETRY(send(socket, buffer, MSG_SIZE, 0)) < 0) {
        free(buffer);
        ERR("send");
    }

    free(buffer);
    return 0;
}

/***
 * Communicates with server. First sends the request, then gets the response.
 * Returns:
 *             0    success
 *             -1    provided request is NULL
 *             -2    sending request failed
 *             -3    receiving response failed
 */
int communicate(int socket, request_s *request, response_s *response) {
    fd_set rfds;
    sigset_t mask, old_mask;
    char buffer[MSG_SIZE];

    if (request == NULL)
        return -1;
    else {
        if (send_request(socket, request) < 0) {
            printf("Sending request failed.\n");
            return -2;
        }
    }

    FD_ZERO(&rfds);
    FD_SET(socket, &rfds);
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, &old_mask);

    if (pselect(socket + 1, &rfds, NULL, NULL, NULL, &old_mask) > 0) {
        if (FD_ISSET(socket, &rfds)) {
            if (TEMP_FAILURE_RETRY(recv(socket, (void *) buffer, MSG_SIZE, 0))
                    < 0)
                return -3;
            string_to_response(buffer, response);
            return 0;
        }
    }
    if (errno == EINTR) {
        request->type = MSG_EXIT_REQ;
        send_request(socket, request);
        TEMP_FAILURE_RETRY(close(socket));
        exit(EXIT_FAILURE);
    }
    return -4;
}

There is a very strange problem with it.
Even if the server is down, it behaves as if nothing happened. send and recv don't return -1, and response does not change.
The function communicate always returns 0.
Could you help me debug it? I have no bloody idea what to do.

Corona688 · June 22, 2011, 6:02pm

I'm suspicious of your TEMP_FAILURE_RETRY macro. What happens if you leave it out?

You should also check for <=0, not just <0, since 0 means the connection has closed too, albeit in an orderly way.
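
To illustrate the <=0 advice, here is a small sketch that separates the three possible outcomes of recv(). The enum and both function names are invented for this example, and a socketpair() stands in for the real connection.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of a recv() return value. */
enum peer_state { PEER_DATA, PEER_CLOSED, PEER_ERROR };

enum peer_state classify_recv(ssize_t n) {
    if (n > 0)
        return PEER_DATA;    /* n bytes were received */
    if (n == 0)
        return PEER_CLOSED;  /* orderly shutdown by the peer */
    return PEER_ERROR;       /* -1: inspect errno (ECONNRESET, EINTR, ...) */
}

/* Demonstrates the n == 0 case: the peer closes, then we call recv(). */
enum peer_state recv_after_peer_close(void) {
    int sv[2];
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return PEER_ERROR;

    close(sv[1]);                          /* peer closes cleanly */
    n = recv(sv[0], buf, sizeof buf, 0);   /* returns 0, not -1 */
    close(sv[0]);

    return classify_recv(n);
}
```

A check that only tests for < 0 would treat the PEER_CLOSED case as success and go on to parse a buffer that recv() never filled.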

Shang · June 22, 2011, 6:16pm

I have tried <=, and it does not help.
I have also tried removing TEMP_FAILURE_RETRY; nothing changes.
Even stranger is the fact that when I execute this the first time on the client side (during a server breakdown, of course), it behaves as if nothing happened. When I execute it a second time (I have a menu which handles different operations, and every operation uses communicate to exchange messages with the server), I noticed that while executing:

    if (TEMP_FAILURE_RETRY(send(socket, buffer, MSG_SIZE, 0)) <= 0) {
        free(buffer);
        ERR("send");
    }

it kills my program! Now I am totally confused.

Corona688 · June 22, 2011, 6:23pm

I'm still suspicious of that macro. Macros complicated enough to be loops can't return values AFAIK. Can we see its contents?

You could also put fprintf(stderr, "debugging statements"); into your program. Print out return values and the like.

Does it tell you what signal killed it?

Shang · June 22, 2011, 6:41pm

ERR() already does it.

#define ERR(source) (fprintf(stderr,"%s:%d\n",__FILE__,__LINE__),\
                     perror(source),kill(0,SIGKILL),\
                          exit(EXIT_FAILURE))

It is hard to print any value, because this if (... <= 0) is never satisfied, so the instructions in the braces are never executed. I have only checked how many bytes were sent. First execution: 150; second execution: the program quits.

How can I check what killed the program?

neutronscott · June 22, 2011, 6:53pm

Well what does it print?

Shang · June 23, 2011, 7:22am

I added some print statements to my code in order to see where the function execution breaks. Now it looks like this:

/***
 * Sends request through socket
 * Returns:
 *             0    success
 *             -1    buffer memory allocation failed
 */
int send_request(int socket, request_s *request) {
    char *buffer;

    if ((buffer = (char *) malloc(MSG_SIZE)) == NULL) {
        ERR("malloc");
        return -1;
    }
    int i;
    request_to_string(request, buffer);
    printf("send_request: before send\n");
    if ((i = send(socket, buffer, MSG_SIZE, 0)) < 0) {
        free(buffer);
        ERR("send");
    }
    printf("send_request: bytes sent: %d\n",i);
    printf("send_request: after send\n");
    free(buffer);
    return 0;
}
int communicate(int socket, request_s *request, response_s *response) {
    fd_set rfds;
    sigset_t mask, old_mask;
    char buffer[MSG_SIZE];
    memset(&buffer, 65, MSG_SIZE * sizeof(char));
    printf("communicate: beginning\n");
    if (request == NULL)
        return -1;
    else {
        if (send_request(socket, request) < 0) {
            printf("Sending request failed.\n");
            return -2;
        }
    }
    printf("communicate: after sending request\n");
    FD_ZERO(&rfds);
    FD_SET(socket, &rfds);
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, &old_mask);

    if (pselect(socket + 1, &rfds, NULL, NULL, NULL, &old_mask) > 0) {
        if (FD_ISSET(socket, &rfds)) {
            if (TEMP_FAILURE_RETRY(recv(socket, (void *) buffer, MSG_SIZE, 0))
                    < 0)
                return -3;
            string_to_response(buffer, response);
            printf("communicate: buffer: %s\n", buffer);
            return 0;
        }
    }
    if (errno == EINTR) {
        request->type = MSG_EXIT_REQ;
        send_request(socket, request);
        TEMP_FAILURE_RETRY(close(socket));
        exit(EXIT_FAILURE);
    }
    return -4;
}

Here is also console output from client, when server is broken:

:~$ 5
send_scores_request: beginning
communicate: beginning
send_request: before send
send_request: bytes sent: 150
send_request: after send
communicate: after sending request
communicate: buffer: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
send_scores_request: after communicate
Invalid response from the server (not MSG_SCORES_RSP)
1 - Show board
2 - Show my tiles
3 - Check whose turn it is
4 - Make a move
5 - Show scores

6 - Exit

:~$ 5
send_scores_request: beginning
communicate: beginning
send_request: before send

Remarks:

Don't worry about those AAAAA... in the buffer. I call memset(&buffer, 65...) at the beginning in order to better see whether anything changes in the buffer; apparently nothing does.
As you can see, menu position 5 has been invoked twice. As I said before, the first time send does not return any error, and recv neither returns an error nor changes the buffer. The second time, the program quits.

It is a very strange situation. For now I don't even know where to look for the mistake.

neutronscott · June 23, 2011, 10:04am

How about just printing the send() error rather than killing yourself in ERR() ?

Corona688 · June 23, 2011, 11:33am

You haven't put any print statements in for recv(), so we don't know how many bytes it received, if any.

Hmm... Perhaps you're off by one communication? As in, there's data sitting in the buffer that wasn't read last time? There's no guarantee you're going to get it all in one recv(), after all, you have to know how much is coming (or use a delimiter) to know when to stop.

neutronscott · June 23, 2011, 3:35pm

What I was getting at is, we haven't seen the socket setup but I'm assuming they're non-blocking TCP, so the first send() might put out 150 bytes, then you come around and get an error but the ERR() macro has a kill(0,SIGKILL) so you kill yourself before you print it to the terminal.

Shang · June 24, 2011, 9:40am

Getting rid of ERR does not help, because this if ((send...) < 0) condition is never fulfilled. It looks like send() kills the program. It sounds crazy.
I have put my whole code on github: https://github.com/chelmsford/Scrabble
Maybe you will find something that I can't see.

Corona688 · June 24, 2011, 10:15am

Try catching the SIGPIPE signal. You may be getting broken-pipe when you write to a broken socket.

neutronscott · June 24, 2011, 11:08am

Oh yeah, or set MSG_NOSIGNAL as last parameter to send()
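
A sketch of the MSG_NOSIGNAL variant: the process keeps the default SIGPIPE disposition, yet send() to a closed peer returns -1 with EPIPE instead of killing the program. MSG_NOSIGNAL is assumed to be available (it is on Linux; some other systems lack it), and the helper name is invented. A socketpair() again replaces the real TCP connection.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends one byte with MSG_NOSIGNAL to a socket whose
 * peer has already closed. Returns the errno reported by send(), 0 if the
 * send unexpectedly succeeded, or -1 if the setup failed. */
int send_nosignal_errno(void) {
    int sv[2];
    ssize_t n;
    int saved;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    close(sv[1]);                            /* peer is gone */

    /* MSG_NOSIGNAL suppresses SIGPIPE for this one call only. */
    n = send(sv[0], "x", 1, MSG_NOSIGNAL);
    saved = (n < 0) ? errno : 0;

    close(sv[0]);
    return saved;
}
```

Compared with ignoring SIGPIPE globally, the flag only affects the calls that pass it, so other code in the process that relies on SIGPIPE is left alone.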

Shang · June 24, 2011, 11:18am

Yes! That's it. The problem was that a SIGPIPE was raised during the second communicate() invocation. It was killing the program (the default action for SIGPIPE is to terminate the process).
But there is still one strange thing. On the client side, during a server breakdown, the first send_request does not return any error, only the second one does (I have checked: it gets SIGPIPE, so I can handle it properly). Why? What is it about the first execution that keeps it from failing? Neither send nor recv gets SIGPIPE.

Corona688 · June 24, 2011, 11:30am

TCP buffers, you know. You might not get all the data on the first read(), so there could be data left over from last time if you weren't careful to get it all. And it also buffers on the way out. It's not psychic either, so it may not know the instant the connection dies.

neutronscott · June 24, 2011, 11:31am

Because send() just copies the data to a local buffer to be sent out later. It is not until the other end fails to receive it that the OS can detect a broken pipe.
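
The effect can be reproduced on the loopback interface. The sketch below is a hedged demonstration, not taken from the project: the function name is made up, the 100 ms pause is an arbitrary allowance for the peer's reset to arrive, and the exact behaviour is that of Linux.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: connects to a local listener, closes the server side,
 * then calls send() twice on the client. result[0] and result[1] receive the
 * two return values. Returns 0 if the setup worked, -1 otherwise. */
int two_sends_after_peer_close(ssize_t result[2]) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd, cfd = -1, sfd = -1, rc = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto done;

    close(sfd);                              /* the "server" goes away */
    sfd = -1;

    /* First send: the data is only copied into the local send buffer. */
    result[0] = send(cfd, "a", 1, MSG_NOSIGNAL);
    poll(NULL, 0, 100);                      /* wait for the peer's RST */
    /* Second send: the kernel now knows the connection is broken. */
    result[1] = send(cfd, "b", 1, MSG_NOSIGNAL);
    rc = 0;

done:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return rc;
}
```

On Linux the first call is expected to return 1 and the second -1 with errno set to EPIPE, which matches the "works once, dies the second time" behaviour seen in the client. Without MSG_NOSIGNAL (or an ignored SIGPIPE) the second call would terminate the process instead.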

Shang · June 24, 2011, 11:39am

I understand. So how can I detect a server breakdown when sending the first request?

Corona688 · June 24, 2011, 12:10pm

You should make sure you're reading all the data in the first place. Just because you do one successful read() doesn't mean you got everything the server was sending. If you have no way of knowing how much the server was sending, you'll have to add that data to your protocol.
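
A sketch of "reading all the data", assuming the protocol uses fixed-size messages (as the MSG_SIZE buffers in the posted code suggest). The name recv_all and its return convention are invented for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keeps calling recv() until exactly len bytes have
 * arrived. Returns 0 on success, 1 if the peer closed the connection before
 * len bytes were received, and -1 on error (errno is set by recv()). */
int recv_all(int fd, void *buf, size_t len) {
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n > 0)
            got += (size_t)n;     /* partial read: keep going */
        else if (n == 0)
            return 1;             /* orderly close in mid-message */
        else if (errno != EINTR)
            return -1;            /* real error; EINTR is simply retried */
    }
    return 0;
}
```

With such a helper, the single recv(socket, buffer, MSG_SIZE, 0) in communicate() could become recv_all(socket, buffer, MSG_SIZE), and a return value of 1 would be the signal that the server has gone away. For variable-length messages, a length prefix or a delimiter would have to be added to the protocol first.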

That way, next time you try and communicate with the server, there won't be old data lying in the buffer that it will mistake for a reply; it may succeed in sending data, not knowing the connection's broken yet, but won't get any reply, and time out instead.