TCP connection check

Hi.
I am writing a client-server application using TCP sockets.
I need some very basic functionality: how can I check whether the other "participant" in the connection is still present?
I want to handle situations where the client is gone, the server breaks down, etc.

You must first be prepared for the server to close the connection unexpectedly (e.g. due to a crash), in which case:

  • read() returns 0 if the connection was closed normally, or -1 with errno set (e.g. ECONNRESET) if it was reset.
  • write() causes SIGPIPE to be sent to your process. The classical way to handle this is to ignore SIGPIPE, in which case write() fails and sets errno to EPIPE.
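
A minimal sketch of the second bullet, assuming a POSIX system such as Linux: ignore SIGPIPE once at startup, then treat EPIPE or ECONNRESET from write() as "the peer is gone". The helper names ignore_sigpipe() and write_checked() are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGPIPE for the whole process: writing to a socket whose peer is
   gone then returns -1 with errno == EPIPE instead of killing the program. */
static int ignore_sigpipe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical wrapper: write() that reports a vanished peer on stderr.
   Returns what write() returned; errno is preserved for the caller. */
static ssize_t write_checked(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n < 0 && (errno == EPIPE || errno == ECONNRESET)) {
        int saved = errno;
        fprintf(stderr, "peer is gone: %s\n", strerror(saved));
        errno = saved;
    }
    return n;
}
```

Call ignore_sigpipe() once in main() before any socket I/O. The setting is process-wide; the MSG_NOSIGNAL flag of send() is the per-call alternative.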

This is, however, usually not enough to cover all scenarios (e.g. when the connection gets physically broken, well simulated by unplugging the network cable). Depending on your context, you may need a heartbeat mechanism, that is, a way for the client and server to exchange messages that verify the connection is still alive.
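
A heartbeat can be as simple as "send a ping, wait a bounded time for a pong". The sketch below assumes a hypothetical one-byte protocol (HB_PING/HB_PONG), a peer that answers every ping, and no other traffic in flight on the socket; a real protocol would interleave these messages with normal requests and replies. heartbeat() is a made-up name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical one-byte heartbeat messages. */
enum { HB_PING = 'P', HB_PONG = 'O' };

/* Sends a ping and waits up to timeout_ms for the pong.
   Returns 1 if the peer answered, 0 on timeout (peer presumed dead),
   -1 if the connection failed, was closed, or sent something unexpected. */
static int heartbeat(int fd, int timeout_ms)
{
    char c = HB_PING;
    if (send(fd, &c, 1, MSG_NOSIGNAL) != 1)   /* no SIGPIPE on a dead peer */
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r;
    do /* on EINTR the full timeout restarts; acceptable for a sketch */
        r = poll(&pfd, 1, timeout_ms);
    while (r < 0 && errno == EINTR);
    if (r < 0)
        return -1;
    if (r == 0)
        return 0;

    ssize_t n = recv(fd, &c, 1, 0);
    if (n <= 0)
        return -1;                            /* 0 = orderly close, <0 = error */
    return c == HB_PONG ? 1 : -1;
}
```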

If applicable, you may want to check the keepalive feature offered by TCP (there are, however, some pitfalls), or roll your own heartbeat mechanism in the protocol.
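
If you try keepalive, here is a minimal Linux-oriented sketch. enable_keepalive() is a hypothetical helper, and TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are not available everywhere, so they are guarded with #ifdef. One of the pitfalls: on Linux the default idle time before the first probe is two hours (net.ipv4.tcp_keepalive_time = 7200), so without tuning, keepalive alone notices a dead peer very late.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turns on TCP keepalive for fd. idle_s: seconds of silence before the
   first probe; intvl_s: seconds between probes; cnt: unanswered probes
   before the kernel declares the connection dead (later reads and writes
   then fail, typically with ETIMEDOUT). */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
#else
    (void)idle_s; (void)intvl_s; (void)cnt;  /* system defaults apply */
#endif
    return 0;
}
```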

HTH, Loïc

I have already written a function whose aim is to send some data (a request) to the server and then receive some data (a response) from it.

int send_request(int socket, request_s *request) {
    char *buffer;

    if ((buffer = (char *) malloc(MSG_SIZE)) == NULL) {
        ERR("malloc");
        return -1;
    }

    request_to_string(request, buffer);

    if (TEMP_FAILURE_RETRY(send(socket, buffer, MSG_SIZE, 0)) < 0) {
        free(buffer);
        ERR("send");
    }

    free(buffer);
    return 0;
}

/***
 * Communicates with server. First sends the request, then gets the response.
 * Returns:
 *             0    success
 *             -1    provided request is NULL
 *             -2    sending request failed
 *             -3    receiving response failed
 *             -4    waiting for the response failed
 */
int communicate(int socket, request_s *request, response_s *response) {
    fd_set rfds;
    sigset_t mask, old_mask;
    char buffer[MSG_SIZE];

    if (request == NULL)
        return -1;
    else {
        if (send_request(socket, request) < 0) {
            printf("Sending request failed.\n");
            return -2;
        }
    }

    FD_ZERO(&rfds);
    FD_SET(socket, &rfds);
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, &old_mask);

    if (pselect(socket + 1, &rfds, NULL, NULL, NULL, &old_mask) > 0) {
        if (FD_ISSET(socket, &rfds)) {
            if (TEMP_FAILURE_RETRY(recv(socket, (void *) buffer, MSG_SIZE, 0))
                    < 0)
                return -3;
            string_to_response(buffer, response);
            return 0;
        }
    }
    if (errno == EINTR) {
        request->type = MSG_EXIT_REQ;
        send_request(socket, request);
        TEMP_FAILURE_RETRY(close(socket));
        exit(EXIT_FAILURE);
    }
    return -4;
}

There is a very strange problem with it.
Even if the server is down, it behaves as if nothing happened: send and recv don't return -1, and the response does not change.
The function communicate always returns 0.
Could you help me debug it? I have no bloody idea what to do.

I'm suspicious of your TEMP_FAILURE_RETRY macro. What happens if you leave it out?

You should also check for <=0, not just <0, since 0 means the connection has closed too, albeit in an orderly way.
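
To see the difference between "no data yet" and "closed", here is a sketch of a non-destructive liveness check, assuming Linux or a BSD (MSG_DONTWAIT is not part of POSIX). peer_closed() is a hypothetical helper: it peeks at one byte without consuming it, so a recv() result of 0 means an orderly close. It can only report what the kernel already knows; an unplugged cable still needs a heartbeat or keepalive.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if the peer has closed (recv() == 0) or the connection failed,
   0 if it still looks open. MSG_PEEK leaves pending data in the socket
   buffer; MSG_DONTWAIT keeps the call from blocking. */
static int peer_closed(int fd)
{
    char c;
    ssize_t n;
    do
        n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    while (n < 0 && errno == EINTR);
    if (n > 0)
        return 0;                      /* data waiting: peer alive */
    if (n == 0)
        return 1;                      /* orderly shutdown (FIN received) */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                      /* nothing to read, no error */
    return 1;                          /* ECONNRESET, ETIMEDOUT, ... */
}
```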

I have tried <=; it does not help.
I have also tried removing TEMP_FAILURE_RETRY; nothing changes.
Stranger still: when I execute this the first time on the client side (during a server breakdown, of course), it behaves as if nothing happened. When I execute it a second time (I have a menu that handles different operations, and every operation uses communicate to exchange messages with the server), I noticed that while executing:

    if (TEMP_FAILURE_RETRY(send(socket, buffer, MSG_SIZE, 0)) <= 0) {
        free(buffer);
        ERR("send");
    }

it kills my program! Now I am totally confused.

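I'm still suspicious of that macro. AFAIK, a macro complicated enough to contain a loop can't yield a value in standard C (glibc's TEMP_FAILURE_RETRY relies on a GCC statement-expression extension to do it). Can we see its contents?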
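
For reference: if this is glibc's TEMP_FAILURE_RETRY (from <unistd.h> with _GNU_SOURCE), it is built on a GCC statement expression, which does yield a value. A minimal equivalent, renamed RETRY_EINTR here to avoid clashing, with a hypothetical fake_syscall() that fails with EINTR twice before succeeding:

```c
#include <errno.h>

/* TEMP_FAILURE_RETRY-style macro. The ({ ... }) statement expression is a
   GCC/Clang extension: the value of its last statement becomes the value
   of the whole macro, so the result can be compared with < 0. */
#define RETRY_EINTR(expr)                                  \
    (__extension__({                                       \
        long retry_result_;                                \
        do                                                 \
            retry_result_ = (long)(expr);                  \
        while (retry_result_ == -1L && errno == EINTR);    \
        retry_result_;                                     \
    }))

/* Hypothetical stand-in for a system call: fails with EINTR twice,
   then succeeds and returns 150. */
static int calls;
static long fake_syscall(void)
{
    if (++calls < 3) {
        errno = EINTR;
        return -1;
    }
    return 150;
}
```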

You could also put fprintf(stderr, "debugging statements"); into your program. Print out return values and the like.

Does it tell you what signal killed it?

ERR() already does it.

#define ERR(source) (fprintf(stderr,"%s:%d\n",__FILE__,__LINE__),\
                     perror(source),kill(0,SIGKILL),\
                          exit(EXIT_FAILURE))

It is hard to print any value, because this if (... <= 0) condition is never satisfied, so the statements in the braces are never executed. I have only checked how many bytes were sent. First execution: 150; second execution: the program quits.

How can I check what killed the program?

Well what does it print?
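
Two ways to find out, sketched under the assumption of a Linux shell and glibc. From bash, running echo $? right after the program dies prints 128 plus the signal number (141 for SIGPIPE, which is signal 13 on Linux). From C, a parent process can fork() the code, waitpid() for it and decode the status with WIFSIGNALED()/WTERMSIG(). run_and_report() and die_of_sigpipe() are hypothetical names; the child simply raise()s SIGPIPE to mimic the crash.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs child_fn in a forked child and reports how the child ended.
   Returns the number of the signal that killed it, 0 if it exited
   normally, or -1 if fork() or waitpid() failed. */
static int run_and_report(void (*child_fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        child_fn();
        _exit(EXIT_SUCCESS);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        fprintf(stderr, "killed by signal %d (%s)\n", sig, strsignal(sig));
        return sig;
    }
    fprintf(stderr, "exited with status %d\n", WEXITSTATUS(status));
    return 0;
}

/* Stand-in for the crashing client: dies of SIGPIPE at its default action. */
static void die_of_sigpipe(void)
{
    signal(SIGPIPE, SIG_DFL);
    raise(SIGPIPE);
}
```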

I added some debug printing to my code to see where the function's execution breaks. Now it looks like this:

/***
 * Sends request through socket
 * Returns:
 *             0    success
 *             -1    buffer memory allocation failed
 */
int send_request(int socket, request_s *request) {
    char *buffer;

    if ((buffer = (char *) malloc(MSG_SIZE)) == NULL) {
        ERR("malloc");
        return -1;
    }
    int i;
    request_to_string(request, buffer);
    printf("send_request: before send\n");
    if ((i = send(socket, buffer, MSG_SIZE, 0)) < 0) {
        free(buffer);
        ERR("send");
    }
    printf("send_request: bytes sent: %d\n",i);
    printf("send_request: after send\n");
    free(buffer);
    return 0;
}
int communicate(int socket, request_s *request, response_s *response) {
    fd_set rfds;
    sigset_t mask, old_mask;
    char buffer[MSG_SIZE];
    memset(&buffer, 65, MSG_SIZE * sizeof(char));
    printf("communicate: beginning\n");
    if (request == NULL)
        return -1;
    else {
        if (send_request(socket, request) < 0) {
            printf("Sending request failed.\n");
            return -2;
        }
    }
    printf("communicate: after sending request\n");
    FD_ZERO(&rfds);
    FD_SET(socket, &rfds);
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, &old_mask);

    if (pselect(socket + 1, &rfds, NULL, NULL, NULL, &old_mask) > 0) {
        if (FD_ISSET(socket, &rfds)) {
            if (TEMP_FAILURE_RETRY(recv(socket, (void *) buffer, MSG_SIZE, 0))
                    < 0)
                return -3;
            string_to_response(buffer, response);
            printf("communicate: buffer: %s\n", buffer);
            return 0;
        }
    }
    if (errno == EINTR) {
        request->type = MSG_EXIT_REQ;
        send_request(socket, request);
        TEMP_FAILURE_RETRY(close(socket));
        exit(EXIT_FAILURE);
    }
    return -4;
}

Here is also the console output from the client when the server is down:

:~$ 5
send_scores_request: beginning
communicate: beginning
send_request: before send
send_request: bytes sent: 150
send_request: after send
communicate: after sending request
communicate: buffer: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
send_scores_request: after communicate
Invalid response from the server (not MSG_SCORES_RSP)
1 - Show board
2 - Show my tiles
3 - Check whose turn it is
4 - Make a move
5 - Show scores

6 - Exit

:~$ 5
send_scores_request: beginning
communicate: beginning
send_request: before send

Remarks:

  1. Don't worry about the AAAAA... in the buffer. I added memset(&buffer, 65, ...) at the beginning to make it easier to see whether anything in the buffer changes; apparently nothing does.
  2. As you can see, menu option 5 was invoked twice. As I said before, the first time send does not return any error, and recv neither returns an error nor changes the buffer. The second time, the program quits.

It is a very strange situation. For now I don't even know where to look for the mistake :frowning:

How about just printing the send() error rather than killing yourself in ERR() ?
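
A minimal sketch of that suggestion: a hypothetical WARN() macro with the same file:line prefix as ERR(), which reports the error and returns control to the caller instead of sending SIGKILL to the whole process group (kill(0, ...) targets every process in the caller's process group). send_reported() is likewise a made-up helper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Non-fatal counterpart to ERR(): same file:line prefix, then the errno
   message, but no kill() and no exit(). */
#define WARN(source) \
    fprintf(stderr, "%s:%d: %s: %s\n", __FILE__, __LINE__, (source), strerror(errno))

/* One send() whose failure is reported and handed back to the caller.
   errno is preserved so the caller can still inspect it. */
static ssize_t send_reported(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    if (n < 0) {
        int saved = errno;
        WARN("send");
        errno = saved;
    }
    return n;
}
```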

You haven't put any print statements in for recv(), so we don't know how many bytes it received, if any.

Hmm... Perhaps you're off by one communication? As in, there's data sitting in the buffer that wasn't read last time? There's no guarantee you'll get it all in one recv(), after all. You have to know how much is coming (or use a delimiter) to know when to stop.
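
Because send_request() always transmits MSG_SIZE bytes, the protocol here appears to use fixed-size messages. Under that assumption, the receiver can loop until it has exactly that many bytes. recv_exact() is a hypothetical helper that does so and distinguishes a closed connection (0) from an error (-1).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Reads exactly len bytes from a blocking stream socket, looping because a
   single recv() may return only part of a message. Returns len on success,
   0 if the peer closed the connection before a full message arrived, or -1
   on error (errno set). */
static ssize_t recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;                  /* connection closed mid-message */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

In communicate(), the single recv() call could then become recv_exact(socket, buffer, MSG_SIZE), with string_to_response() called only when it returns MSG_SIZE.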

What I was getting at is: we haven't seen the socket setup, but I'm assuming they're non-blocking TCP sockets, so the first send() might put out 150 bytes, then you come around and get an error, but the ERR() macro has a kill(0,SIGKILL), so you kill yourself before you print it to the terminal.

Getting rid of ERR does not help, because this if ((send...) < 0) condition is never fulfilled. It looks like send() kills the program. It sounds crazy.
I have put my whole code on github: https://github.com/chelmsford/Scrabble
Maybe you will find something that I can't see.

Try catching the SIGPIPE signal. You may be getting broken-pipe when you write to a broken socket.

Oh yeah, or pass MSG_NOSIGNAL as the flags (last) argument to send() :wall:
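
Passing MSG_NOSIGNAL affects only that call, unlike ignoring SIGPIPE process-wide. A sketch, assuming Linux (MSG_NOSIGNAL is also in POSIX.1-2008, but not every system provides it). send_all() is a hypothetical helper that also loops over partial sends, which the single send() in send_request() does not handle.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sends all len bytes, retrying on partial sends and EINTR. With
   MSG_NOSIGNAL, a write to a broken connection fails with EPIPE instead
   of raising SIGPIPE. Returns 0 on success, -1 on error (errno set). */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```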

Yes! That's it. The problem was that a SIGPIPE was raised during the second communicate() invocation. It was killing the program (the default action for SIGPIPE is to terminate the process).
But there is still one strange thing. On the client side, during a server breakdown, the first send_request does not return any error; only the second one does (I have checked: it gets SIGPIPE, so I can handle it properly). Why? What is it about this first execution that it does not fail? Neither send nor recv gets SIGPIPE.

TCP buffers, you know. You might not get all the data on the first read(), so there could be data left over from last time if you weren't careful to read it all. It also buffers on the way out. And it's not psychic, so it may not know the instant the connection dies.

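Because send() just copies the data into a local kernel buffer to be sent out, it succeeds even though the server is gone. The OS only learns that the connection is broken when the server's machine answers that data with a reset (RST); the next send() then fails with EPIPE (and raises SIGPIPE unless it is ignored or MSG_NOSIGNAL is used). If the server's machine is down or unreachable, no reset comes back at all, and only a timeout will reveal the failure.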
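
A small loopback experiment reproduces the behaviour described in this thread, assuming Linux and a working 127.0.0.1 interface. demo_two_sends() is a hypothetical test function: the "server" closes its accepted socket, then the client calls send() twice with MSG_NOSIGNAL. The 100 ms pause is an assumption about how long the reset takes to arrive on loopback. The first send() should return 3 and the second should fail.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Stores both send() results and the errno of the second one.
   Returns 0 if the socket setup worked, -1 otherwise. */
static int demo_two_sends(ssize_t *first, ssize_t *second, int *second_errno)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                            /* kernel picks a free port */

    /* Error-path cleanup is omitted to keep the demo short. */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        return -1;

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;
    int sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        return -1;

    close(sfd);                                   /* the "server crashes" */

    /* 1st send: the bytes are only queued locally, so it succeeds; the
       server's kernel answers the arriving segment with a reset (RST). */
    *first = send(cfd, "req", 3, MSG_NOSIGNAL);

    struct timespec wait_for_rst = { 0, 100L * 1000 * 1000 };  /* 100 ms */
    nanosleep(&wait_for_rst, NULL);

    /* 2nd send: the RST has been processed, so the kernel reports the error. */
    *second = send(cfd, "req", 3, MSG_NOSIGNAL);
    *second_errno = errno;

    close(cfd);
    close(lfd);
    return 0;
}
```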

I understand. So how can I detect the server breakdown on the first request?

You should make sure you're reading all the data in the first place. Just because you do one successful read() doesn't mean you got everything the server was sending. If you have no way of knowing how much the server was sending, you'll have to add that data to your protocol.
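
If message sizes vary, one common way to add the length to the protocol is a fixed-size header. A sketch, assuming a hypothetical framing of a 4-byte big-endian length followed by the payload; send_frame(), recv_frame() and their helpers are made-up names, not part of the original code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Loops until len bytes are sent. Returns 0, or -1 on error. */
static int send_all_bytes(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Loops until len bytes are received. Returns 0, or -1 on error or if the
   peer closed the connection first. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends the 4-byte length header, then the payload. Returns 0 or -1. */
static int send_frame(int fd, const void *payload, uint32_t len)
{
    uint32_t be_len = htonl(len);
    if (send_all_bytes(fd, &be_len, sizeof be_len) < 0)
        return -1;
    return send_all_bytes(fd, payload, len);
}

/* Receives one frame into buf (capacity cap). Returns the payload length,
   or -1 on error, peer close, or a frame larger than cap. */
static long recv_frame(int fd, void *buf, size_t cap)
{
    uint32_t be_len;
    if (recv_all(fd, &be_len, sizeof be_len) < 0)
        return -1;
    uint32_t len = ntohl(be_len);
    if (len > cap)
        return -1;                     /* would overflow buf: protocol error */
    if (recv_all(fd, buf, len) < 0)
        return -1;
    return (long)len;
}
```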

That way, the next time you try to communicate with the server, there won't be old data lying in the buffer that your client mistakes for a reply; it may succeed in sending data, not knowing the connection is broken yet, but it won't get any reply and will time out instead.
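
For the "time out instead" part: communicate() passes a NULL timeout to pselect(), so it can wait forever. One alternative, sketched under POSIX assumptions, is a receive timeout via SO_RCVTIMEO. set_recv_timeout() and recv_with_timeout() are hypothetical helpers, and the -2 return value for a timeout is an arbitrary convention. Passing a struct timespec to pselect() instead of NULL would achieve the same.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Sets a receive timeout on fd: a blocking recv() that gets nothing within
   ms milliseconds fails with EAGAIN/EWOULDBLOCK instead of waiting forever. */
static int set_recv_timeout(int fd, long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Returns bytes received, 0 if the peer closed, -2 on timeout,
   -1 on other errors (errno set). */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len)
{
    ssize_t n;
    do
        n = recv(fd, buf, len, 0);
    while (n < 0 && errno == EINTR);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```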