Socket Keep-Alive

Hi

I'm adding HTTP 1.1 GET to my project and trying to use "Keep-Alive" HTTP connections to the host. The problem is that the first page downloads successfully, but the recv() for the second consecutive request returns zero bytes, and I really have no idea why. As per HTTP 1.1, I send Connection: Keep-Alive in the request header.

If anyone is familiar with this, would you please take a look at the code and explain what is wrong with it?

Here is the extracted socket-related code in a small test program:

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

#include <string>
#include <iostream>

using namespace std;

// ------------------------------------------------------
const short SOCKET_ERROR = -1;

const short RECV_BUFFER_SIZE = 20;
const short REQUEST_BUFFER_SIZE = 255;

int sock;
char recv_buf [RECV_BUFFER_SIZE + 1];
char req_buf [REQUEST_BUFFER_SIZE + 1];

const static char REQUEST_TEMPLATE [] = 
{
    "GET %s HTTP/1.1\r\n"
    "Host: xxx.xxx.xxx\r\n" // should be replaced with a real host
    "Connection: Keep-Alive\r\n"
    "\r\n"
};

// ------------------------------------------------------
void create_socket ();
void download (const string& path, string& response);

// ------------------------------------------------------

int main (void)
{
    string first_addr = "/~pdu/index.html"; // should be replaced with a real path
    string second_addr = "/~pdu/a.html"; // should be replaced with a real path
    string response;

    create_socket ();
    download (first_addr, response);
    cout << response << endl << endl;
    response = "";
    download (second_addr, response);
    if (response.size() > 0)
    {
        cout << response << endl;
    }
    else
    {
        cout << "### The 2nd recv() failed to receive any bytes from the socket!" << endl << endl;
    }
    
    close (sock);

    return 0;
}

void create_socket ()
{
    struct sockaddr_in addr;

    // ------------------------------------------------------
    sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if(sock == SOCKET_ERROR)
    {
        perror ("Could not make a socket.\n");
        exit (-1);
    }

    cout << ">>> Socket created!" << endl;

    // ------------------------------------------------------
    struct hostent* host_info = gethostbyname("cse.unl.edu");    
    
    cout << ">>> DNS done!" << endl;

    long host_addr;

    /* copy address into long */
    memcpy(&host_addr, host_info->h_addr,
        host_info->h_length);

    /* fill address struct */
    addr.sin_addr.s_addr = host_addr;
    addr.sin_port = htons(80);
    addr.sin_family = AF_INET;

    // ------------------------------------------------------
    if( connect(sock, (struct sockaddr*)(&addr),
        sizeof(addr)) == SOCKET_ERROR )
    {
        perror("Could not connect to HTTP server.\n");
        exit (-1);
    }

    cout << ">>> Connection established!" << endl;
}

void download (const string& path, string& response) 
{
    size_t nBytes = snprintf(
        req_buf, 
        REQUEST_BUFFER_SIZE, 
        REQUEST_TEMPLATE, 
        path.c_str());

    if (nBytes >= REQUEST_BUFFER_SIZE)
    {
        cerr << "Buffer is too small for making a request message" << endl;
        exit (-1);
    }

    if (send(sock, req_buf, nBytes, 0) != nBytes)
    {
        perror("Could not send request to the HTTP server.\n");
        exit (-1);
    }
    cout << ">>> Request sent! -> " << path << endl << req_buf << endl;
    
    ssize_t size = 0;
    
    while ((size = recv(sock, recv_buf, 
        RECV_BUFFER_SIZE, 0)) > 0) 
    {
        recv_buf[size] = '\0';
        response.append(recv_buf);
    }

    cout << ">>> Response received!" << endl;
}

Looking at it.

Add stdio.h and stdlib.h to your headers, btw.


Something fishy is going on with the way the connection lags before delivering the content of the first page...

All right, this loop is suspect:

    while ((size = recv(sock, recv_buf,
        RECV_BUFFER_SIZE, 0)) > 0)
    {
        recv_buf[size] = '\0';
        response.append(recv_buf);
    }

From man recv:

       When a stream socket peer has performed an orderly shutdown, the return
       value will be 0 (the traditional "end-of-file" return).

So, your program downloads the entire page, but doesn't stop there -- it calls recv() one more time when the transfer is done, which hangs until the web server becomes impatient and kicks you. Mercifully, that takes much less than the many minutes TCP usually defaults to. Then your program declares the download finished, writes another request to the dead socket, and "receives" another EOF in reply.

Reading until EOF might make sense without keepalives, but obviously won't do for persistent connections -- TCP/IP doesn't have an EOF signal, just an end of connection signal. This is why the web server must warn you of the content's length somehow -- as a content-length header, as chunked sections prepended with lengths in hexadecimal, etc.
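
To make that concrete, here is a minimal sketch of a receive routine that stops on its own instead of waiting for EOF. It assumes the server answers with a Content-Length header and that only one request is outstanding at a time (no pipelining); chunked responses are not handled. The names find_content_length and read_one_response are made up for this illustration and are not part of the original program.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: case-insensitive lookup of Content-Length in the
// header block. Returns false if the header is absent.
static bool find_content_length(const std::string& header, size_t& length)
{
    std::string lower(header);
    for (size_t i = 0; i < lower.size(); ++i)
        lower[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(lower[i])));

    const std::string key = "\r\ncontent-length:";
    size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return false;

    // strtoul skips the optional spaces after the colon.
    length = std::strtoul(header.c_str() + pos + key.size(), NULL, 10);
    return true;
}

// Hypothetical helper: reads exactly one response from fd and stores its body.
// Assumes one outstanding request at a time and a Content-Length response.
bool read_one_response(int fd, std::string& body)
{
    assert(fd >= 0);

    std::string data;
    char buf[512];
    size_t header_end = std::string::npos;

    // Phase 1: read until the blank line that terminates the header.
    while ((header_end = data.find("\r\n\r\n")) == std::string::npos)
    {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0)
            return false;            // error, or peer closed mid-header
        data.append(buf, static_cast<size_t>(n));
    }
    header_end += 4;                 // skip past "\r\n\r\n"

    size_t content_length = 0;
    if (!find_content_length(data.substr(0, header_end), content_length))
        return false;                // chunked or close-delimited: not handled here

    // Phase 2: read only the bytes still missing, then stop.
    while (data.size() < header_end + content_length)
    {
        size_t missing = header_end + content_length - data.size();
        size_t want = missing < sizeof(buf) ? missing : sizeof(buf);
        ssize_t n = recv(fd, buf, want, 0);
        if (n <= 0)
            return false;            // error, or peer closed mid-body
        data.append(buf, static_cast<size_t>(n));
    }

    body = data.substr(header_end, content_length);
    return true;
}
```

In the test program above, download() could call something like read_one_response(sock, response) in place of the while (recv(...) > 0) loop, so it returns as soon as the body is complete and the connection is still alive for the second request.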


Thanks for looking and solving. Yes, I see now that the recv loop should read a stream and work out when to stop pulling more bytes from recv(). In my full code I extract the chunked hex lengths later, for decompressing gzip; I'll have to read the header and chunks inside the while (recv()) loop instead.
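
For the chunked case, a rough sketch of the decoding step might look like the following. It assumes the complete chunked body is already in memory, ignores chunk extensions and trailers, and does not do any gzip decompression. The name decode_chunked is made up for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: decodes a complete chunked body already held in memory.
// Returns false if the data is malformed or truncated.
bool decode_chunked(const std::string& raw, std::string& out)
{
    size_t pos = 0;
    out.clear();

    for (;;)
    {
        assert(pos <= raw.size());   // loop invariant: never past the end

        // Each chunk starts with its size in hexadecimal, terminated by CRLF.
        size_t line_end = raw.find("\r\n", pos);
        if (line_end == std::string::npos)
            return false;

        const char* start = raw.c_str() + pos;
        char* stop = NULL;
        unsigned long chunk_size = std::strtoul(start, &stop, 16);
        if (stop == start)
            return false;            // no hex digits where a size was expected

        pos = line_end + 2;
        if (chunk_size == 0)
            return true;             // last chunk; trailers (if any) are ignored

        if (raw.size() < pos + chunk_size + 2)
            return false;            // chunk data or its trailing CRLF is missing
        out.append(raw, pos, chunk_size);
        pos += chunk_size + 2;       // skip the data and the CRLF after it
    }
}
```

In a real receive loop the same size lines tell you when to stop calling recv(): the zero-length chunk marks the end of the response.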

When I read about keep-alive, I understood that the Apache server is meant to wait for a timeout of at least 5 seconds (after handling a request), but my recv loop finishes pretty much instantly after receiving 3-4 HTTP packets for the entire gzip webpage (in my full code).

I'm guessing the server somehow sees a request for more information, and Apache has code to detect clients with while(recv() != 0).

It's also possible they've decreased the timeouts to help deal with congestion. I seem to get the entire five seconds.

I do similar operations with FIX data, where the reply carries its size in the metadata.
I use the following:

MSG_PEEK
This flag causes the receive operation to return data from the
beginning of the receive queue without removing that data from
the queue. Thus, a subsequent receive call will return the same
data.

Then you can get the content-length and work out how much to read, then use

MSG_WAITALL (since Linux 2.2)
This flag requests that the operation block until the full
request is satisfied. However, the call may still return less
data than requested if a signal is caught, an error or disconnect
occurs, or the next data to be received is of a different
type than that returned.
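
As a sketch of how the two flags could be combined for HTTP: peek at the queue until the end of the header is visible, consume exactly the header, then block for the body with MSG_WAITALL. This assumes a Content-Length response and a header that fits in the peek buffer. The name read_with_peek is made up for this illustration. Note that MSG_PEEK returns immediately once any data is queued, so the sketch pauses briefly between peeks if the header has only partly arrived.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: reads one Content-Length response without consuming
// any bytes that belong to a later response.
bool read_with_peek(int fd, std::string& body)
{
    assert(fd >= 0);

    char peek_buf[4096];             // assumption: the header fits in 4096 bytes
    size_t header_len = 0;
    std::string header;

    // Step 1: look at the queued bytes (without removing them) until the
    // blank line that ends the header is visible.
    for (;;)
    {
        ssize_t n = recv(fd, peek_buf, sizeof(peek_buf), MSG_PEEK);
        if (n <= 0)
            return false;            // error or connection closed
        header.assign(peek_buf, static_cast<size_t>(n));

        size_t end = header.find("\r\n\r\n");
        if (end != std::string::npos)
        {
            header_len = end + 4;
            break;
        }
        if (static_cast<size_t>(n) == sizeof(peek_buf))
            return false;            // header larger than the peek buffer
        usleep(1000);                // partial header: pause, then peek again
    }

    // Step 2: consume exactly the header so the queue now starts at the body.
    if (recv(fd, peek_buf, header_len, MSG_WAITALL) != static_cast<ssize_t>(header_len))
        return false;
    header.resize(header_len);

    // Step 3: find Content-Length (header names are case-insensitive).
    std::string lower(header);
    for (size_t i = 0; i < lower.size(); ++i)
        lower[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(lower[i])));
    const std::string key = "\r\ncontent-length:";
    size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return false;
    size_t length = std::strtoul(header.c_str() + pos + key.size(), NULL, 10);

    // Step 4: block until the whole body has arrived.
    std::vector<char> data(length);
    if (length > 0 &&
        recv(fd, &data[0], length, MSG_WAITALL) != static_cast<ssize_t>(length))
        return false;                // interrupted, error, or disconnect

    body.assign(data.begin(), data.end());
    return true;
}
```

Because nothing past the current body is ever removed from the queue, back-to-back responses stay intact for the next call. A short read from MSG_WAITALL (for example after a signal) is simply treated as a failure here; real code would probably want to retry.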
