Writing a REST server to run on FreeBSD -- how to structure for concurrency?

Hi All,

I want to write a domain specific REST/HTTP server to run on FreeBSD. I have control over both the server side and the primary client intended to consume the service.

My question is: how do you think it should be designed to support multiple connections, given:

  1. It will run on a modern x86 chip (I7 6700K) which has 4 cores, each running two threads.
  2. REST transactions are short-lived, request/response, which are basically HTTP requests (GET, POST, DELETE).

Is fork() too costly for each connection?
Is kqueue the wrong tool, given that the connections are not really long-lived and do not consist of repeated updates/polling?
Should I use a thread approach where I have the master, and then create a new thread for each connection and then terminate the thread after the response is sent back?

I am really at a loss as to what the most effective approach to take is. While it may never be used, I'd at least like to design it to be efficient and capable of supporting as many concurrent connections as possible!

Thanks for any suggestions.

(I want to write it for multiple reasons: to learn, to challenge myself, to build something minimalistic, because the sun rises, etc. I realize I could just grab and go with an existing web server but I don't want to do that.)

fork is expensive.

Take a look at the Erlang programming language and the Cowboy or Yaws web servers.
The language is built for highly concurrent applications.

Hyperthreading does not work that way. The eight "threads" are logical processors, not extra cores: each physical core presents two hardware threads that share its execution resources. Four cores, four cores' worth of real parallelism.

Since this is I/O and network related, though, those threads may spend a lot of time waiting anyway.

Yes.

They may not be long-lived, but HTTP still means lots of waiting for responses. You can have thousands of threads which spend 99% of their time waiting, or a few threads which spend most of their time working.

That's what makes aggregation routines like select and kqueue efficient. You can watch a set of things, waiting for any of them to become ready, without wasting time creating surplus threads which all sit around waiting.

Assigning one thread to multiple sockets can be efficient -- you can take whichever one becomes ready out of a large set and handle it. One thread can do a lot of work in a short amount of time if you don't interrupt it by creating and killing threads all the time. You can even do this in a single-threaded approach and make something surprisingly responsive.

That's almost the same problem as creating processes. Don't create a new anything for each connection.

If there were a perfect 'most effective' approach, everyone would use it and throw out the rest. Approaches can and do vary.

My approach would be to have one thread which loops on accept() and puts the socket in a queue of some sort, for one of a fixed number of worker threads to grab it from and add it to the pile of sockets each worker thread is communicating with.
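A minimal sketch of that handoff, assuming a fixed-size ring buffer guarded by a pthread mutex and condition variables (the names fd_queue, fdq_push, fdq_pop are illustrative, not from any library):

```c
#include <pthread.h>

#define QUEUE_CAP 128

/* Fixed-size ring buffer of accepted socket descriptors,
   filled by the accept() thread, drained by worker threads. */
struct fd_queue {
    int fds[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
};

void fdq_init(struct fd_queue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Called by the accept() thread; blocks if the queue is full. */
void fdq_push(struct fd_queue *q, int fd) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->fds[q->tail] = fd;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called by each worker thread; blocks until a socket is available. */
int fdq_pop(struct fd_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int fd = q->fds[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return fd;
}
```

The listener thread just loops on accept() and calls fdq_push(); each worker loops on fdq_pop() and adds the descriptor to its own working set.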

If you wanted to build something minimalistic, how about a single-threaded approach using select or kqueue to respond to connections and I/O as they become ready, rather than first-come-first-serve? People assume it's necessary to multithread to do anything but FCFS these days, but that's not strictly true, especially for lots of short-lived tasks. That's why a single thread can still be so effective at dealing with lots of small tasks -- given enough work, all the time it'd waste on waiting can actually get used.

Sorry for the delayed response.

A very big thank you to Corona688 and unficyp for the suggestions!

I'm still going to give it a shot; I'm really busy right now so I can't implement it yet, but I've bookmarked your advice, Corona688, and will definitely try some of the suggestions.

I think I may start with a two-thread approach: one listening, one processing (worker).

Might there be a benefit to using two threads over two processes in this case (listener and worker)? Or is it simply stylistic (sharing a heap vs. building inter-process communication)?


Shared memory can happen in IPC, too. What isn't shared between processes is file/socket descriptors. You can pass a descriptor from one process to another, but the procedure for doing so without resorting to fork() makes me smell burnt toast... Threads, of course, don't have this problem; they share everything.

IPC is also a can of worms in some ways: its synchronization primitives have read/write permission bits just like files do, and if you're not building your own private client/server model they can be more complicated than needed.

Corona688 -- an additional question for you:

As to your advice on "My approach would be to have one thread which loops on accept() and puts the socket in a queue of some sort, for one of a fixed number of worker threads to grab it from and add it to the pile of sockets each worker thread is communicating with."

Is there a recommended queue or messaging feature to accomplish this that I should research?

I see there are, quite literally, message queues (System V?), used via msgget(), msgsnd() and msgrcv(). Is this what you had in mind, and if so, is there anything to be mindful of when sending socket descriptors through a message queue?

Thanks again!


Corona688 / others:

Is this a reasonable way?

Parent process starts worker processes
Listens on a socket for incoming connections and accept()s them

On accepting new connection:

Using a Unix-domain socket and SCM_RIGHTS, send the descriptor to one of the running worker processes. The worker is chosen by a simple counter, e.g. i++ that wraps around once i == number of worker processes. A very basic round-robin approach.

Would this suffice over the message queue approach?
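For what it's worth, the SCM_RIGHTS handoff described above can be sketched roughly like this. send_fd/recv_fd are names I made up, error handling is minimal, and the channel is assumed to be a connected Unix-domain socket (e.g. from socketpair() before fork()):

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>

/* Send one file descriptor over a Unix-domain socket.
   The kernel duplicates the descriptor into the receiver. */
int send_fd(int chan, int fd) {
    struct msghdr msg;
    struct iovec iov;
    char byte = 0;
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;

    memset(&msg, 0, sizeof msg);
    iov.iov_base = &byte;            /* must send at least one data byte */
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a file descriptor; returns it, or -1 on error. */
int recv_fd(int chan) {
    struct msghdr msg;
    struct iovec iov;
    char byte;
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;

    memset(&msg, 0, sizeof msg);
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(chan, &msg, 0) <= 0) return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (!cm || cm->cmsg_type != SCM_RIGHTS) return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

Note this is also why a SysV message queue alone won't do the job: a descriptor number copied into a message is meaningless in the receiving process, whereas SCM_RIGHTS makes the kernel install a real duplicate.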

SysV IPC is very high-overhead. I just meant building a queue out of memory and pthread mutexes, POSIX semaphores, or whatever kind of lower-level IPC you had available.