Basic multithreaded program

I'd like to write a program (I'm flexible on language; C/C++ was my original idea but a scripting language would probably be better) that runs hundreds of programs, but only N = 4 (say) at a time. The idea is to keep all the cores on a multicore machine busy.

How can I do this? In particular, I'd like a library call I can make in some appropriate language that can

  • Start a new thread
  • Start a command-line process in the thread with arbitrary arguments
  • Recognize when the process is complete, return information to the main thread, and terminate the worker thread
  • Ideally, send and receive information on standard in/out

This seems like a very basic thing to ask; I'm just looking for something that would make this simple. I'm probably going to write many programs like this for various tasks, and I thought it would be good to ask around before diving into something that's not quite appropriate. I started reading about the (new) Python threading module before I thought to ask for advice.
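
To make it concrete, something along the lines of this rough Python sketch is what I'm after (multiprocessing.Pool plus subprocess; the './worker' commands are just placeholders), unless there's a better-suited tool:

import subprocess
from multiprocessing import Pool

def run_command(cmd):
    # Run one external program, capture its output, and hand everything
    # back to the main process.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return cmd, proc.returncode, out

if __name__ == '__main__':
    # Placeholder commands; the real list would be built from the input file.
    commands = [['./worker', 'input%d.txt' % i] for i in range(100)]

    # A pool of 4 worker processes keeps at most 4 commands running at once.
    pool = Pool(processes=4)
    for cmd, code, out in pool.map(run_command, commands):
        print(cmd, code, len(out))
    pool.close()
    pool.join()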

Could you give us some more information?

  • How do you get the list of programs to be run?
  • Are there any dependencies between the programs? If yes, how would you describe them?
  • What should happen to stdout/stderr? Output to the console or saved somewhere?
  • Do the programs require some kind of input once they're running?

What OS are you planning to use this program on? Can you control processor affinity on this OS?

The program will read in a text file with a bunch of numbers, do some processing on them, and then create an array based on those numbers. Each element of the array will be passed through a function that creates an appropriate data set for that entry. That data will be used to create a temporary file and a command line that references the file.

I didn't think this was relevant before so I didn't mention it. Basically, the program does some work and comes up with a list of commands to run.
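
For concreteness, the command-building step would look roughly like this in Python (a sketch; make_data_set and './solver' are placeholders for my actual code):

import tempfile

def make_data_set(entry):
    # Stand-in for the real per-entry processing.
    return str(entry) + '\n'

def build_commands(entries):
    # Write each entry's data set to a temporary file and build a command
    # line that points at that file.
    commands = []
    for entry in entries:
        tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.dat',
                                          delete=False)
        tmp.write(make_data_set(entry))
        tmp.close()
        commands.append(['./solver', tmp.name])
    return commands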

Ah, that's a rather important question I forgot to address! The programs are independent. Dependencies make for complicated programs; mine is just a basic one.

I'd like it to be passed back to the program as a string, if possible. It should not be displayed.

This one I'm working on does not, but I'd like a method that can send input, because other similar programs I'll write will probably need to.
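
From what I've read so far, Python's subprocess.Popen with pipes looks like it would cover both of those points; a minimal sketch of what I mean (the command and the input string are just placeholders):

import subprocess

# Start the program with pipes attached to its stdin and stdout.
proc = subprocess.Popen(['./worker', 'job.dat'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)

# Feed it input, wait for it to finish, and get its output back as a string.
out, _ = proc.communicate(input=b'parameters go here\n')
print(proc.returncode, out)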

-----Post Update-----

I just changed my OS to 64-bit Ubuntu 9.04. The programs are processor-intensive, 64-bit programs with small-to-medium memory footprints.

So you've got a bunch of data, transform it into another form, and then process that further, right? If you've got access to the source for the last part of the processing, might it be possible to rewrite it using OpenMP (Wikipedia)? That way it'd be portable across different OSes, processors, and core counts, and you'd eliminate the need for a central control program.

The individual programs aren't really parallelizable. (In CS jargon, they're conjectured to be P-complete, and so presumably outside NC.) They perform many sequential operations on a single piece of data. That's why I want to run them individually.

So instead I'm writing a program that finds the most efficient solution for each piece, estimates its likely runtime, solves an approximate bin-packing problem, and schedules the jobs across a user-tunable number of processors.
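
As a rough sketch of that scheduling step, a simple greedy longest-runtime-first assignment (the commands and runtime estimates here are made up; the real ones would come from the analysis above):

def schedule(jobs, num_cpus):
    # jobs: list of (command, estimated_runtime) pairs.
    # Greedy LPT: give the longest remaining job to the least-loaded CPU.
    bins = [{'load': 0.0, 'commands': []} for _ in range(num_cpus)]
    for cmd, runtime in sorted(jobs, key=lambda job: job[1], reverse=True):
        target = min(bins, key=lambda b: b['load'])
        target['commands'].append(cmd)
        target['load'] += runtime
    return bins

# Made-up commands and runtime estimates, spread over 2 CPUs.
print(schedule([('a', 5.0), ('b', 3.0), ('c', 8.0), ('d', 2.0)], 2))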

OK, from what you've posted so far, the basic structure would be something like this (pseudo-code):

Read numbers from file
Process them
Create output files
Create array of commands to run
Total processes = 0
While there are commands left
    pop a command from the list
    fork() a subprocess
    In the child
        system() the command
        exit() the child
    In the parent
        Total processes++
    If Total processes >= 4
        wait() until any child returns
        Total processes--
When no commands are left, wait() for the remaining children to finish
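
Since you said a scripting language might suit you better, here is roughly the same loop in Python, using os.fork/os.system/os.wait, which map one-to-one onto the calls above (just a sketch, with no error handling and a placeholder command list):

import os

commands = ['./solver data0.dat', './solver data1.dat']   # placeholder list
MAX_RUNNING = 4
running = 0

while commands:
    cmd = commands.pop()
    pid = os.fork()
    if pid == 0:
        # Child: run the command, then exit without falling back into the loop.
        status = os.system(cmd)
        os._exit(status >> 8)
    # Parent: count the child and throttle once 4 are running.
    running += 1
    if running >= MAX_RUNNING:
        os.wait()
        running -= 1

# Wait for whatever is still running.
while running > 0:
    os.wait()
    running -= 1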

Input isn't really a problem here, since fork()ed processes inherit the parent's file descriptors. Output is, since as far as I know it's hard to return data from the child to the parent without a pipe, shared memory, or something similar. You could save the output to a file in each child by redirecting stdout just before the system() call.
Alternatively, you could exec() the program yourself instead of going through system() (which forks a shell), to reduce the fork rate, but I'm not sure how well this would work.
Plus, with sched_setaffinity you can set the affinity of each process (tell it which CPU to use), but you'd have to track which CPU the last process used.
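
If you do end up in Python on Linux, the same call is available as os.sched_setaffinity (Python 3.3 and later); a minimal sketch that round-robins children over the CPUs (the commands are placeholders):

import os

def run_pinned(cmd, cpu):
    # Fork a child, pin it to one CPU, then run the command there.
    pid = os.fork()
    if pid == 0:
        os.sched_setaffinity(0, {cpu})    # 0 means "this process"
        os._exit(os.system(cmd) >> 8)
    return pid

# Round-robin some (hypothetical) commands over the available CPUs.
cpus = sorted(os.sched_getaffinity(0))
for i, cmd in enumerate(['./solver a.dat', './solver b.dat']):
    run_pinned(cmd, cpus[i % len(cpus)])

while True:
    try:
        os.wait()
    except OSError:                       # no children left
        break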

Does anyone with more experience have a better idea?