Parallel Execution of Programs

There have been a few requests for a way to execute commands on multiple CPUs (logical or physical), answered by various shell-, make-, or Perl-based solutions ranging from well-done to well-meant, and mostly specific to one particular problem. So I've started to write a C-based solution (besides, I was bored). And here it is.

It's not complete by any means. It's not fail-safe, not by a long shot. It doesn't even have any documentation (yet). But it does what it's meant to do: run an arbitrary number of commands on an arbitrary number of CPUs.

How to install:
There's no installer yet, only a Makefile to build it, which requires GNU make. Just enter make (or gmake if needed), and everything should build. The binary is called "parallel" until I come up with a better name.

How to run:
By default it will run 2 commands at the same time. If you want to run more, specify the number on the command line, e.g.

./parallel 5

to run 5 commands.
The commands to run are read from stdin, separated by carriage returns and/or newlines, and ended by EOF. Commands will be passed to /bin/sh for execution, so a bit of shell syntax is possible, as long as it fits on one line.

An example might be in order. To compress all files in the current directory with gzip, using 8 CPUs, enter

$ for file in *; do echo "gzip '$file'"; done | /path/to/parallel 8
# OR
$ find . -type f -exec printf 'gzip "%s"\n' {} \; | /path/to/parallel 8

(The quoting keeps filenames containing spaces in one piece.)

Known problems:

  • Exit codes of individual commands are not propagated. As long as a child process could be created, everything's dandy.
  • Command input is not checked for a maximum length.
  • Absolutely no serious documentation.

Tested Platforms:
Linux 2.6 (openSUSE 10.3)
HP-UX 11.31
FreeBSD 8.0-RELEASE
Cygwin

In theory it should run on any POSIX-compatible OS.

... and all the rest
If you find a bug and can fix it: please send me a patch (my email is in the code).
If you want to write documentation: please send it to me, and I'll include it.
If you find a good name for it: tell me.
It's under the 2-clause BSD license.
Updates to the code will be posted here, unless there are a lot of requests for some kind of repository.

Maybe I am missing something, but all I see is the classic fork/exec paradigm. Nothing to support running a command on multiple CPUs.

Whoops, meant "commands". Corrected that.

And yes, it's classical fork/exec. However, as some posters expressed a need for a program to do just that (e.g., run multiple gzips in parallel without a CPU idling), I thought I'd try to fill that need.
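
For the curious, the core of the technique boils down to something like the sketch below. This is a stripped-down illustration, not the actual source: the buffer size, names, and error handling are simplified, and (as noted in the known problems) the children's exit codes are simply thrown away.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char line[4096];                      /* no real length checking, see above */
        pid_t pid;
        int max = (argc > 1) ? atoi(argv[1]) : 2;   /* job slots, default 2 */
        int running = 0;

        if (max < 1)
            max = 1;
        while (fgets(line, sizeof line, stdin) != NULL) {
            line[strcspn(line, "\r\n")] = '\0';     /* strip CR and/or LF */
            if (line[0] == '\0')
                continue;
            if (running >= max) {             /* all slots busy: reap one child */
                wait(NULL);                   /* exit status is dropped here */
                running--;
            }
            pid = fork();
            if (pid == 0) {                   /* child: hand the whole line to the shell */
                execl("/bin/sh", "sh", "-c", line, (char *)NULL);
                _exit(127);                   /* reached only if exec itself failed */
            }
            if (pid > 0)
                running++;
            else
                perror("fork");
        }
        while (running-- > 0)                 /* drain the stragglers */
            wait(NULL);
        return 0;
    }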

Hi, pludi.

I vaguely recall that xargs does something like that. The GNU/Linux version:

       --max-procs=max-procs
       -P max-procs
              Run up to max-procs processes at a time; the default is  1.   If
              max-procs  is 0, xargs will run as many processes as possible at
              a time.  Use the -n option with -P; otherwise chances  are  that
              only one exec will be done.

-- excerpt from man xargs

The GNU/Linux version is 4.4.0.
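
For example, this should keep 8 gzips running at once. Per the warning above, the -n 1 matters, so that each invocation gets exactly one file; the -print0/-0 pair keeps filenames with whitespace intact:

$ find . -type f -print0 | xargs -0 -n 1 -P 8 gzip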

Is your code different from that? ... cheers, drl

Yes. Ever tried to run GNU xargs on a non-Linux system? Last time I tried, I had quite a few dependencies to pull in, too. Sometimes it's not available in binary form, so you have to compile it from source (including everything else in the findutils package), only to realize that GNU code relies heavily on GNU libc. My code only needs a compiler (any standard C compiler will suffice) and a C library (again, any will suffice, as long as POSIX is supported).
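
(If you'd rather skip the Makefile entirely, any compiler should do. Assuming the source file is called parallel.c (I haven't settled on final names yet), something like

$ cc -o parallel parallel.c

is all it takes.)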

Also, in contrast to xargs, you can run more than one distinct command in parallel. You could, for example, create gzip, bzip2, and LZMA compressed archives at the same time. Try that with xargs alone.
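
For instance, feeding three different pipelines to it at once (mydir is just a placeholder directory here):

$ printf '%s\n' \
    'tar cf - mydir | gzip  > mydir.tar.gz' \
    'tar cf - mydir | bzip2 > mydir.tar.bz2' \
    'tar cf - mydir | lzma  > mydir.tar.lzma' | ./parallel 3

Each line goes to /bin/sh as its own job, so the pipes and redirections work as expected.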