Waiting for an arbitrary background process (limiting number of jobs running)

Hi,
I'm trying to write a script to decompress a directory full of files. The decompression commands can run in the background, so that many can run at once. But I want to limit the number running at any one time, so that I don't overload the machine.

Something like this:

n=0
for i in *.gz
do
    gzip -d "$i" &
    n=$((n+1))
    if [ $n -ge 10 ]; then
        :   # XXX Not sure what to do here
    fi
done

At the marked spot, I want to wait for one of my background processes to complete. I don't mind which one, but I do want to wait for just one.

wait doesn't work, as it waits for all jobs to complete. On the other hand, wait N doesn't work, because I don't know which job will finish first.

I could use trap "..." 20, but I'd need to be able to pause my script at the XXX line and be able to resume it via the "..." from the trap command. I can't think of a way of doing this ("suspend" in bash might work, but really I need this to work in ksh - I'm not sure the server this will ultimately run on has bash installed).

Can anyone suggest an approach that I could use?

Thanks,
Paul.

Nope, I don't think you'll do it in ksh; you'd need waitpid. What I would do: get the list of files, divide by the number of processes you want (e.g. 10), and send that many files off via xargs:

set -- *.gz  # sets $1 $2 $3 ...
#  $# = the count

ls *.gz | xargs -n$(( $# / 10 )) gunzip

I don't know how set will react if you have hundreds of files; you might get 'command line too long'.
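One thing worth noting: -n on its own still runs the batches one after another, in sequence. If your xargs supports -P (GNU and most modern BSD versions do, though I can't promise the server this will ultimately run on has it), xargs can do the parallel throttling itself. A minimal sketch, run in a scratch directory so it's safe to try:

```shell
# Demo in a scratch directory: make a few .gz files, then let xargs
# decompress them with at most 10 gzips running at once.
cd "$(mktemp -d)"
for f in a b c; do echo "data-$f" > "$f.txt"; gzip "$f.txt"; done

# -n1: one file per gzip invocation; -P10: at most 10 in parallel.
# printf also sidesteps the 'command line too long' worry.
printf '%s\n' *.gz | xargs -n1 -P10 gzip -d
```

(If the filenames can contain spaces or newlines, find . -name '*.gz' -print0 | xargs -0 is the safe variant.)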

Thanks, that's an approach I hadn't thought of. One thing it doesn't allow me to do is to report progress - something I'd thought of adding to my original approach was to add a "printf '.'" whenever I started a new decompress. But that's just a nice-to-have - your suggestion gets the job done.

I'd still be interested in any other possibilities that anyone can suggest - this is my first venture into anything more complicated than very basic scripts, and I'm learning a lot I didn't know!

Thanks,
Paul.

I think this is too much for a shell; C, or at least some real scripting language, may be required here. However, I'd love to see a shell solution if one is possible.

I tried a Perl solution and got really bogged down because I couldn't find an easy way of running a background command (disclaimer: it's a VERY long time since I used Perl, and I don't have Python on the box I'm working with :-(). Messing around with

sub spawn {
  $pid = fork;
  unless ($pid) {
    exec @_;
  }
  $pid;
}

seems fraught with potential issues that I don't understand (for a start, it doesn't handle shell metacharacters - should I use exec "sh", "-c", @_ or some similar incantation?)

If someone can confirm a decent Perl equivalent of the shell

some command possibly with metacharacters &

I'll see what I can do with the rest of it...

Paul.

 # XXX Not sure what to do here

How about you sleep(1) and check whether you still have 10 gzips running with pgrep(1)? If it's fewer than 10, you can start a few more.
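To sketch that idea (assumptions: your pgrep supports -c and -P, as the procps and BSD versions do; "sleep 2" stands in for the real gzip -d so the demo is self-contained):

```shell
#!/bin/sh
# Sketch of the sleep-and-pgrep throttle. "sleep 2" stands in for
# "gzip -d file"; -P $$ counts only this script's own children.
cd "$(mktemp -d)"
MAX=3
for i in 1 2 3 4 5 6; do
    # Block while MAX workers are already running.
    while [ "$(pgrep -c -P $$ sleep)" -ge "$MAX" ]; do
        sleep 1
    done
    sleep 2 &                         # the real worker
    echo "spawned worker $i" >> spawn.log
done
wait                                  # let the stragglers finish
```

Using -P $$ restricts the count to this script's own children, so a gzip started by somebody else on the machine can't throw the count off.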

---------- Post updated at 05:47 PM ---------- Previous update was at 05:45 PM ----------

With perl, you might want to use system() instead of exec().

My basic idea for a solution would be to spawn the initial N workers and save their pids in some table, then sleep 1 and see which of the pids are still alive. For each one that isn't, spawn the next worker and save its pid in place of the old one. Repeat until the job is done.

The problem with counting via pgrep is that it takes into account processes that may not be related to the script (any other user can run their own gzip, right?).

You could put the procs in the background and use the shell variable $! to save the pid. Not sure if it'll help, or if it would be as easy to implement as it sounds. If not, you could try Perl or another high-level scripting language.
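A minimal sketch of the $! idea (the kill -0 trick merely tests whether a saved pid is still alive, and works in ksh and bash alike; sleep again stands in for the real gzip so the demo is safe to run):

```shell
#!/bin/sh
# Save each worker's pid from $! and poll liveness with kill -0.
cd "$(mktemp -d)"
pids=""
for i in 1 2 3 4 5; do
    ( sleep 1; echo "worker $i done" >> done.log ) &   # stand-in worker
    pids="$pids $!"
done

# Wait for every saved pid to exit (kill -0 fails once it's gone).
for pid in $pids; do
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
done
```

In a real throttling loop you'd run the kill -0 check whenever the pid table is full, replacing dead entries as you go; here the sketch just drains them all at the end.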

I tried seeing what I could do with Perl, and I got this, which seems to work:

#!/usr/bin/perl
sub spawn {
    $pid = fork;
    unless ($pid) {
        exec "@_";
    }
    $pid
}
@pids = ();
for $i (1..20) {
    $cmd = "sleep $i; echo \$\$";
    $pid = spawn $cmd;
    print "$pid: '$cmd'\n";
    push @pids, $pid;
    if ($i > 10) {
        $pid = shift @pids;
        waitpid $pid, 0;
        printf "Process $pid ended\n";
    }
}
while (@pids) {
    $pid = shift @pids;
    waitpid $pid, 0;
    printf "Process $pid ended\n";
}

Does that look like a reasonable approach? It seems reasonably clean - although not as nice as my original ksh attempt (which had the disadvantage that it didn't work, of course :))

I'm not a Perl expert, but you don't seem to loop through all the pids and check their values.

You should loop through all the pids with a non-blocking waitpid, and if a process is no longer running, spawn a new one in its place.

It seems to me that you are currently waiting for the first pid in the queue and, when it finishes, spawning another one. While quite good, it may happen that of the first 10 pids, numbers 2-9 have ended while number 1 runs for a very long time. You will end up with only one worker running for most of the time, which I guess is what you were trying to avoid.

Ah, thanks! Yes, you're right that is an issue.

I'll have a look at fixing this, I can probably also do it with a SIGCHLD handler I guess.

Paul.

I was wrong that the shell can't do this. You can do it easily in Bash.

#!/bin/bash

N_WORKERS=10
WORK_COUNTER=15
C=$N_WORKERS
PIDS[1]=0
while [ $C != 0 ]; do
        sleep $(($RANDOM / 1000 + 4)) &
        PIDS[$C]=$!
        echo "Spawned PID: ${PIDS[$C]}"
        WORK_COUNTER=$(($WORK_COUNTER - 1))
        C=$(($C - 1))
done

WORKER_I=$N_WORKERS
WORKERS_RUNNING=$N_WORKERS
while [ $WORKERS_RUNNING != 0 ]; do
        WORKER_I=$(($WORKER_I - 1))
        [ $WORKER_I == 0 ] && WORKER_I=$N_WORKERS
        [ ${PIDS[$WORKER_I]} == 0 ] && continue
        sleep 1

        if [ -z "`ps a | awk '{print $1}' | grep -w ${PIDS[$WORKER_I]}`" ]; then
                echo "Finished PID: ${PIDS[$WORKER_I]}"

                if [ $WORK_COUNTER != 0 ]; then
                        sleep $(($RANDOM / 1000 + 4)) &
                        PIDS[$WORKER_I]=$!
                        echo "Spawned PID: ${PIDS[$WORKER_I]}"
                        WORK_COUNTER=$(($WORK_COUNTER - 1))
                else
                        PIDS[$WORKER_I]=0
                        WORKERS_RUNNING=$((WORKERS_RUNNING - 1))
                fi
        fi
done

I use random sleep times to simulate different processing times. Everything else should be quite self-explanatory.

Ah! It never occurred to me that I could use ps to check if workers were still running. That looks pretty good. And it looks like it will work just as well in ksh, too.

I'll give it a try. Thanks

---------- Post updated at 07:20 PM ---------- Previous update was at 04:06 PM ----------

I've now got a pretty neat solution in Perl (currently only tested on cygwin, but I see no reason it shouldn't work properly on "real" Unix). For those who might be interested, this is the final result:

#!/usr/bin/perl

use POSIX ":sys_wait_h";

%pids = ();
$npids = 0;
$MAX_CHILDREN = 10;

$SIG{CHLD} = \&REAPER;

sub REAPER {
    my $child;
    while (($child = waitpid(-1, WNOHANG)) > 0) {
        # print "$child ($pids{$child}) died!!!\n";
        delete $pids{$child};
        $npids--;
        # print "Child died ($npids still running)\n";
    }
    $SIG{CHLD} = \&REAPER;
}

sub launch {
    my ($cmd) = @_;
    if ($npids >= $MAX_CHILDREN) {
        # print "Zzzz...\n";
        sleep;
    }
    my $pid = fork;
    unless ($pid) {
        exec $cmd;
    }
    $pids{$pid} = $cmd;
    $npids++;
}

sub waitall {
    while ($npids) {
        sleep;
    }
}

for $i (<*.gz>) {
    $cmd = "gzip -d \"$i\"";
    print "Launching $cmd\n";
    launch $cmd;
}

waitall;

Thanks to all who helped me with this! It's been an interesting exercise, and I learned a lot along the way :)

Paul.

    if ($npids >= $MAX_CHILDREN) {
        # print "Zzzz...\n";
        sleep;
    }
    my $pid = fork;
    unless ($pid) {

I think you missed an 'else' there.

I'm not sure: sleep without an argument sleeps forever (until a signal comes in). So the intent of the statement is: if we have too many processes, pause. When SIGCHLD comes in, we are awakened, control falls out of the if, and we (finally) start the new process. If we don't have too many processes yet, we just start a new process directly.

With an "else", wouldn't the branch that sleeps then skip starting its process when it awakens?

Paul.

Why not take your original concept and build in a throttle, e.g.:

par=10
offset=$(ps|wc -l)
max=$(( par + offset ))
for i in *.gz
do
  # echo starting new process
  gunzip "$i" &
  while [ $(ps|wc -l) -ge $max ]
  do
     # echo "Throttling..."
     sleep 1
  done
done
wait
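One caveat with counting ps|wc -l: it counts everything visible to ps, so processes started by another session can skew the throttle (the same objection raised against pgrep earlier). A variant of the same loop that counts only this script's own background jobs, sketched with sleep workers so it's safe to run as-is (assumption: the shell supports jobs -r inside $(...) the way bash does):

```shell
#!/bin/bash
# Same throttle idea, but driven by the shell's own job table.
cd "$(mktemp -d)"
par=3
for i in 1 2 3 4 5 6 7 8
do
    ( sleep 1; echo "$i" >> done.log ) &   # stand-in for gunzip "$i" &
    # jobs -r lists only this shell's still-running background jobs,
    # so nobody else's processes can confuse the count.
    while [ "$(jobs -r | wc -l)" -ge "$par" ]
    do
        sleep 1
    done
done
wait
```

The structure is identical to the ps-based version above; only the counting changes.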