Algo to partition a range over workers in C

Hi all,

I'm struggling with an algorithm i've attempted in C.

The idea of the algorithm is to take a given range of numbers, say 1..100, and divide it into n equal chunks, with the final chunk absorbing any remainders. This a really common algorithm, but i don't know it's name to look up a reference implentation.

Here's my attempt thus far:

  1. Identify a range the process should work on
  2. Partition that range into equal chunks, so that the chunks can be allocated to worker threads
  3. Fire each chunk out to a thread for processing

/* Upper and lower bounds of the partition, packed for easy passing into a thread */
struct bounds_s {
  int upper;
  int lower;
};
typedef struct bounds_s bounds_t;

...

bounds_t global_bounds = { 100, 1 }; /* This would normally come from command line opts */

...

/* This lives in a for loop.
   curr_thr iterates between 0 and total_thr-1.
   global_bounds is the entire partition this process will work on
   thread_bounds is to be populated with the boundaries of this thread's subdivision of the partition */
per_thread_bounds( curr_thr, total_thr, &global_bounds, &thread_bounds );

Where per_thread_bounds is defined as:


/* Work out how many subparitions to partition the global domain into,
   and what the bounds of this thread should be */
void per_thread_bounds( int curr_thr, int total_thr, bounds_t const *global_bounds, bounds_t *thread_bounds ) {

  /* How big is the total range we're dealing with? Divide it by the total number of threads
     to find out how big a partition to assign to each worker thread */
  int partition_size;
  partition_size = (int)((global_bounds->upper - global_bounds->lower)/total_thr)+1;

  /* Assign a partition's worth of range in the first instance, and tweak for
     special cases thereafter */
  thread_bounds->lower = global_bounds->lower;
  thread_bounds->upper = global_bounds->lower + ( partition_size - 1 );

  /* Special case: This is not the first CPU (i.e. not the first partition of range to assign
     and is not the last CPU -- so don't need to soak up the remainders to the end of the range */
  if( curr_thr > 0 ) {
    thread_bounds->lower += curr_thr * partition_size;
    thread_bounds->upper += curr_thr * partition_size;
  }
  /* Special case: This is the last thread, should soak up the rest of the partition */
  if( curr_thr == (total_thr-1) ) {
    thread_bounds->upper = global_bounds->upper;
  }

}

This works fine for n (or total_thr) < 11, for the given range 1..100. However when i choose 11 worker threads over the range 1..100, the numbers are divided up as per 10 worker threads, leaving a spare worker thread (#11) which is allocated a range 101..100!

Not what i intended :slight_smile:

So i guess it's todo with my rounding, and generally poor algorithm for partitioning a range. How do i improve it?

Many thanks in advance,

-c

Ok so it was my rounding all along! :slight_smile:

Specifically it's this line which was iffy -

partition_size = (int)((global_bounds->upper - global_bounds->lower)/total_thr)+1;

By choosing to do the arithmetic in float's, then truncate to an int at the very last minute is the way to go -

partition_size = ((float)(global_bounds->upper - global_bounds->lower)/(float)total_cpus)+0.5;

The subtle ones are always the best! :slight_smile:

-c

You could also spawn off a number of threads and have each one pull the next item off the entire range. I.E. have a "next range item" variable and a mutex that protects it.

Each thread (in a loop until all values in the range are processed) locks the mutex, take the current value, increments it, unlock the mutex, and computes on the current value. This does create a hot spot at the mutex, but if the computation takes significantly long then I don't see that being a big deal.