Hi all,
I'm struggling with an algorithm i've attempted in C.
The idea of the algorithm is to take a given range of numbers, say 1..100, and divide it into n equal chunks, with the final chunk absorbing any remainders. This a really common algorithm, but i don't know it's name to look up a reference implentation.
Here's my attempt thus far:
- Identify a range the process should work on
- Partition that range into equal chunks, so that the chunks can be allocated to worker threads
- Fire each chunk out to a thread for processing
/* Upper and lower bounds of the partition, packed for easy passing into a thread */
struct bounds_s {
int upper;
int lower;
};
typedef struct bounds_s bounds_t;
...
bounds_t global_bounds = { 100, 1 }; /* This would normally come from command line opts */
...
/* This lives in a for loop.
curr_thr iterates between 0 and total_thr-1.
global_bounds is the entire partition this process will work on
thread_bounds is to be populated with the boundaries of this thread's subdivision of the partition */
per_thread_bounds( curr_thr, total_thr, &global_bounds, &thread_bounds );
Where per_thread_bounds is defined as:
/* Work out how many subparitions to partition the global domain into,
and what the bounds of this thread should be */
void per_thread_bounds( int curr_thr, int total_thr, bounds_t const *global_bounds, bounds_t *thread_bounds ) {
/* How big is the total range we're dealing with? Divide it by the total number of threads
to find out how big a partition to assign to each worker thread */
int partition_size;
partition_size = (int)((global_bounds->upper - global_bounds->lower)/total_thr)+1;
/* Assign a partition's worth of range in the first instance, and tweak for
special cases thereafter */
thread_bounds->lower = global_bounds->lower;
thread_bounds->upper = global_bounds->lower + ( partition_size - 1 );
/* Special case: This is not the first CPU (i.e. not the first partition of range to assign
and is not the last CPU -- so don't need to soak up the remainders to the end of the range */
if( curr_thr > 0 ) {
thread_bounds->lower += curr_thr * partition_size;
thread_bounds->upper += curr_thr * partition_size;
}
/* Special case: This is the last thread, should soak up the rest of the partition */
if( curr_thr == (total_thr-1) ) {
thread_bounds->upper = global_bounds->upper;
}
}
This works fine for n (or total_thr) < 11, for the given range 1..100. However when i choose 11 worker threads over the range 1..100, the numbers are divided up as per 10 worker threads, leaving a spare worker thread (#11) which is allocated a range 101..100!
Not what i intended
So i guess it's todo with my rounding, and generally poor algorithm for partitioning a range. How do i improve it?
Many thanks in advance,
-c