Need help with counting within parallelized block - OpenMP/C-Programming

Hello together,

I have a question for you. Since a few weeks I am trying to programm
a small bruteforce application in C on Ubuntu 14.04.1 using Code::Blocks
with gcc-4.8.2. My programm is nothing special. It's just for practise,
Without hashing or something like that. For me the focus is on parallelize
the procedure that generates words. I'm using OpenMP for this small project.
Until now it works good for me. There is just one thing that I would like to
implement, but without success so far.

I want that my programm counts all the generated words, regardless of wether
the search string was found or not. I hope you understand what I'm talking
about. I'm sorry for my bad english. I am from germany.

I declared a global unsigned integer variable with the name 'Counter' for counting all
the generated words.
Because the whole procedure is parallelized, the variable is specified as reduction
variable.

#pragma omp parallel default... reduction ( +: Counter )

So the problem is, that the reduction is not completed until the end of the parallel region,
when the search string was found.
That means only when my programm doesn't have a match, at the end the number of counts for
all generated words is correct. Why? Because when nothing was found, the application reaches
the end of the parallel region and the reduction is completed.
But if my programm has a match, the number of counts is always wrong at the output.

Now, I would like to know, if there is way to exit sane and portable from a parallel region,
without have to wait until all possibilites of words has been tried? Do you understand
what I mean? I.e. we are searching the following string: "4788". I tell my programm the max.
length of words should be 4. From "0" until "9999" altogether are 11110 tries, using just
digits (0123456789) for the brute force procedure. If the word is found, the number of tries (counts)
is 5899.
Is it possible to exit sane and portable from a parallel region earlier?
I.e. when the search string was found? By the same behavior like if nothing was found? So that at the end
the number of counts will displayed correct, even if the word was found?

I have tried something with two variables 'CheckStatus' and 'MyCheckStatus'. This variables should indicate
when work is done (i.e. if there's a match).
With omp atomic read and omp atomic write. But unfortunately without success.

Here is my source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

/* Global variables */

const char         Literals[]    = "0123456789";
const unsigned int NLiterals     = sizeof( Literals ) - 1;
const char         TestString[]  = "4788";
char               EndString[16];

int CheckStatus = 0;

/* Functions */

static void BruteForcer( const int *, const int *, int, unsigned * );

int main()
{
    unsigned Counter = 0;

    #pragma omp parallel default( none ) shared( CheckStatus ) reduction( +:Counter )
    {
        for( int WordLength = 0; WordLength < 5; ++WordLength )
        {
            int FirstEntry[WordLength], LastEntry[WordLength], MyCheckStatus;

            for( int j = 0; j < WordLength; ++j )
                FirstEntry[j] = LastEntry[j] = 0;

            #pragma omp for schedule( dynamic )
            for( int i = 0; i < NLiterals; ++i )
            {
                FirstEntry[0] = i;
                LastEntry[0]  = i + 1;
                #pragma omp atomic read
                MyCheckStatus = CheckStatus;
                if( !MyCheckStatus )
                    BruteForcer( FirstEntry, LastEntry, WordLength, &Counter );
            }
        }
    }

    if( CheckStatus )
        fprintf( stdout, "\nWord found: '%s'! %u tries.\n", EndString, Counter );

    else
        fprintf( stdout, "\nNothing found: '%s'! %u tries.\n", "Empty", Counter );

    exit( EXIT_SUCCESS );
}

static void BruteForcer( const int *FstEntry, const int *LstEntry, int WdLength, unsigned *CounterPtr )
{
    char Word[WdLength];
    int  Entry[WdLength + 1];
    int  i, j;

    memset( Entry, '\0', WdLength );

    /* copy FstEntry to Entry */
    for( i = 0; i < WdLength; ++i )
        Entry = FstEntry;

    i = 0;

    while( i < WdLength )
    {
        /* generate word */
        for( i = 0; i < WdLength; ++i )
            Word = Literals[Entry];

        /* null-byte at end of string */
        Word[WdLength] = '\0';

        ++*CounterPtr;

        /* string compare */
        if( strncmp( TestString, Word, sizeof( TestString ) ) == 0 )
        {
            strncpy( EndString, Word, strlen( Word ) );
            #pragma omp atomic write
            CheckStatus = 1;
            return;
        }

        /* increment Entry */
        for( i = 0; i < WdLength && ++Entry[WdLength-i-1] == NLiterals; i++ )
            Entry[WdLength-i-1] = 0;

        /* when Entry != LstEntry then leave loop */
        for( j = 0; j < WdLength; ++j )
            if( Entry[j] != LstEntry[j] )
                break;

        /* when Entry == LstEntry leave function */
        if( j == WdLength )
            return;
    }
}

Without parallelization I have this output:

Word found: '4788' ! 5899 tries!

That is correct. But with parallelization the output is as follows:

Word found: '4788' ! 10899 tries! (Each time the number is another!)

The goal is, that the number of counts is always correct with parallelization.
Please could you help me with this issue? If possible with small code examples or improving my source code, too. I would be very very appreciated.

DaveX

It took me a while to realize what you were getting at.

The count isn't wrong so to speak. If it launches 12 threads at once, the 1st of which finds your value, that doesn't mean the other 11 never ran.

If you want the exact context you found it, you should only set a value when you find it.

Hello,

thank you for your replying.

Yes, I know this. :slight_smile: But this is not the problem.
As I already wrote, the problem is that I would like to know, how to reach that even though the programm has a match, it shows the correct number of counts. In order that it works, I have to find a way the application exits the parallelized block once the word is found. But with the same behavior as it'd not find anything. Because how I already explained, the correct number of counts is displayed only if the end of the parallelized block was reached. Only thus the reduction can complete.

My point is, count isn't wrong as much as completely meaningless. The order your threads run in isn't absolute. Thread #5 could run before thread #1, succeed, and return a count of 1! If you want the threads to know what order they are, you'll have to tell them.

Increment outside the parallel section and feed that value into the function as a parameter. If the function succeeds, have it set a flag variable to the correct 'count' value. Other threads can check that flag variable before their loop, and quit early if nonzero.

I know. That's the reason why I specified 'Counter' as a reduction variable. But you're right. My implementation doesn't do what I expect. So I would try to implement your suggestions.

Could you show me how to do this? Maybe a small code example?

I will try to do this. Could you help me? Maybe I would understand it better if you could give a small example. If you want?

I don't understand this multithreading library in particular, just multithreading in general, so I can't give you OMP code, but I can show you the concept of what I mean:

int flag=-1;

while(flag < 0) {
        function(count, &flag);
        count++;
}

void parallel_function(int count, int *flag) {
        char str[10];
        int c=count;

        if( (*flag) >= 0) return;

        memset(str, 0, sizeof(str));

        // Compute the string from the value of 'c'.  Each different
        // count will give a unique solution.
        int length=(c%5)+1;
        c /= 5;

        for(int n=0; n<length; n++)
        {
                str[n]=Literals[c%Nliterals];
                c /= Nliterals;
        }

        // If and only if we have the correct solution, store the value.
        if(string_matches)
                (*flag)=count;
}

You can run that in parallel and not have 5000 "wrong" threads upsetting the value of 'flag', since it only gets set on success. And order is irrelevant since each thread is told exactly what its number is.

The size of an integer limits the number of unique digit strings you can create, but that was also true before. (i.e. a 30-digit string has 10^30 possibilities, an integer only holds up to 2^31) You could use more complicated things than a single int -- like an array of integers, one for each digit -- but the principle remains the same.

1 Like

Cool. :slight_smile: Thank you very much. :b: Now I know what you mean. I will try to implement it in my source code. That is a really good idea. Thanks a lot. Later I will post my modified source.

I should also note that for a problem this small, a single-threaded loop over 100 strings may be just as fast or faster than launching 100 tiny short-lived threads. There is overhead to starting and quitting a parallel operation. So, having a loop inside the thread so each thread does more work before it quits could be quite efficient. Give each thread a starting point and let it check 100 perhaps.

Yes, I have seen a control in one lib called 'eureka', so the first thread to find a solution stops all other threads. The trick is to find a signal or interrupt to alert them, so all threads are not polling some variable to decide to continue.

Okay. :slight_smile: Thank you for the information.

I've tried to parallelize your code. I must say, that I like it really much.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

const char         Literals[]    = "0123456789";
const unsigned int NLiterals     = sizeof( Literals ) - 1;
const char         TestString[]  = "4877";
char               EndString[16];

void parallel_function( int, int * );

int main()
{
    int flag = -1;
    int count = 0;

    #pragma omp parallel default( none ) shared( flag ) reduction( +:count )
    {
        while( flag < 0 )
        {
            parallel_function( count, &flag );
            count++;
        }
    }

    if( !flag )
        printf( "Nothing found!!! %i tries\n", count );

    else
        printf( "Match: %s ! Tries: %i\n",  EndString, count );

    return 0;
}

void parallel_function( int count, int *flag )
{
    char str[10];
    int  c = count;

    if( ( *flag ) >= 0 )
        return;

    memset( str, 0, sizeof( str ) );

    // Compute the string from the value of 'c'. Each different
    // count will give a unique solution.
    int length = ( c % 5 ) + 1;
    c /= 5;

    for( int n = 0; n < length; n++ )
    {
        str[n] = Literals[c%NLiterals];
        c /= NLiterals;
    }

    // If and only if we have the correct solution, store the value.
    if( strncmp( TestString, str, sizeof( TestString ) ) == 0 )
    {
        strncpy( EndString, str, strlen( str ) );
        ( *flag ) = count;
    }
}

But unfortunately I have still the same behavior with the counting of the generated words. I think I have to learn more about parallelizing with OpenMP. Otherwise I'll never succeed with my small project. It's really difficult.

@Corona688: I thank you very very much for your help and for your code example. :):b:

---------- Post updated at 10:44 PM ---------- Previous update was at 10:42 PM ----------

Hello :). Thank you for this information. Maybe this could be the solution. Okay, I have seen 'eureka' is as of OpenMP version 4.0.

Why are you printing count? Why do you even have count? Print flag! That's what it's there for.

Finally, it's unavoidable when doing a parallel search that you may run more threads than strictly necessary. If you ran 4 simultaneous threads but found it on the very first try, that's tough luck, you ran 4 anyway.

1 Like

Yes. :):):slight_smile: Sorry, after I read your post I changed it and it works!!! Oh man, this is really amazing. I don't know how to thank you. The number of counts is absolutely correct. :b:
I can't believe it. Since weeks I'm searching for a solution. And you gave it to me. :smiley: I can't say it often enough -> thank you. :slight_smile:
The working source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

const char         Literals[]    = "0123456789";
const unsigned int NLiterals     = sizeof( Literals ) - 1;
const char         TestString[]  = "1";
char               EndString[16];

void parallel_function( int, int * );

int main()
{
    int flag = -1;
    int count = 0;

    #pragma omp parallel default( none ) shared( flag ) reduction( +:count )
    {
        while( flag < 0 )
        {
            parallel_function( count, &flag );
            count++;
        }
    }

    if( !flag )
        printf( "Nothing found!!! %i tries\n", flag );

    else
        printf( "Match: %s ! Tries: %i\n",  EndString, flag );

    return 0;
}

void parallel_function( int count, int *flag )
{
    char str[10];
    int  c = count;

    if( ( *flag ) >= 0 )
        return;

    memset( str, 0, sizeof( str ) );

    // Compute the string from the value of 'c'. Each different
    // count will give a unique solution.
    int length = ( c % 5 ) + 1;
    c /= 5;

    for( int n = 0; n < length; n++ )
    {
        str[n] = Literals[c%NLiterals];
        c /= NLiterals;
    }

    // If and only if we have the correct solution, store the value.
    if( strncmp( TestString, str, sizeof( TestString ) ) == 0 )
    {
        strncpy( EndString, str, strlen( str ) );
        ( *flag ) = count;
    }
}

At least I want to post my source code. Because through the kindness of Corona688, now my code works, too. :slight_smile:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

/* Global variables */

const char         Literals[]    = "0123456789";
const unsigned int NLiterals     = sizeof( Literals ) - 1;
const char         TestString[]  = "4877";
char               EndString[16];

/* Functions */

static void BruteForcer( const int *, const int *, int, unsigned *, int * );

int main()
{
    unsigned Counter     = 0;
    int      CheckStatus = -1;

    #pragma omp parallel default( none ) shared( CheckStatus ) reduction( +:Counter )
    {
        for( int WordLength = 0; WordLength < 5; ++WordLength )
        {
            int FirstEntry[WordLength], LastEntry[WordLength];

            for( int j = 0; j < WordLength; ++j )
                FirstEntry[j] = LastEntry[j] = 0;

            for( int i = 0; i < NLiterals; ++i )
            {
                FirstEntry[0] = i;
                LastEntry[0]  = i + 1;

                if( CheckStatus < 0 )
                    BruteForcer( FirstEntry, LastEntry, WordLength, &Counter, &CheckStatus );
            }
        }
    }

    if( CheckStatus < 0 )
        fprintf( stdout, "\nNothing found: '%s'! %u tries.\n", "Empty", Counter / 8 ); // Divided through the number of cores/threads <- in my case eight (Intel Core i7-920) ;-)

    else
        fprintf( stdout, "\nWord found: '%s'! %u tries.\n", EndString, CheckStatus );

    exit( EXIT_SUCCESS );
}

static void BruteForcer( const int *FstEntry, const int *LstEntry, int WdLength, unsigned *CounterPtr, int *CheckStatusPtr )
{
    char Word[WdLength];
    int  Entry[WdLength + 1];
    int  i, j;

    memset( Entry, '\0', WdLength );

    /* copy FstEntry to Entry */
    for( i = 0; i < WdLength; ++i )
        Entry = FstEntry;

    i = 0;

    while( i < WdLength )
    {
        ++*CounterPtr;

        /* generate word */
        for( i = 0; i < WdLength; ++i )
            Word = Literals[Entry];

        /* null-byte at end of string */
        Word[WdLength] = '\0';

        /* string compare */
        if( strncmp( TestString, Word, sizeof( TestString ) ) == 0 )
        {
            strncpy( EndString, Word, strlen( Word ) );
            *CheckStatusPtr = *CounterPtr;
            return;
        }

        /* increment Entry */
        for( i = 0; i < WdLength && ++Entry[WdLength-i-1] == NLiterals; i++ )
            Entry[WdLength-i-1] = 0;

        /* when Entry != LstEntry then leave loop */
        for( j = 0; j < WdLength; ++j )
            if( Entry[j] != LstEntry[j] )
                break;

        /* when Entry == LstEntry leave function */
        if( j == WdLength )
            return;
    }
}

The problem is solved. The thread can be closed. THANK YOU!!! :b:

1 Like