Pthread attr setting doesn't work before thread create?

Hello everyone,
I wrote a small test program for setting pthread priorities. Here's the code, very simple, only about 60 lines.
I've tried it on Fedora 13 (in VirtualBox) and on my 6410 ARM board running Linux 2.6.36, with the same result on both.
Both environments run with root privileges.
Can anybody tell me why the priority and policy set before pthread_create don't take effect?
Thanks in advance.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>

static pthread_t ptchild;
void* childthread(void* arg)
{
    struct sched_param pr;
    int ret = 9;
    int policy;

    pthread_getschedparam(pthread_self(), &policy, &pr);
    printf("Child Thread Up PL%d PRI%d!\n", policy, pr.sched_priority); // policy & priority the new thread actually started with

    policy = SCHED_RR;
    pr.sched_priority = 19;
    pthread_setschedparam(pthread_self(), policy, &pr);
    sleep(1);

    pthread_getschedparam(pthread_self(), &policy, &pr);
    printf("Child Thread Up PL%d PRI%d!\n", policy, pr.sched_priority); // result after setting from inside the thread

    sleep(1);
    printf("child exit\n");
    pthread_exit((void*)ret);
}

int main(void)
{
    pthread_attr_t attr;
    struct sched_param pr;
    int ret;
    void* childret;

    pr.sched_priority = 19;
    
#if 1
    printf("%d\n", pthread_attr_init(&attr));
    printf("%d\n", pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE));
    printf("%d\n", pthread_attr_setschedpolicy(&attr, SCHED_RR));
    printf("%d\n", pthread_attr_setschedparam(&attr, &pr));

#endif

    if((ret = pthread_create(&ptchild, &attr, &childthread, NULL)) != 0){ /* pthread_create returns an error number, not -1 */
        printf("Thread Create Err %d\n", ret);
    }

    /* Wait for the child thread to finish */
    if((ret = pthread_join(ptchild, &childret)) != 0){
        printf("Thread Join Err %d\n", ret);

    }
    else{
        printf("joined, ret = %d\n", (int)childret);
    }
    return 0;

}

The result is:

0
0
0
0
Child Thread Up PL0 PRI0!   <- strange: the policy & priority from the attribute were not applied
Child Thread Up PL2 PRI19!  <- the in-thread setschedparam() worked?
child exit
joined, ret = 9

Perhaps your thread lacked PTHREAD_SCOPE_SYSTEM, i.e., it was not its own LWP, so it ran on the parent's LWP and inherited that LWP's scheduling and nice-ness.

Yeah, that's the classic trap when using real-time scheduling. By default the new thread inherits the scheduling attributes of its creator and ignores the ones in the attribute object; you need to set the inheritsched attribute to PTHREAD_EXPLICIT_SCHED when creating the thread. See:

man pthread_attr_setinheritsched
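For example (a minimal, untested sketch based on the original program above), adding the explicit-sched call to the attribute setup should make the policy and priority in the attribute object take effect:

    pthread_attr_t attr;
    struct sched_param pr;

    pthread_attr_init(&attr);
    /* Without this call, the new thread inherits the creator's scheduling
       attributes and silently ignores the policy/priority set below. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_RR);
    pr.sched_priority = 19;
    pthread_attr_setschedparam(&attr, &pr);

    pthread_create(&ptchild, &attr, &childthread, NULL);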

HTH, Loïc


I know of only two UNIX variants that implement an M:N scheduler: Solaris (up to 8) and Tru64. Does anyone know of others?

AFAICS, only Tru64 managed to have an efficient version that worked. Interestingly, Solaris abandoned the M:N scheduler for an (easier) 1:1 model in Solaris 9.

Linux has always used a 1:1 model. There was an attempt with NGPT from IBM to provide an M:N variant; it was quickly surpassed by the NPTL 1:1 implementation and the O(1) kernel scheduler introduced in the 2.6 kernel series.

Cheers, Loïc

So usually nice is per process, not per thread, for now?

POSIX states that nice is a process-wide attribute.

NPTL used to have a non-conformance bug on that point (i.e., threads do not share a common nice value). I don't know if it has been fixed in the meantime.
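You can still observe the per-thread behaviour on Linux with a quick sketch like this (Linux-specific: it uses syscall(SYS_gettid) and setpriority() on individual thread IDs; on a strictly conforming system the renice would hit the whole process):

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/resource.h>

/* Each NPTL thread is a separate kernel task with its own nice value. */
static void *worker(void *arg)
{
    pid_t tid = (pid_t) syscall(SYS_gettid);

    /* Renice only this thread; POSIX says nice is process-wide,
       but on Linux this affects just the calling kernel task. */
    setpriority(PRIO_PROCESS, tid, (int)(long) arg);
    printf("tid %d: nice = %d\n", (int) tid, getpriority(PRIO_PROCESS, tid));
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, worker, (void *) 5L);
    pthread_create(&t2, NULL, worker, (void *) 10L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}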

Cheers, Loïc

If the threads of one LWP have different scheduler priorities on an M:N scheduler, does the nice value change as threads are given the LWP? Varying the process nice is not a good way to handle this. I guess the LWP would need to stay at the nice of its highest-priority thread, so that a low-priority thread doesn't get the LWP scheduled out just when the high-priority thread needs to run.

Hi DGPickett,

perhaps I am not following you completely...

When you say the "threads of one LWP had different scheduler priorities", I assume you mean that these LWPs are using a real-time scheduling policy (SCHED_FIFO or SCHED_RR, for instance). In that particular case, changing the nice value of the process does not influence these threads, as stated in the POSIX standard:

Or am I missing something?

The SCHED_FIFO or SCHED_RR policy seems to apply to multiple threads within an LWP, and the nice of that LWP would affect them all. If those threads have different numerical priorities, nice-like, I wonder how you could set the LWP's niceness. I would expect it to reflect the highest thread's priority, since the lower-priority threads must not run for long, or the higher thread's priority cannot be honored.

This is my understanding of how things are supposed to work. Warning: the last time I dealt intensively with such questions was 6 years ago, so I can't guarantee I'll get everything right.

POSIX differentiates between threads subject to process contention scope (PCS) and system contention scope (SCS). PCS threads correspond to user-level threads that are scheduled by the thread library, whereas SCS threads correspond to kernel-level threads that are scheduled by the OS (I think Solaris calls SCS threads LWPs).

We may set a different scheduling policy and priority for each of these PCS threads; the scheduling is always relative to the other PCS threads within the process. We may for instance have 3 PCS threads: thread 1 with SCHED_FIFO policy and priority 1, thread 2 with SCHED_FIFO and priority 2, and thread 3 with the default time-sharing policy (SCHED_OTHER). Assume further that these 3 PCS threads are mapped onto 1 SCS thread. When this SCS thread gets scheduled by the OS, the thread library looks at which PCS thread should run: thread 2 is scheduled if runnable, otherwise thread 1, otherwise thread 3.
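The contention scope a thread runs under is requested through the attribute object; a minimal sketch (note that Linux/NPTL only supports PTHREAD_SCOPE_SYSTEM, so the PTHREAD_SCOPE_PROCESS request will typically fail there with ENOTSUP):

#include <stdio.h>
#include <string.h>
#include <pthread.h>

static void *fn(void *arg) { return arg; }

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    pthread_attr_init(&attr);

    /* Ask for process contention scope (a user-level / PCS thread). */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);
    if (rc != 0)   /* e.g. ENOTSUP on Linux/NPTL */
        fprintf(stderr, "PCS not supported: %s\n", strerror(rc));

    /* Ask for system contention scope (an SCS thread / its own LWP). */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    pthread_create(&tid, &attr, fn, NULL);
    pthread_join(tid, NULL);
    return 0;
}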

Now, how does nice() affect the whole thing? First, nice() only operates at the process level and doesn't really make sense for a real-time policy. We therefore expect nice() to affect only SCS threads that are not subject to a real-time policy. Indeed, POSIX states:

Back to our previous example. If I renice my process, and if my process is not subject to a real-time policy, the corresponding SCS thread is scheduled less frequently (assuming I increased the nice level) compared to other SCS threads running on the system, and so are the 3 PCS threads. From a system perspective, all 3 PCS threads have been impacted simultaneously by this renice operation. From the perspective of our PCS threads, however, nothing has changed, since their scheduling is always relative to the other PCS threads (except perhaps that it feels like running on a slower CPU).

HTH, Loïc

Yes, that is my model, too.

If you stack threads within an LWP, the library dispatcher switches between them; if each thread gets its own LWP, it gets whatever the OS kernel dispatcher gives it and can be truly concurrent. Solaris (and perhaps others) only allows you 512 LWPs, so if you want more, you must either use multiple processes or share LWPs.

If any thread in an LWP blocks, the other threads in it do not run. So, if your threading exists to segregate blocking I/O, you need LWPs; but if you are just servicing/polling minor activities in neat, separate threads, then having them share an LWP is appropriate. Threading is often used to do asynchronous I/O by letting threads block, but if you need bandwidth, you need an LWP per thread/device. If you want each thread to exploit a different SMP CPU, you need an LWP per thread.

Finally, the yield functions are for threads sharing an LWP, to give the CPU to their siblings; but in either case, how do you get rid of the CPU when you have no use for it? Does some flavor of yield know that all threads on the LWP have just been satisfied and the CPU should be handed off? Do you have to sleep or poll(0,0,1) or the like?

My understanding is that a good M:N scheduler should avoid the situation where, when a user-level thread blocks, all the user-level threads mapped to the LWP block (because the LWP itself blocks). One possibility is to use scheduler activations; see this article if you're interested.

You mean how to give the CPU away for the entire process? The only way I know to achieve this would be to raise SIGSTOP, but then there is no means of getting the CPU back unless an external process sends SIGCONT :wink: The question is: why would you want to do this?

The POSIX way to tell a thread to relinquish the CPU is sched_yield(). This is appropriate for switching between user-level threads; otherwise such a thread would run until it blocks or completes. Usually sched_yield() causes the calling thread to be moved to the end of its scheduling queue. So if your thread is the only one in that queue, it simply continues to run after sched_yield()...

And even if a 1:1 thread model is used, what does the OS scheduler do when all runnable threads call sched_yield()? I guess most schedulers would keep scheduling the threads (more or less in turns) until the process time slice has been exhausted. Work-arounds like sleep() or poll(0,0,1) or the like would exhibit a similar behaviour, I am afraid.
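To make the difference concrete, here is a small sketch (assuming a shared flag called ready; in real code one would of course block on a condition variable instead of either loop):

#include <sched.h>
#include <time.h>

extern volatile int ready;   /* assumed to be set by some other thread */

/* Stays runnable: if this is the only runnable thread, sched_yield()
   returns immediately and the loop spins at 100% CPU. */
static void wait_by_yielding(void)
{
    while (!ready)
        sched_yield();
}

/* Actually blocks: the CPU is given away for roughly 1 ms per iteration,
   at the cost of up to that much wake-up latency. */
static void wait_by_sleeping(void)
{
    struct timespec ts = { 0, 1000000 };   /* 1 ms */

    while (!ready)
        nanosleep(&ts, NULL);
}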

Cheers, Loïc

I give away the CPU because I am done with it and have other processes on the host that might be CPU bound. My best scenario is to have a blocking thread on each LWP, so the CPU is released and returned. Similar issues occur with poll/select, non-blocking, and async I/O -- OK, now that you have nothing to write/send, no buffer space left on this host, and no incoming data in your buffers, how do you pass off the CPU like a good UNIX citizen if you want really low latency, less than poll()'s nominal millisecond? How can you become interrupt driven and avoid wasting CPU on polling? Windows loop detection was caca! :smiley:
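(For what it's worth, the usual way to be interrupt-driven without burning CPU is to block with an infinite timeout on the descriptors a thread owns; the kernel then wakes the thread only when an event arrives. A sketch, assuming one descriptor per service thread; on Linux, ppoll() takes a struct timespec if a finer-than-millisecond timeout is ever actually needed.)

#include <poll.h>
#include <unistd.h>

/* One service thread per descriptor: block in poll() with timeout -1,
   so the thread consumes no CPU until the kernel signals readiness. */
static void service_fd(int fd)
{
    struct pollfd pfd;
    char buf[4096];
    ssize_t n;

    pfd.fd = fd;
    pfd.events = POLLIN;

    for (;;) {
        if (poll(&pfd, 1, -1) <= 0)     /* -1: wait forever, never spin */
            continue;                   /* interrupted by a signal; retry */
        if (pfd.revents & (POLLERR | POLLHUP))
            break;
        n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;
        /* ... handle the n bytes just read ... */
    }
}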

Which OSes are you targeting?

Loïc

?!?!?

This code:

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

void *run( void *arg )
{
    int id;
    id = ( int )( long ) arg;   /* thread index passed through the arg pointer */
    sleep( 1 );
    return( NULL );
}

int main( int argc, char **argv )
{
    int ii;
    int rc;
    int num_thr;
    num_thr = strtol( argv[ 1 ], NULL, 0 );
    pthread_t *tids = calloc( num_thr, sizeof( *tids ) );
    for ( ii = 0; ii < num_thr; ii++ )
    {
        rc = pthread_create( &( tids[ ii ] ), NULL, run, ( void * )( long ) ii );
        if ( 0 != rc )
        {
            fprintf( stderr, "Failed on thread %d\n", ii );
            break;
        }
    }

    fprintf( stderr, "started %d threads\n", ii );

    for ( ii = 0; ii < num_thr; ii++ )
    {
        pthread_join( tids[ ii ], NULL );
    }

    return( 0 );
}

produces this on Solaris 10:

-bash-3.00$ ./thr 32000
started 32000 threads
-bash-3.00$ ./thr 100000
started 100000 threads
-bash-3.00$ ./thr 1000000
started 1000000 threads

Yes, 1,000,000 threads. I didn't check to see if they were all concurrent at that point, though, since it took about 100 seconds to run. The 32,000 thread example ran in a second or two.

That's Solaris 10; I think DGPickett is interested in Solaris 8 (or earlier), since he mentioned the M:N threading model. Indeed, starting with Solaris 9, they switched to a 1:1 model.

Cheers, Loïc.

So you had 32,000, even 1M LWPs? I guess they opened it up a lot. I have no late-model Solaris to test on. Is this a POSIX improvement?

I wonder what the maximum number of concurrent threads per process is on various OSes?