Weird timing results in C

I was running some timings in my code to see which of several functions was the best and I've been getting some odd results. Here's the code I'm using:

static double time_loop(int (*foo)(int)) {
	clock_t start, end;
	int n = 0, i = 0;
	start = clock();
	for (; i <= MAXN; i++)
		if ((*foo)(i)) n++;
	end = clock();
	if (n != 50847534)
		return -1;	// Error
	return ((double) (end - start)) / CLOCKS_PER_SEC;
}

static double median3(int (*foo)(int)) {
	double t1 = time_loop(foo);
	double t2 = time_loop(foo);
	double t3 = time_loop(foo);
	printf("  ((%f %f %f))\n", t1, t2, t3);
	int errors = 0;
	if (t1 < 0)
		errors++;
	if (t2 < 0)
		errors++;
	if (t3 < 0)
		errors++;
	if (errors > 0)
		return -errors;
	if (t1 > t2) {
		if (t2 > t3)
			return t2;
		return t3 < t1 ? t3 : t1;
	} else {
		if (t1 > t3)
			return t1;
		return t3 < t2 ? t3 : t2;
	}
}

The functions that I'm passing to time_loop are pure functions -- they're not making any changes to global variables. And yet every time I run this, my debugging line in median3 shows that the third invocation takes significantly longer than the others. Sample run:

  ((4.560000 3.750000 6.540000))
Original: 4.560000 s
  ((4.920000 3.760000 6.690000))
Original: 4.920000 s
  ((4.190000 3.750000 6.620000))
Original: 4.190000 s
  ((3.820000 3.130000 5.640000))
New:      3.820000 s
  ((3.580000 3.150000 5.580000))
New:      3.580000 s
  ((3.670000 3.140000 5.510000))
New:      3.670000 s

Why is that? These functions are entirely deterministic, and nothing changes between invocations. Yet I see a consistent pattern: a medium time, then a smaller time, then a much longer time.

As for cache effects, the functions rely on a 125 MB global array, so that's not going to fit even in my 6MB L3. (Even if it did you'd expect the second and third invocations to be fast.)

Can you post the code for function foo?

for (; i <= MAXN; i++)
	if ((*foo)(i)) n++;

That loop runs up to MAXN, which must be at least 50847534 for the count check to pass. For each of the three calls to time_loop() the function pointer jumps to the same function, so unless that code dynamically allocates memory or there is OS contention for some resource, there ought not to be differences like those.

I happen not to like declaring functions static, for a variety of reasons, except in certain instances -- but that is very probably not the cause of this problem.

You need to call getrusage() before and after - like you now do with clock() - to see what is happening in the world of system resources.

Also, clock() lumps everything into a single processor-time figure; use the times() function to get system and user time separately - as consumed by parent and child.

I think shamrock wants to see that you are not calling exec or something else that eats system resources or causes high kernel time.

More questions. What type of system are you using? What OS?

It would be nice to see what foo is doing. You can also use gprof to profile your code and figure out why some calls take longer.

Nothing funny, just bit manipulation and accessing a big array. The functions are all similar and all display the same results; here's one example:

#define GET(b) ((sieve[(b)>>5]>>((b)&31))&1)
static int isprime3(int p) { return (p&1) ? p > 2 && !GET(p >> 1) : p == 2; }

The system is 32-bit Xubuntu 8.10 on a quad-core Phenom II.

As I said before, the results are consistent. When I call

	printf("New:     %f s\n", median3(&isprime3));
	printf("New:     %f s\n", median3(&isprime3));
	printf("New:     %f s\n", median3(&isprime3));

the times for the first, fourth, and seventh invocations are fast; the times for the third, sixth, and ninth are slow; and so on.

---------- Post updated at 01:23 AM ---------- Previous update was at 01:18 AM ----------

I don't do it much. If you have advice on better ways to do it, feel free to expound (or give a link!).

With regard to static functions:
I assume this is because the C99 standard says little about the implementation of static functions beyond the change in linkage -

In large single-file programs with a lot of static functions mixed in with non-static functions,
I've encountered really obscure problems in gcc 3.1 and cc on HP UX 11.1 - for example, a non-static function not being executed at all. Sounds odd, but it is true. The same "inherited program design" also broke optimizations by forcing the compiler to make decisions it was not expected to make - according to the HP compiler team, anyway.

Basically, to resolve the problems I had to turn off optimization completely and declare all functions static to get correct results. Since a fixed makefile was a requirement for the entire app suite, this meant a big rewrite for several very large programs.

Your problem is nothing like that. It almost HAS to be cache-related. Based on what you said about your hardware: if you were to split your 125 MB of data into about 21 chunks of ~6 MB each, then run all three timing passes (1, 2, 3) over each chunk before moving to the next, the problem should disappear - if it is cache-related, that is. With no other observations to work on, that would be my guess.