Memory and cache access time discrepancy

#include<stdio.h>
#include<stdlib.h>
#include<sys/time.h>
#include<time.h>
#include "rdtsc.h"
#define SIZE (4*64*1024)

int main()
{
unsigned long long a,b;
int arr[SIZE]={0};
int i;
register int r;


a=rdtsc();
r=arr[0];
b=rdtsc();

printf("1st element Access Cycles = %llu\n",b-a);

a=rdtsc();
r=arr[1];
b=rdtsc();
printf("2nd Element Access Cycles = %llu\n",b-a);

}

In the code above I am trying to measure how many cycles it takes to fetch the first element of an array from memory, and then the next element, which should already be cached. When I run this snippet, I get an almost identical cycle count for both accesses, about 81 cycles each. Can anybody explain why this is happening? I would expect the first access to be very costly, and the access to the next sequential element, which has been brought into the cache, to be much cheaper.

Thanks.

Entering main() writes zeros into every element of arr, which may or may not overflow your cache. Even if it does, storing the result of your first rdtsc() call into a will probably pull in the first several elements of arr, since a, b, and arr are probably allocated close to each other on the stack.
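To make the locality point concrete, here is a minimal sketch that checks whether two addresses share a cache line. It assumes 64-byte lines (common on x86, but not guaranteed for your CPU) and a 4-byte int; line_index and same_line are hypothetical helpers, not part of the original code.

```c
#include <stdint.h>

/* Assumption: 64-byte cache lines. Check your CPU; this is not guaranteed. */
#define ASSUMED_LINE_BYTES 64u

/* Hypothetical helper: index of the (assumed) cache line holding address p. */
static uintptr_t line_index(const void *p)
{
    return (uintptr_t)p / ASSUMED_LINE_BYTES;
}

/* Returns 1 if both addresses fall on the same (assumed) cache line. */
static int same_line(const void *p, const void *q)
{
    return line_index(p) == line_index(q);
}
```

Under those assumptions, for an array aligned to 64 bytes, same_line(&arr[0], &arr[1]) is true and same_line(&arr[0], &arr[16]) is false. So anything that touches arr[0] brings arr[1] into the cache with it, and a second access meant to miss would need to be at least one line away.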

The compiler's optimizer noted that the value of "r" was never used, and eliminated the operation.

And if the code isn't optimized, the compiler almost certainly produced a LOT more instructions than necessary to simply load a value from memory into a register.
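One way to keep the load while still compiling with optimization is to read through a volatile-qualified pointer, which the compiler is not allowed to drop. A minimal sketch; forced_load is a hypothetical helper name, not something from the original code:

```c
/* Hypothetical helper: read one int through a volatile-qualified pointer.
   The compiler must emit the load even if the caller discards the result. */
static int forced_load(const int *p)
{
    return *(const volatile int *)p;
}
```

In the question's code, r=forced_load(&arr[0]); would take the place of r=arr[0];. This only guarantees that a load instruction is emitted; it says nothing about whether that load hits or misses the cache.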

Never mind the time spent executing the rdtsc() routines themselves.
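To get a feel for how much of the ~81 cycles is the timer itself, you can time two back-to-back counter reads and compare that with a timed load. The sketch below is x86-only and assumes GCC or Clang: it uses the compiler intrinsics __rdtsc, _mm_lfence, _mm_clflush and _mm_mfence from <x86intrin.h> in place of the rdtsc.h from the question, whose contents are not shown. fenced_rdtsc, timer_overhead and time_load are hypothetical helper names.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence, _mm_clflush, _mm_mfence (GCC/Clang, x86) */

/* Read the time-stamp counter with LFENCE on both sides, so the read is
   not reordered with the loads around it. */
static inline uint64_t fenced_rdtsc(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Smallest delta between two back-to-back reads over `trials` runs:
   a rough estimate of what the timing code costs by itself. */
static uint64_t timer_overhead(int trials)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = fenced_rdtsc();
        uint64_t t1 = fenced_rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Raw cycle count for one load of *p (timer overhead NOT subtracted).
   If flush_first is nonzero, the line is evicted from the cache first. */
static uint64_t time_load(const int *p, int flush_first)
{
    if (flush_first) {
        _mm_clflush(p);  /* evict the cache line containing p */
        _mm_mfence();    /* wait for the flush to complete */
    }
    uint64_t t0 = fenced_rdtsc();
    int v = *(const volatile int *)p;  /* volatile: the load cannot be elided */
    uint64_t t1 = fenced_rdtsc();
    (void)v;
    return t1 - t0;
}
```

Comparing time_load(&arr[0], 1) against time_load(&arr[0], 0), each with timer_overhead(1000) subtracted, should show the miss/hit gap more clearly than the original code. Single measurements are noisy (interrupts, frequency scaling), so repeat them and look at the minimum or median.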