Measuring the correctness of ndelay() function.

I wrote this kernel module to test the correctness of ndelay() function.

Kernel mdoule:

#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/time.h>
#include <linux/delay.h>
static int __init initialize(void)
{
	ktime_t start, end;
	s64 actual_time;
	int i;
	for(i=0;i<1000;i++)
	{
		start = ktime_get();
			ndelay(100);			
		end = ktime_get();
		actual_time = ktime_to_ns(ktime_sub(end, start));
		printk("%lld\n",(long long)actual_time);	
	}
	return 0;
}

static void __exit final(void)
{
     printk(KERN_INFO "Unload module\n");
}

module_init(initialize);
module_exit(final);

MODULE_AUTHOR("Bhaskar");
MODULE_DESCRIPTION("looping for 1000 times");
MODULE_LICENSE("GPL");

The output of dmesg:

[ 1028.135657] 379
[ 1028.135659] 383
[ 1028.135661] 370
[ 1028.135663] 372
[ 1028.135665] 332
[ 1028.135667] 348
[ 1028.135668] 318
[ 1028.135670] 365
[ 1028.135672] 350
[ 1028.135674] 327
[ 1028.135676] 359
[ 1028.135677] 358
[ 1028.135679] 320
[ 1028.135681] 393
[ 1028.135683] 359
[ 1028.135685] 357
[ 1028.135687] 361
[ 1028.135688] 325
[ 1028.135690] 360
[ 1028.135692] 302
[ 1028.135694] 353
[ 1028.135696] 375
[ 1028.135698] 342
[ 1028.135700] 377
[ 1028.135702] 382
[ 1028.135704] 390
[ 1028.135706] 322
[ 1028.135707] 361
[ 1028.135709] 392
[ 1028.135711] 387

Here my question is I'm trying to measure 100ns delay, but I'm getting output in 300's. One of the possible things that can happen is storing the value of ktime_get() into end module is taking some time. So can anyone please suggest me alternative or where I'm going wrong.

Not necessarily an answer but you could make ndelay very large and time it (possibly outside of the routine).