memcpy error

I am getting a segmentation fault in memcpy. I have allocated sufficient memory, but I don't know why it is occurring.

char *finalptr = (char *)malloc(1048576 * sizeof(char));
finaloffset = 0; have = 685516;
memcpy(&(finalptr) + finaloffset, out, have);
finaloffset = 685516; have = 359910;
memcpy(&(finalptr) + finaloffset, out, have);

The first time it copies successfully, but the second time it gives a segmentation fault:

signal SEGV (no mapping at the fault address) in memcpy%sun4v-hwcap3 at 0xffffffff7f01b874
0xffffffff7f01b874: memcpy%sun4v-hwcap3+0x0604: stx      %o4, [%o0 - 8]

By my calculation I am not crossing the buffer boundary, but it still gives the error.
OS: Solaris

You tried to copy the data to the address of the pointer itself (note the &), which is not correct.

The code should be:

memcpy(finalptr + finaloffset, out, have);
1 Like

Unrelated to the segfault, there are a couple of things about this statement of which you should be aware.

In general, if you're going to multiply a number by the size of a data type, you should use calloc(). If not, you should check for overflow before passing the result to malloc(). I say in general because in this specific instance, where the number is hardcoded and the product is almost certainly less than the max value of size_t, it's extremely unlikely that it will cause a problem.

Unless you know why you're doing it, it's best not to cast unnecessarily. Many C programmers do it unthinkingly, but it is a practice with zero benefit and a potentially insidious drawback: suppression of compiler warnings and errors.

If the malloc declaration in scope does not return void * (or, in this instance, char *), the explicit cast tells the compiler that it's okay if malloc returns an int (which is not unheard of) or some other type, and, should such a mismatch occur, that you want it to silently coerce the return value regardless of its type. Such coercion can lead to unexpected behavior.

Regards,
Alister

@MacMonster Thanks; that was why I was getting the segmentation fault, although I don't know why my format was wrong, because I wasn't getting any compilation error.
@alister Although I haven't shown it here, I am checking that the memory allocation succeeds, so that isn't the culprit.

Regards,
Rajsekhar

You wouldn't get a compilation error - the function is expecting a pointer, which is what you were supplying. Just not the right pointer :).

The compiler will happily chop off its own foot if you ask it to, as long as you do so with correct grammar. It doesn't understand the intent of your program.

Why calloc? It zeros the memory and, as such, will touch every page allocated, forcing the kernel to actually create a physical page for it. This is wasteful, IMO, especially if you'll be writing to that memory again immediately (as he is) or, worse, if you'll be writing to the memory sparsely. Plus, I always thought that "calloc(x, y)" was pretty much equivalent to "malloc(x*y)" minus the obvious zeroing of the returned memory.

I've never seen a malloc implementation return int, but since the compiler assumes any function it hasn't seen a declaration for takes integer parms and returns int, casting is a good way to step on your own foot if you forget to include the header for malloc, lol. I've seen this before, and it's not pretty when someone spends forever trying to debug a core dump caused by the compiler's (mis)understanding that malloc returns int: the result gets badly sign-extended into the pointer, corrupting its value and, thus, causing a crash. All warnings were suppressed because of the cast. (BTW, it wasn't me, lol. I was the one who noticed stdlib.h wasn't included, and upon me were bestowed many thanks, lol.) Of course, to this day I still have to force myself not to cast the result of malloc, but that day inspired me to stop doing it.

If x*y > SIZE_MAX, then because of that overflow, malloc will silently attempt to allocate an amount of storage less than what was intended. This won't happen with a properly implemented calloc(x, y). If the overhead of zeroing the allocation is undesirable, then explicitly checking for overflow before using malloc would be wise.

I have, but that was a while ago. These days, the case of the absent header, that you mentioned, is a much more likely scenario.

Regards,
Alister

If you're allocating enough memory to approach SIZE_MAX, you really, really don't want to zero it first :eek:

Please bring in 2 GB worth of pages, kernel; I'll wait, lol.

It's not much of a wait. On my 5 yr old laptop, memset()-ing 2 GB takes a fraction of a second. I didn't time the memset() itself, but it took less than one quarter of a second to complete a fork-exec-main-malloc-memset-exit sequence, which determines the upper bound with plenty of padding.

Even so, memset() is largely irrelevant in this scenario. For a single, unusually large allocation, you're practically guaranteed that there won't be a suitable chunk already available within the allocator's store. In that case, the requested pages will need to be allocated with sbrk or mmap and those pages will be zeroed by the kernel regardless. I repeat, whether you call malloc or calloc, in this scenario, you will be getting zeroed memory. And, with calloc(), you get complimentary overflow checking. WOOHOO!

I looked over the glibc, FreeBSD, and OpenBSD calloc() implementations and they never re-zero memory that's freshly delivered by sbrk() or mmap(). Should anyone care to have a look for themselves, for your convenience:
glibc malloc.c
FreeBSD malloc.c
OpenBSD malloc.c

Furthermore, it's quite possible that all of those zeroed pages are backed by a single zeroed page that the kernel keeps around specifically for such situations. In that case, a real page won't be mapped until something is written.

A last point. If you really, really, really truly care about squeezing every last ounce of performance from the allocators (some folks do even when it's unwarranted), then don't use malloc and friends. They're designed to be general purpose routines and the tradeoffs involved beget complexity. A stupid, simple, single-purpose, tailor-made solution may be called for (although this is very seldom the case).

Regards,
Alister

1 Like

Imagine how many times a second you could do that if you weren't memsetting.

Besides, you're confusing 'allocate' with 'page in'. Blanking two gigs of RAM at once means paging in two gigs of RAM, turfing two gigs of perfectly good cache, etc., etc. That can have major performance consequences for everything else. Burning a CD while something pages out all your cache, for instance, can produce an instant coaster.

Double the work for no gain is never irrelevant.

Yes -- as needed. NOT all at once. Only when paged in.

I repeat. There is a difference between allocating memory and paging it in.

---------- Post updated at 10:19 AM ---------- Previous update was at 09:55 AM ----------

This is a good find :b:

With all due respect, Corona688, you're the one that's confused.

I am aware that allocation and paging in are discrete steps. What I wrote above in no way suggests that memset() does not incur further overhead beyond sbrk or malloc. Quite the contrary: I ran that memset() to demonstrate that even with the further overhead and work, zeroing 2 GB of RAM does not take very long.

As for the rest of my post, the gist is that there is no difference between calloc and malloc for a large allocation whose pages aren't already mapped in.

Why not? If the memory needs to be zeroed, it needs to be zeroed. The size of the allocation is irrelevant. For a small allocation using a recycled chunk, the memset hit is negligible. For the huge allocations discussed, if there is sufficient memory for the allocation to succeed, calloc() knows that the allocation will be backed by fresh pages that are already zeroed. Knowing this, calloc() will not call memset(), and so no paging in will occur until the memory is written to. In the end, calloc() and malloc() are equivalent.

Regards,
Alister

if you read my second post below my first one, you'll see I realized the implications.

Whether my CPU and memory bus have cycles to waste really isn't related to whether certain functions waste them, which did a good job of obscuring your point from me. I still don't understand what your benchmarks are supposed to prove. Luckily, it doesn't matter.

Re-reading my post (#11), I see how it can be misunderstood. I did not intend for the first paragraph to have anything to do with the rest of the post. I should have indicated that clearly (either with language or formatting) or I should have made it a separate post.

When the second paragraph begins, Even so, memset() is largely irrelevant in this scenario , the scenario I'm referring to has nothing to do with the memset "benchmark" in the immediately preceding, opening paragraph. I was referring instead to what had been the topic of the thread at that point, a singularly large allocation, and how it's handled by calloc() in today's open source systems. (I'm curious if the proprietary unices behave similarly. I assume so, but I have no specific information.)

The memset benchmark isn't intended to prove anything except that memset-ing 2 GB takes on the order of a fraction of a second rather than a minute or an hour. Nothing more. As far as benchmarks go, it wasn't a particularly ambitious one.

Regards and apologies for the confusion,
Alister

1 Like

Thanks for the explanation, and sorry for being dense in my initial reading of it.

Memsetting one gig of RAM takes 1.4 seconds for me. Imagine the amount of actual work that computer could've done in that time instead. Congratulations on your fast computer though. :wink:

Yeah man, honestly I don't know what he's talking about with his benchmark. I just ran my own really quick test comparing calloc vs. the malloc -> memset sequence, and they are darn near identical. Which makes sense, because calloc has to bring in pages, and once you touch a malloc'd page it has to be pulled in too. Worse, my laptop only has a gig of RAM, so deity forbid I attempt to calloc, or malloc and memset, a gig; I'll start swapping! But, unsurprisingly, I can call malloc for a gig of RAM and it'll return immediately. So long as I don't touch the pages, it'll never slow down.

Point is, calloc is different from malloc and shouldn't be used unless you need all your memory zeroed. And really, what application does? Most mallocs would be followed by something "useful" like a memcpy or filling in the malloc'd memory with useful data. Further, you most certainly wouldn't use calloc for a sparse array; that'd just be crazy.

P.S. here's uname -a

Linux laptop 2.6.32-34-generic #77-Ubuntu SMP Tue Sep 13 19:40:53 UTC 2011 i686 GNU/Linux

edit: I just ran it on a work machine; malloc followed by memset for 1 GB and calloc for 1 GB were also identical, at about 2 seconds. This is on a P570 frame with 12 GB of memory and 2 CPUs allocated. So... I'd love to know what computer does it in "fractions of a second".

edit2: I suppose I also stirred this up by saying "I'll wait", as if to imply it'd be ages. But in computer terms, 2 seconds is "ages". Plus, if you had to swap, it'd really be "I'll wait": on my poor laptop with 1 GB of RAM, asking it to memset (or calloc) a gig started it swapping; my music in the background started to skip and the hard drive started to spin as memory was paged to disk. It was BAD, lol. After I killed the process, it still took about 10 seconds for the poor thing to normalize, and my music player hung and wouldn't come back, so I had to kill it too, lol. Fortunately, the same program without the memset (and just the malloc) ran and ended immediately, because it only added address space to the process, never a physical page, and so never did any actual work. Hence the BIG difference between malloc and calloc that started this whole off-topic thread of conversation.

alister explained why this is irrelevant for large amounts of memory: 1) it doesn't bother, because 2) it maps it in with mmap instead, meaning 3) the kernel does it for you at the time of paging in and not before.

Seems that's not true for either of the kernels I tested with. They both took a performance hit identical to malloc+memset (which means they both bring in pages). In fact, I'd bet the zeroing itself is not the problem; it's creating the physical pages that is. Either way, on every machine (three thus far) where I've run a quick calloc(1gb, 1) vs. malloc(1gb) -> memset(p, '0', 1gb) comparison, they are indistinguishable as far as performance is concerned. Both heartily lose out to a plain malloc(1gb), which is instantaneous.

Regardless... I'll stick to my guns: calloc is pointless; use malloc and initialize the memory in situ as appropriate afterwards.

No. It does not. It may, but nothing requires it. Small allocations may be handled by already resident pages. Large allocations are mmap'd, and since those pages will be zeroed by the kernel, calloc doesn't need to touch them. None of the callocs in any of the standard C libraries used by the popular open source unix flavors (I looked at Linux/glibc, FreeBSD, NetBSD, and OpenBSD) will call memset to zero a page which will already be zeroed by the kernel before being made available to the process.

The most obvious explanation for why your system shows no difference between malloc+memset and calloc is that your C library's calloc is naive. Or perhaps your code is flawed. Or perhaps your kernel's VM subsystem is prefaulting for some reason. Or perhaps your system's environment has enabled malloc/calloc options which affect their behavior (such as filling the allocation with "junk" or zeroes). Perhaps one of the bazillion Linux kernel compile options is to blame. If it were my system, I'd look into it just to satisfy my curiosity.

Obviously. My point is only that under certain conditions malloc and calloc are practically identical (both will return zeroed memory without calling memset). See for yourself in the malloc.c source links I provided in an earlier post. You'll find that both are implemented using the same internal routines. Further, if you follow the code path for a large allocation, you'll see that a calloc never memsets (unless certain options which are disabled by default are enabled).

I must retract my earlier quarter of a second figure. I cannot reproduce it. I must have misread the value. Perhaps it was 2.50s instead of 0.25s.

Here's some code and timings from OS X running on a 2.16 GHz Core2Duo Macbook with 2 GB of 667 MHz DDR2 (similar results were observed using NetBSD on similar hardware). Without any command line arguments, the executable will attempt to calloc 1 GiB. With command line arguments, it will malloc and memset 1 GiB:

$ cat large-calloc.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALLOCSIZE 1073741824U

int
main (int argc, char **argv) {
	void *vp;

	if (argc == 1) {
		if ((vp = calloc(1, ALLOCSIZE)) == NULL)
			return 1;
	}
	else {
		if ((vp = malloc(ALLOCSIZE)) == NULL)
			return 1;
		memset(vp, 0, ALLOCSIZE);
	}
	printf("%p\n", vp);
	return 0;
}
$ cc -Wall -pedantic large-calloc.c 
$ time ./a.out
0x2008000

real    0m0.005s    # calloc
user    0m0.001s
sys     0m0.004s
$ time ./a.out with-memset
0x2008000

real    0m2.124s    # malloc + memset
user    0m0.946s
sys     0m1.168s

That's your prerogative, but, for a large allocation with a reasonably recent C library, you're choosing to use memset to zero malloc'd memory that is probably already zeroed, instead of using calloc, which knows whether the memory is already zeroed and can avoid the overhead of a redundant memset.

So long as they're not aimed at my code, use your guns as you see fit.

Regards,
Alister