fgets problems

CRGreathouse · February 17, 2010, 8:42pm

I've been having trouble with reading past the end-of-file in C. Can anyone find my stupid mistake?

This is the minimal code needed to cause the error for me:

FILE *f = fopen(name, "r");
if (!f)
	return;
pari_sp ltop = avma;
char line[1100];
while(fgets(line, 1100, f) != NULL)
	printf(".");
fclose(f);

So name is a C-string pointing to a valid filename. The FILE f opens without trouble, and the fgets loop runs through each line in the file. (The actual processing code, which I deleted, parses those lines without trouble; here I replaced it with a line which shows how many lines it reads.) But the last read through fgets does *not* return NULL but causes an application-ending error. fclose is never run.

What am I doing wrong?

Corona688 · February 17, 2010, 9:30pm

That is not the minimal code to crash it... it works absolutely fine.

The crash is probably in some destructor somewhere. Or it may be that you've already corrupted the heap by the time you've started calling fread, maybe with that pari_sp object? Try getting it down to a really minimal program that still crashes.

CRGreathouse · February 17, 2010, 9:41pm

I was afraid of that. This is part of a huge project, so I tried to extract what I could to make it fail without involving too much complexity. In particular, the actual code is using the Pari library.

Working on it. The pari_sp isn't actually doing anything at the moment (it was going to be a part of the garbage collection process) so I know that's not it.

Corona688 · February 17, 2010, 9:45pm

You might try valgrind to help hunt down double-frees and memory leaks and such. And gdb, to see what code it's actually crashing inside.

CRGreathouse · February 17, 2010, 10:18pm

Good news! I managed to fix it. Unfortunately I don't know what I did -- I was actually just focusing on making it easier to find the error.

Thanks for your help.

jim_mcnamara · February 18, 2010, 12:43am

What Corona was saying - you probably have corrupted stack or heap memory somewhere earlier than the fgets call. This means there may still exist serious data corruption. You are just no longer trashing it to the point it crashes.

Does your output - even though code appears to work - seem odd? Pari is a bignum Math package, are your results reasonable? I would be inclined to seriously question the validity of the final value(s); can you verify independently? Please tell me you are not computing orbital data for earth crossing asteroids...

CRGreathouse · February 18, 2010, 6:56pm

Right, that's why it's good I rewrote the earlier portions.

This is fortunately only an I/O routine; no calculations to speak of. And after my rewrite it's quite short:

#define MAX_VECLEN 10000
#define MAX_LINELEN 1100
GEN
bfilein(char* name)
{
	FILE *f = fopen(name, "r");
	if (!f)
		pari_err(openfiler, "input", name);
	
	GEN v = vectrunc_init(MAX_VECLEN + 1);
	char line[MAX_LINELEN];
	int i = 0;
	while(fgets(line, MAX_LINELEN, f) != NULL) {
		char* kept = getBValue(line);
		GEN value = strtoi(kept);
		if (value == NULL)
			continue;
		if (++i > MAX_VECLEN) {
			pari_warn(warner, "only %d terms used; b-file has unread terms", MAX_VECLEN);
			break;
		}
		vectrunc_append(v, value);
	}
	fclose(f);
	return v;
}

with one helper function:

char*
getBValue(char* line)
{
	int start = 0;
	while (line[start] == ' ' || line[start] == '\t')
		start++;
	if (line[start] == '#')
		return NULL;	// Comment
	if (line[start] == '-')
		start++;
	while (line[start] >= '0' && line[start] <= '9')
		start++;
	while (line[start] == ' ' || line[start] == '\t')
		start++;
	int end = start;
	if (line[end] == '-')
		end++;
	while (line[end] >= '0' && line[end] <= '9')
		end++;
	if (start == end)
		return NULL;	// Blank line, or no numbers found
	line[end] = '\0';
	
	return line + start;
}

The parser is pretty boneheaded, but this isn't performance-limiting -- usually < 100 kB input.

Corona688 · February 19, 2010, 12:58pm

I don't see a main() anywhere, so that really can't be the only code. alloc/free bugs in code completely outside yours can cause crashes later.

CRGreathouse · February 19, 2010, 7:00pm

No, not nearly all of it -- I have 3200 LOC on top of a 6700 kB library. But those functions, together with 'glue code' for the library, can be compiled into the core application of the Pari library, called GP. So these functions provide stand-alone functionality.

Well, actually that's not true. They are called via this function:

GEN
bfile(GEN name, GEN v, GEN offset)
{
	pari_sp ltop = avma;
	GEN cur = gen_0, Anum = gen_0;
	// If no v is given, fine; but if it is it must be a vector.
	// Should this use is_vec_t(typ(v)) to allow t_COL as well?
	if (v && typ(v) != t_VEC)
		pari_err(typeer, "bfile");
	if (!offset)
		offset = gen_1;
	else if (typ(offset) != t_INT)
		pari_err(typeer, "bfile");
	cur = subis(offset, 1);
	
	if (typ(name) == t_INT)
	{
		name = gtovec(GENtoGENstr(name));
		GEN p2;
		while (glength(name) < 6)
		{
			p2 = cgetg(2, t_VEC);
			gel(p2, 1) = strtoGENstr("0");
			name = concat(p2, name);
		}
		Anum = concat(name, NULL);	// "0","0","0","0","4","0" -> "000040"
		name = concat(concat(strtoGENstr("b"), Anum), strtoGENstr(".txt"));
	} else {
		if (typ(name) != t_STR)
			pari_err(typeer, "bfile");
		// TODO: Try to extract a reasonable A-number, or just set to blank?
		Anum = strtoGENstr("000000");
		//Anum = concat(extract0(gtovec(name), stoi(126), NULL), NULL);
	}
	
	char* filename = GSTR(name);
	if (!v)
		return bfilein(filename);
	FILE *f = fopen(filename, "r");
	if (f)
		pari_warn(warner, "File `%Ps' already exists. Appending terms..", name);
	
	long l1 = lg(v);
	pari_sp btop = avma, st_lim = stack_lim(btop, 1);
	long i;
	for (i = 1; i < l1; ++i)
	{
		GEN e = gel(v, i);
		if (typ(e) != t_INT)
			pari_err(typeer, "bfile");
		if (cmpis(digits(e), 1000) > 0)
		{
			pari_warn(warner, "Next term has %Ps digits; exiting.\n", digits(e));
			break;
		}
		write0(filename, mkvec3(cur = addis(cur, 1), strtoGENstr(" "), e));
		if (low_stack(st_lim, stack_lim(btop, 1)))
			cur = gerepilecopy(btop, cur);
	}
	pari_printf("%%H A%Ps Author, <a href=\"b%Ps.txt\">Table of n, a(n) for n = %Ps..%Ps</a>\n", Anum, Anum, offset, cur);
	avma = ltop;
	return gnil;
}

which is itself called by GP. But I thought those functions could be understood in isolation.

Overview:

GP is a programmable calculator; C functions can be exposed directly to the users.
Anything marked "GEN" is a bignum object that interacts with the Pari library.
GEN bfile(GEN, GEN, GEN) can be called by the user with input. It turns the first argument into a filename ("file.txt" -> "file.txt"; 123456 -> "b123456.txt"; 40 -> "b000040.txt"). If the latter two arguments are NULL (in this case, this means the user sent only one argument to the function through GP), then GEN bfile (char*) is called. (The file is first checked to make sure it exists.) Otherwise, this function creates a file at that location and outputs data to it.
GEN bfile(char*) takes a filename, opens the file, and reads in the data, storing it to a GEN object it will pass back to GP

jim_mcnamara · February 19, 2010, 8:46pm

You have a lot of heap memory involvement. ie., hidden library calls to malloc - you need a malloc debugger.

electric fence is a great tool for finding these kinds of problems, IMO. You're on Linux, so it is likely already be part of your distribution.
efence(3): Electric Fence Malloc Debugger - Linux man page

Read the page above, re-compile your app, and run it.

If your executable eats your system memory, you could also consider Purify and/or Checkergcc which when used to compile code will run most of the checks Purify performs. Both of these consume way less memory than efence.

CRGreathouse · February 20, 2010, 1:24am

Huge amounts of memory management -- much more in the calculation parts of my program. Pari essentially forces you to manage all your own memory.

But malloc itself isn't really ever called, not even in the library calls (except initialization). All the memory is allocated initially, and the program tracks usage itself. So when I write

GEN x = stoi(3); // x = 3
GEN y = stoi(4); // y = 4
GEN z = addii(x, y); // z = x + y
z = mulii(z, x); // z *= x

malloc is called zero times in that sequence, but four objects are created in a large block of memory. The block of memory (called the Pari stack -- but it's just a large malloc'd variable, really) functions as a stack, insofar as it's my responsibility to leave return values at the top and move the pointers as needed to let garbage drop off.

Do you think these programs would be able to handle such a low-level system?

Corona688 · February 20, 2010, 12:19pm

Memory is memory. It doesn't have to track the contents of the pari block to notice clobbering happening outside it.

And you might be surprised what calls malloc. Even printf does.

CRGreathouse · February 20, 2010, 2:29pm

Sure -- but that's not the problem (and it won't be the problem). Every time the library uses the stack it ensures that there's room for what it's creating. Also, the stack is very large -- at least 20 MB and sometimes as big as 1 GB, with its pointer avma moving by [only] several bytes per call.

The real problem with memory management is leaving garbage on the stack (another possibility is returning corrupted objects). The usual technique for is

GEN genericfunction(----) {
	pari_sp ltop = avma;	// Keep a pointer to the top of the stack's current location
	GEN r;	// Return value (assigned below)

	// ...
	
	r = gerepileupto(ltop, r);	// copy r to the top of the stack and move avma just above it
	return r;
}

which ensures that you're not leaving garbage on the stack. But this is somewhat more complicated by returning complicated objects like vectors. For example:

GEN vector = cgetg(4, t_VEC); // Make a vector
gel(vector, 1) = gen_2;
int i = 2;
for(; i < 4; i++)
  gel(vector, i) = addii(gel(vector, i-1), mulii(gel(vector, i-1), gel(vector, i-1))); // v = v[i-1] + v[i-1]^2

Memory diagram:

(vector with pointers to its members A, B, and C)(A)(garbage)(B)(garbage)(C)

Corona688 · February 20, 2010, 4:16pm

Of all the faith-based initiatives Bush funded, faith-based software maintenance has to be the worst. Bottom line is, you don't know why it crashed, don't know why it stopped crashing, and don't know if the problem is truly solved yet or not. All we know is that something trashed some memory somewhere somehow. Time for a memory debugger.

All this odd stack rearranging makes me more and more suspicious of memory trashing, not less! When it works perfectly it works perfectly, but one mistake and you're mangling your own stack frame. It could also be a disguised array-bounds problem, these macros are using hidden stack variables and writing beyond them could smash your stack too.

CRGreathouse · February 20, 2010, 4:33pm

I'm just saying that in these functions, the only memory I manage myself is in a single array, char line. All other memory management is done by the library. And since library-related memory troubles will be problems *within* the Pari stack rather than overflowing the Pari stack*, the particular tool recommended here seem powerless to help me.

What I would need is something that could help me debug memory use within the Pari stack.

I recognize that you haven't seen the library code, so this you're going to have to trust... or not.

Agreed. It's error-prone, to be sure. The biggest problem (in my experience with this library) is leaving garbage behind, but I have had occasional problems of corrupted data -- say, moving the pointer too far so that some members of a live structure can be overwritten later.

The testing behind the library code is quite solid; I'm not concerned about it going past the end of the stack or overwriting things in the 'valid data' portion. My code, on the other hand, is comparatively suspect: I could move the pointer to the wrong place, so that the (correct) library code would overwrite my data.

There are, as you suggest, several stack variables, but the only one that I use regularly is avma, the "available memory area" pointer.

Corona688 · February 20, 2010, 5:03pm

Memory related troubles within the pari stack will not cause inexplicable problems outside pari. To cause crashes in unrelated things, either the heap or the stack is getting mangled somewhere by something. Since this is clearly impossible, your program was never crashing and this thread does not exist, and you have no need of a memory debugger to check for problems that never existed.

Less sarcastically, 'garbage on the stack' won't crash stdio or anything else programmed sensibly. Every function call you make leaves garbage on the stack, pari's nothing remarkable in that respect. If you're getting segfaults in senseless places, either your stack's being corrupted, or your heap. (Or you're having hardware problems like memory errors.) A memory debugger -- or really, any debugger at all -- will be more useful than a thousand people staring at the 1% of your code you're willing to present. In the time we've spend arguing about it you could've tried lots of things.

CRGreathouse · February 20, 2010, 9:22pm

Then the problem was never with memory.

Of course. It is something I need to be concerned with, though -- most of my early programs with the library were unusable (crashed whenever run) by using up all of the stack.

No, no segfaults -- not now, not before.

Corona688 · February 21, 2010, 12:02am

It died from some signal in a senseless place. If not a segfault then what? You didn't say, so we picked the obvious one to happen in that manner, a segfault.

CRGreathouse · February 21, 2010, 2:55am

Unfortunately I don't recall and it's fallen off the screen. Is there something analogous to .bash_history that keeps errors?

achenle · February 21, 2010, 10:49am

Let me get this straight: you have a custom memory management library, and your code using that library crashes?

OK.

You need all the help you can get. But there's not much out there.

Your only hope is to do something like download a demo copy of Purify and use it. And then pray that you don't have any bugs in your custom library.

And good luck.

You're going to need it.

There's a reason why nobody writes their own memory management libraries. First off, the multiple libraries available for free on your OS of choice are almost certainly fast enough. If they're not, you're most likely doing something wrong, like an over-reliance on malloc()/free() and/or new/delete. And if the OS libaries truly aren't fast enough (and if you're not multithreaded on a massively parallel application, they ARE fast enough!), there are third-party memory management libraries available.

Go price something like Smartheap. Then calculate all the hours spent on your custom memory library and how much those hours cost. Want to bet Smartheap is cheaper? If you even need it in the first place.

There are probably literally thousands of man-years invested by many absolutely brilliant computer scientists and programmers in all the memory-management libraries available in today's operating systems. And those products have been thoroughly tested - for literally decades.

You want a car analogy? Writing your own memory management library is like chiseling out a stone wheel for a sports car. The only reason to do it is for the sake of doing it - as a hobby or academic learning experience. Because there's no way you can duplicate the engineering history behind a "normal" wheel. Even if you do manage to chisel out a wheel that manages to outperform a "normal" wheel in a tiny operating range, you'll never really know how durable your wheel is because it's totally untested.