C: inputting string of unknown length

I realize this general issue (inputting strings of variable length in C) has been addressed in myriad locations before, but I'm interested in knowing why my specific approach is not working. (BTW I'm intentionally keeping the size increments small so that I can more easily follow what's going on. After it works on a small scale, I can increase the size to something more reasonable. The main motivation for this approach is that I want to increase the size by a fixed increment, not by doubling the allocated memory each time.)

Here is the code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define SIZE 4

int main(void)
{
	int mem = SIZE;
	char *str = malloc(mem); // let's keep str pointing to beginning of string...
	char *next_read = str; // ...and next_read pointing to where next character should go

	fgets(next_read, mem, stdin);
	next_read--; // so that after we add SIZE to pointer, it points to current '\0'

	while(str[strlen(str)-1] != '\n') // if we got whole string, the last char will be '\n'
	{
		mem += SIZE;
		str = realloc(str, mem); 
		next_read += SIZE;
		fgets(next_read, SIZE+1, stdin); // read the rest (hopefully) of the line into the new space
		printf("str is now %s\n", str);
	}

	printf("final str is %s", str);
	// free(str);
	return 0;
}

The code works fine for short strings, but stops working (the program seems to get stuck) if the string is longer:

bruno@thinkpad:~/Desktop434$ gcc getstring.c 
bruno@thinkpad:~/Desktop436$ ./a.out 
I love linux
str is now I love 
str is now I love linu
str is now I love linux

final str is I love linux
bruno@thinkpad:~/Desktop436$ ./a.out 
I love linux and the C programming language
str is now I love 
str is now I love linu
str is now I love linux an
str is now I love linux and th
str is now I love linux and the C 
str is now I love linux and the C 
str is now I love linux and the C 
str is now I love linux and the C 
str is now I love linux and the C 
str is now I love linux and the C 
str is now I love linux and the C 

I'm a newbie in C and would like to learn something by debugging this.


realloc() is not guaranteed to reallocate in situ, which is why you do

str = realloc(str, mem); 

rather than

realloc(str, mem); 

The next_read variable should be reset with something like

next_read = str + strlen(str);

Basically you were writing into the old area of memory when str was now pointing to a new area.
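
One way to sidestep the stale-pointer problem is to keep an offset into the buffer instead of a second pointer, and to compute the write position from str on every pass. What follows is only a minimal sketch of that idea, not the original poster's code: the function name read_line_by_offset, the STEP constant, and the FILE * parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STEP 4  /* hypothetical growth increment, kept small as in the question */

/* Reads one line from `in`, growing the buffer by STEP bytes at a time.
 * Returns a malloc'd string (newline included if one was read), or NULL
 * on allocation failure or if nothing could be read. Caller frees. */
char *read_line_by_offset(FILE *in)
{
    assert(in != NULL);

    size_t cap = STEP;  /* bytes currently allocated */
    size_t len = 0;     /* characters stored so far, excluding '\0' */
    char *str = malloc(cap);
    if (str == NULL)
        return NULL;
    str[0] = '\0';

    /* The write position str + len is recomputed from the current str
     * on every pass, so it stays valid even if realloc moved the block. */
    while (fgets(str + len, (int)(cap - len), in) != NULL) {
        len += strlen(str + len);
        if (len > 0 && str[len - 1] == '\n')
            return str;  /* got the whole line */

        char *grown = realloc(str, cap + STEP);
        if (grown == NULL) {
            free(str);
            return NULL;
        }
        str = grown;
        cap += STEP;
    }

    /* fgets returned NULL: end of input or a read error */
    if (len == 0) {
        free(str);
        return NULL;
    }
    return str;  /* last line had no trailing newline */
}
```

Because only the integer len survives across the realloc call, there is no second pointer that could be left aiming at the old block. Calling read_line_by_offset(stdin) would read from the keyboard as in the question.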

Andrew


Andrew, thank you so much for your beautiful explanation. I understand the issue and can confirm that this works exactly as expected:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define SIZE 4

int main(void)
{
	int mem = SIZE;
	char *str = malloc(mem); // let's keep str pointing to beginning of string...
	char *next_read = str; // ...and next_read pointing to where next character should go

	fgets(next_read, mem, stdin);
	printf("str is now %s\n", str);

	while(str[strlen(str)-1] != '\n') // if we got whole string, the last char will be '\n'
	{
		mem += SIZE;
		str = realloc(str, mem); 
		next_read = str + strlen(str);
		fgets(next_read, SIZE+1, stdin); // read the rest (hopefully) of the line into the new space
		printf("str is now %s\n", str);
	}

	printf("final str is %s", str);
	// free(str);
	return 0;
}
bruno@thinkpad:~/Desktop444$ gcc getstring.c
bruno@thinkpad:~/Desktop445$ ./a.out 
I love linux and the C programming language
str is now I l
str is now I love 
str is now I love linu
str is now I love linux an
str is now I love linux and th
str is now I love linux and the C 
str is now I love linux and the C prog
str is now I love linux and the C programm
str is now I love linux and the C programming 
str is now I love linux and the C programming lang
str is now I love linux and the C programming language
str is now I love linux and the C programming language

final str is I love linux and the C programming language

It seemed strange that realloc() extended the block in situ at first and then stopped doing so. Knowing that there are no guarantees was the key to the mystery. THANK YOU!
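
For anyone curious how often the block really moves, here is a rough sketch that counts address changes across repeated calls to realloc(). The function name count_realloc_moves and its parameters are made up for this illustration, and the count it returns depends entirely on the allocator and on what else the program has allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Grows a block `rounds` times by `step` bytes and counts how often
 * realloc() handed back a different address. Returns -1 if an
 * allocation fails. */
int count_realloc_moves(size_t step, int rounds)
{
    assert(step > 0 && rounds >= 0);

    size_t size = step;
    char *block = malloc(size);
    if (block == NULL)
        return -1;

    int moves = 0;
    for (int i = 0; i < rounds; i++) {
        /* Save the address as an integer first: the old pointer value
         * must not be used once realloc() has released the block. */
        uintptr_t before = (uintptr_t)block;

        char *grown = realloc(block, size + step);
        if (grown == NULL) {
            free(block);
            return -1;
        }
        block = grown;
        size += step;

        if ((uintptr_t)block != before)
            moves++;
    }

    free(block);
    return moves;
}
```

Printing count_realloc_moves(4, 50) on different systems, or with other allocations interleaved, can give different numbers. The only portable conclusion is that a program must not rely on either outcome.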


The problem relates to memory management. The OS sets an "end point" and a "start point" for a process working set (memory) when the process begins.

There are several flavors of malloc (and realloc), many based on Doug Lea's original malloc. His version calls brk() when it determines that a request would go beyond the bounds of the memory it currently manages. The brk() call moves the end point of the process's data segment, and it happens only when an allocation runs up against that end. When a block cannot be extended where it sits, realloc() copies it to a new location, so the start and end of your string may move.

Running size [executable name goes here] shows the sizes of the program's text, data, and bss segments, which helps in picturing this layout.

Tutorial with great examples:

Memory Layout of C Programs - GeeksforGeeks


Thank you, Jim. That's a great tutorial.

In case this helps other newbies, I moved the getstring function to a separate file containing custom functions. I also changed getstring's logic to double the allocated memory each time more space is required, so that long input is read in fewer passes. When the entire string has been read, a final call to realloc shrinks the allocated memory so that it is just enough to hold the string, e.g., 13 bytes for "I love linux" (there's always one more byte needed than the number of characters because of the '\0' string terminator).

// filename: mylib.c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "mylib.h"

#define SIZE 256

char *getstring(void)   // this function reads a line from standard input,
{                    // chops off the final \n, then shrinks allocated memory to exact bytes needed to hold the string
	int mem = SIZE;
	char *str = malloc(mem); // I'll keep str pointing to beginning of string...
	if (str == NULL)
		report_alloc_error();
	char *next_char = str; // ...and next_char pointing to where next character should go

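	if (fgets(next_char, mem, stdin) == NULL)
		str[0] = '\0'; // nothing was read (EOF or error), so treat input as an empty string

	// when we get whole string, last char will be '\n'; an empty string means input ended
	while(str[0] != '\0' && str[strlen(str)-1] != '\n')
	{
		mem *= 2;
		str = realloc(str, mem); 
		if (str == NULL)
			report_alloc_error();
		next_char = str + strlen(str);
		if (fgets(next_char, mem/2 + 1, stdin) == NULL)
			break; // input ended before a newline arrived
	}

	// chop off trailing newline from string, if there is one
	if (str[0] != '\0' && str[strlen(str)-1] == '\n')
		str[strlen(str)-1] = '\0';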

	// trim mem down to exact bytes needed to hold string
	mem = strlen(str) + 1;
	str = realloc(str, mem);

	// for debugging:
	//printf("final str is %s\n", str);
	//printf("final mem is %d\n", mem);
	
	return str;
}

void clean_stdin(void)
{
	int c;
	do 
	{
		c = getchar();
	} while (c != '\n' && c != EOF);
}

void report_alloc_error(void)
{
	fprintf(stderr, "Memory allocation failed. Exiting.\n"); // errors go to stderr, with a newline so the shell prompt is not glued on
	exit(1);
}
// filename: mylib.h

#ifndef MYLIB_H_ // This guards against including this header more than once
#define MYLIB_H_

char *getstring(void);
void clean_stdin(void); 
void report_alloc_error(void);

#endif
// filename: example.c
// to compile: gcc -o example example.c mylib.c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "mylib.h"

int main(void)
{
	char c;
	int i;
	float f;
	char *s;

	printf("Enter a character: ");
	scanf(" %c", &c); // the space tells scanf to ignore leading white space (if that's what you want).
	clean_stdin();	  // note that for format specifiers other than %c, scanf automatically ignores leading whitespace.

	printf("Enter an integer: ");
	scanf("%d", &i);
	clean_stdin();

	printf("Enter a float: ");
	scanf("%f", &f);
	clean_stdin();

	printf("Enter a string: ");
	s = getstring();

	printf("\nYour character: %c\n", c);
	printf("Your integer: %d\n", i);
	printf("Your float: %f\n", f);
	printf("Your string: %s\n", s);
	free(s);

	return 0;
}
$ gcc -o example example.c mylib.c
$ ./example
Enter a character: y some garbage 38.9
Enter an integer: 3 more garbage 89
Enter a float: 3.14159 7389junk
Enter a string: I love C programming, yes I do!

Your character: y
Your integer: 3
Your float: 3.141590
Your string: I love C programming, yes I do!

I'd just like to add that a good way to use realloc is to assign its result to a temp variable rather than to the variable you are reallocating. That way, if realloc fails, you haven't lost the pointer to the memory in your original variable. For example:

char* tmp = realloc (str, len);
if (tmp == NULL) {
    free (str); // you can free str since you haven't changed the address in realloc
    return NULL; // or something to signal an allocation failure

} 

str = tmp; // the reallocation worked and now we assign tmp to str
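
The pattern above can be wrapped in a small helper so that callers cannot forget the temp variable. This is only a sketch under assumptions: grow_buffer and append_text are hypothetical names, and signalling failure through a 0/1 return value is one design choice among several.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Resizes *buf to new_size bytes. Returns 1 on success and updates *buf.
 * Returns 0 on failure and leaves *buf untouched and still valid, so the
 * caller decides whether to free it or carry on with the old size. */
int grow_buffer(char **buf, size_t new_size)
{
    assert(buf != NULL && new_size > 0);

    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return 0;  /* *buf still points at the original block */

    *buf = tmp;    /* commit only after realloc() succeeded */
    return 1;
}

/* Appends `extra` to the heap-allocated string *buf, growing it first.
 * Returns 1 on success, 0 if the allocation failed (*buf is unchanged). */
int append_text(char **buf, const char *extra)
{
    size_t old_len = strlen(*buf);
    size_t add_len = strlen(extra);

    if (!grow_buffer(buf, old_len + add_len + 1))
        return 0;

    memcpy(*buf + old_len, extra, add_len + 1);  /* copies the '\0' too */
    return 1;
}
```

Leaving the old block alive on failure, instead of freeing it inside the helper, lets the caller keep whatever was read so far if that is still useful.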

-Greg.