Dynamic memory allocation

Hi,

I am trying to process a file line by line, but I must not use static allocation for reading the contents of the file; the memory has to be allocated dynamically. My confusion is this: how do I determine the size of each line, and then read it into a buffer allocated to exactly that size?

Is there any optimised way to do this?

Thanks,
Anitha

By "size of a line" you mean its length, right?

Yes, the length of each line.

Suppose I have a file with the lines:
Today is Tuesday.
Tomorrow is Wednesday.

I can find the length of the first line by reading character by character until I reach a '\n', counting characters as I go. But then I have to seek back to the beginning of the line to read its contents into a buffer allocated with the count I obtained in the first pass.
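Roughly, the two-pass idea I have in mind looks like this (a sketch only, with minimal error handling; the function name is just for illustration):

#include <stdio.h>
#include <stdlib.h>

/* Two-pass read of one line: count characters up to '\n',
 * rewind, then read into an exactly-sized buffer. */
char *read_line_two_pass(FILE *fp)
{
    long start = ftell(fp);
    int c, len = 0;

    while ((c = fgetc(fp)) != EOF && c != '\n')
        len++;
    if (len == 0 && c == EOF)
        return NULL;                 /* nothing left to read */

    char *buf = malloc(len + 1);     /* +1 for the terminator */
    if (buf == NULL)
        return NULL;

    fseek(fp, start, SEEK_SET);      /* back to the start of the line */
    fread(buf, 1, len, fp);
    buf[len] = '\0';
    fgetc(fp);                       /* consume the '\n' again */
    return buf;
}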

Why don't you construct a buffer of some size, say 1024, and initialize its contents to 0. As you read each character, put it into the buffer. Once the line is done, you have the length as well as the contents. Allocate new memory of that length and do a memcpy(destination, source, length) to copy the contents.
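Something like this, roughly (a sketch only; the 1024 limit, the fgets-based read, and the names are just for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 1024

/* Read one line into a fixed scratch buffer, then copy it
 * into a heap allocation of exactly the right size. */
char *read_line(FILE *fp)
{
    char scratch[BUFSIZE] = {0};

    if (fgets(scratch, sizeof scratch, fp) == NULL)
        return NULL;

    size_t len = strlen(scratch);
    char *line = malloc(len + 1);    /* +1 for '\0' */
    if (line == NULL)
        return NULL;

    memcpy(line, scratch, len + 1);  /* copies the terminator too */
    return line;
}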

This won't work for lines longer than 1024 characters.

Instead, parse through the line, then do a dynamic allocation, copy the contents in, and free it when you're done.

But throughput will suffer there, since memory is allocated and deallocated every time a line is parsed. Instead, a big buffer size like 1 KB or 2 KB, as Vino suggested, can safely be used.

Maybe this will help?

Any compelling reason to go for a dynamic solution?
A static solution may not be optimal, but a dynamic solution would be expensive in terms of processor time.

This is one of those problems that I've always enjoyed, because most of the text-processing utilities we use have statically defined or user-defined limits on their moving parts (think gawk, perl, etc.).
The Heathfield link is as good an interface as I've seen.
Here's a quick (and easily broken) data structure and some logic for working with the idea.



typedef struct _line {
    int len;                 /* length of record, excluding the '\0' */
    long lineno;
    char *record;
    struct _line *prv;       /* link back to the previous line */
} LINE;

LINE *newallocline(int l, long rno, char *data, LINE *prv)
{
    LINE *new = NULL;

    /* sizeof(*new), not sizeof(new): we need the struct, not a pointer */
    if ((new = malloc(sizeof(*new))) == NULL)
        return NULL;
    new->record = malloc(l + 1);         /* +1 for the terminator */
    if (new->record == NULL) {
        free(new);
        return NULL;
    }
    memcpy(new->record, data, l);
    new->record[l] = '\0';               /* strncpy wouldn't guarantee this */
    new->len = l;
    new->lineno = rno;
    new->prv = prv;
    return new;
}
                                      
/* pseudo code
 * prv = NULL;
 * while (fgets reads a line into an absurdly large static buffer (500000 characters)) {
 *     if ((cnew = newallocline(strlen(large_buffer), cnt++, large_buffer, prv)) == NULL) { error(); }
 *     prv = cnew;
 * }
 */
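And, assuming the list built above, a matching teardown would just walk the prv chain:

/* Free the whole list by walking back through the prv links. */
void freelines(LINE *last)
{
    while (last != NULL) {
        LINE *p = last->prv;
        free(last->record);
        free(last);
        last = p;
    }
}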

I have a programming test where it is mandatory that only dynamic allocation be used for any file manipulation.

Is it some kind of classwork problem?

Rules for these UNIX forums:

  • Don't ask about homework
  • Don't ask about homework
  • Don't ask about homework

Nonetheless, this is a really important question that should be answered, because, by golly, I was working on it today.

Actually, my problem is slightly different. I'm augmenting a shared-library routine which parses the command-line arguments. My task is to pass those arguments to sprintf(). How do I make sure:

  1. I have enough space allocated for all the arguments?
  2. I don't write past the end of memory?
  3. I don't run out of memory?

Imagine a directory with 10,000 files in it. (This happens quite a lot in bioinformatics and cluster computing.) Then I do an "echo *". Let's say I'm augmenting the command "echo". I have to make sure it can handle 10,000 arguments, which means dynamically allocating a very long string.

One way is to statically allocate a large chunk of memory, and if it's not enough, report that the program is out of memory or that there are too many arguments and process what can be processed.

Another way is to dynamically allocate memory byte by byte (or, in my case, argument by argument). You can do this easily enough with the realloc() call. A good realloc() implementation is very efficient and actually reserves memory in pools, so that realloc'ing 1 more byte does nothing more than increment some internal counter somewhere. The realloc() call can be used on a NULL pointer in place of malloc(), so you can just use it from the start.
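As a rough sketch (the space-joining of arguments here is just illustrative), something like this grows one string with realloc(), starting from a NULL pointer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char *all = NULL;       /* realloc(NULL, ...) acts like malloc() */
    size_t used = 0;

    for (int i = 1; i < argc; i++) {
        size_t n = strlen(argv[i]);
        char *tmp = realloc(all, used + n + 2);   /* room for ' ' and '\0' */
        if (tmp == NULL) { free(all); return 1; } /* out of memory */
        all = tmp;
        memcpy(all + used, argv[i], n);
        used += n;
        all[used++] = ' ';
        all[used] = '\0';
    }
    if (all)
        puts(all);
    free(all);
    return 0;
}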

If the realloc() call is inefficient, you can build the equivalent yourself by allocating memory in chunks. You allocate, say, 256 bytes at a time, and you keep one counter of how long the buffer actually is and another tracking how much of it is being used. Every time you read in (or, in my case, copy in) a new value (or argument), you check whether you have enough space in the buffer. If you don't, you go out and allocate some more.
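A sketch of that chunked scheme, with the two counters described above (the struct name and the 256-byte CHUNK are illustrative):

#include <stdlib.h>
#include <string.h>

#define CHUNK 256

struct growbuf {
    char  *data;
    size_t cap;     /* how long the buffer actually is */
    size_t used;    /* how much of it is in use */
};

/* Append len bytes, growing the buffer a chunk at a time. */
int growbuf_append(struct growbuf *b, const char *src, size_t len)
{
    if (b->used + len > b->cap) {
        size_t newcap = b->cap;
        while (b->used + len > newcap)
            newcap += CHUNK;
        char *tmp = realloc(b->data, newcap);
        if (tmp == NULL)
            return -1;              /* caller keeps the old buffer */
        b->data = tmp;
        b->cap  = newcap;
    }
    memcpy(b->data + b->used, src, len);
    b->used += len;
    return 0;
}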

Allocating memory in 256 byte chunks may seem efficient until you move past userspace.
I'm smiling here so please don't be offended.

This is what I was trying to get at earlier. Most 'smart' text-processing utilities/languages depend on the user to tell them 'how many records should I read, and what is a record?' or 'how long is a line, and how do I determine it?', either at compile time or at runtime.

As most of us know from sad experience, many standard utilities come with static limitations.

This is a challenging problem, and looking at the mailing lists for comp.lang.awk and other text-processing languages is always educational.