Hi,
I'm trying to learn how to manage memory when dealing with lots of data.
Basically, I'm indexing a huge file (5 GB, but it can be bigger) by creating tables that
hold offset <-> startOfSomeData information. Currently I'm mapping the whole file at
once (yep!), but of course the application quickly runs out of memory and malloc'ing
a new table fails after a bit.
My first question, which is more a request for confirmation, is the following:
does the mapped file count toward memory usage? I'd say yes at 99.9%, but I'd like
to be sure.
Second, I'd like to know which syscalls are available to retrieve memory
information for the calling process (how much memory is used, how much is left, etc.).
I'm of course going to map a few pages at a time, although it'll be trickier
to parse the file that way. Anyway, I'd like to know how I should deal with the tables I
create and keep in memory. If I dump a few tables to a temporary file and mmap it for
quick access, it would amount to the same thing if the answer to my first question is
yes. I definitely don't want the kernel to start swapping; I'd rather have a thread
that concurrently writes those tables to a file for later retrieval.
Anyhow, my main concern is not running out of memory (i.e. malloc must not fail).
Any suggestions from people with more expertise are very welcome;
I'm eager to learn.
Thanks,
S.