memory stack problem

Hi, I am writing a C program under SCO Unix. I have a memory stack problem but do not know how to go about fixing it. I have tried running INSURE but that does not detect any problems.

Essentially the problem is that the memory address shifts on return from a routine. I pass a pointer to function "get_dsp_data" which passes a pointer to that pointer (AKA double pointer) to a database routine. In the db routine I fill in the double pointer structure after malloc'ing memory. The data is fine within the db routine. The data is fine in the "get_dsp_data" routine. However, when the data gets back to the first routine the memory location has shifted, see below.

*****************************************************
stbm.c 310 before: p_number_of_dsps is 0 and
&p_number_of_dsps is 2147481140
stbm.c 1052 In get_dsp_data, p_number_of_dsps is 16 and
&p_number_of_dsps is 2147481140

** memory shifts here but I don't know why or how **

stbm.c 312 after: p_number_of_dsps is 0 and
&p_number_of_dsps is 2147481034
*****************************************************

If I change things around so that the variable is a global then the program cores at the end of the last routine before exit. I am thinking that possibly memory is going past its bounds but I don't know how to verify or fix this.

One other bit of information: I ported my code over to Linux and ran valgrind on it, but no problems were detected. That could be because the memory did not shift when run on the Linux system. Perhaps it is a difference in the way the compilers handle memory?

Please if anyone has any ideas for me on how to troubleshoot let me know.

Thanks, Jeanne

I think a write to data adjacent to the pointer is overwriting the pointer in question - it's a so-called off-by-one error. The write runs one byte past its target and lands on the least significant byte of the longword pointer.

The only way to find this is to run the program under gdb and examine the pointer after every line of code is executed, starting just after you load the struct in get_dsp_data.
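
Where a debugger is not handy, the same line-by-line check can be approximated in the code itself. The following is only a minimal sketch: the names ptr_watch, ptr_watch_check and PTR_CHECK are hypothetical helpers, not part of the original program. The idea is to drop a PTR_CHECK after each suspect statement; the first checkpoint that reports a change sits just after the statement that clobbered the pointer.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the original program): remembers the
 * value a pointer had at the previous checkpoint and reports the first
 * checkpoint at which the value differs.
 */
struct ptr_watch {
        const char *name;   /* label printed in the report */
        void       *last;   /* value seen at the previous checkpoint */
        int         armed;  /* 0 until the first checkpoint has run */
};

/* Returns 1 if the pointer changed since the previous call, else 0. */
static int
ptr_watch_check(struct ptr_watch *w, void *current,
                const char *file, int line)
{
        int changed = w->armed && current != w->last;

        if (changed) {
                fprintf(stderr, "%s:%d: %s changed from %p to %p\n",
                        file, line, w->name, w->last, current);
        }
        w->last = current;
        w->armed = 1;
        return changed;
}

/* Records the file and line of the checkpoint automatically. */
#define PTR_CHECK(w, p) \
        ptr_watch_check((w), (void *)(p), __FILE__, __LINE__)
```

Typical use would be to declare one "struct ptr_watch w = { "p_number_of_dsps", NULL, 0 };" near the top of the caller and put "PTR_CHECK(&w, p_number_of_dsps);" after each line between the two trace messages shown in the question.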

Memory problems are always a bugger to find, usually because the point of failure is not where the fault is; the fault lies elsewhere in your code.

One thing you can try - which is rather crude - is to comment out lines of code, recompile and run. If you keep commenting out lines, sooner or later you will comment out the faulty line and the problem will go away.

In my own experience once you have identified the line of code causing your problem, a fix is not far behind.

This method has its limitations, of course, but it may help.

MBB

Here's what I use to catch accidental variable modifications from other functions:

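#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* HP-UX spells the flag MAP_ANONYMOUS */
#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Allocate nbytes, rounded up to whole pages, so that the region
 * can later be write-protected with mprotect().
 */
void *
debug_alloc_pages(size_t nbytes) {
        long    psize = sysconf(_SC_PAGESIZE);
        size_t  npages = nbytes / psize;
        void    *ret;

        /* Round up to a whole number of pages */
        if (npages * psize < nbytes) {
                ++npages;
        }

#ifdef MAP_ANON
        /* mmap() needs MAP_PRIVATE or MAP_SHARED in addition to MAP_ANON */
        ret = mmap(0, npages * psize, PROT_READ|PROT_WRITE,
                MAP_PRIVATE|MAP_ANON, -1, 0);
#else
        /*
         * No anonymous mappings: map /dev/zero instead
         * (assumes the system provides that device)
         */
        {
                int     fd = open("/dev/zero", O_RDWR);

                if (fd == -1) {
                        perror("open /dev/zero");
                        exit(EXIT_FAILURE);
                }
                ret = mmap(0, npages * psize, PROT_READ|PROT_WRITE,
                        MAP_PRIVATE, fd, 0);
                (void) close(fd);
        }
#endif
        if (ret == MAP_FAILED) {
                perror("mmap");
                exit(EXIT_FAILURE);
        }
        return ret;
}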

Now, instead of writing

char buf[128];

... write

char *buf = debug_alloc_pages(128);

When you're done initializing ``buf'', do

(void) mprotect(buf, sysconf(_SC_PAGE_SIZE), PROT_READ);

In every function that is allowed to modify ``buf'', execute an

(void) mprotect(buf, sysconf(_SC_PAGE_SIZE), PROT_READ|PROT_WRITE);

... when you enter it and

(void) mprotect(buf, sysconf(_SC_PAGE_SIZE), PROT_READ);

... when you return from it.

An invalid write access should now yield a bus error or segmentation fault which will provide you with a core dump from which you can obtain a stack trace showing you which function attempted to modify the data.
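
As a rough end-to-end check of the idea above, the sketch below write-protects a page and then lets a child process attempt a stray write, so the parent can confirm that the write really is trapped. The function name write_to_protected_page is hypothetical, and the sketch maps its own page directly instead of calling debug_alloc_pages() so that it compiles on its own. It assumes a system with fork() and MAP_ANONYMOUS (Linux, for example); on other systems the mapping call would need the same adjustments as the allocator.

```c
#define _DEFAULT_SOURCE 1  /* glibc: expose MAP_ANONYMOUS in strict -std modes */

#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical self-test: returns the number of the signal that killed
 * a child which wrote to a read-only page, 0 if the write was not
 * trapped, or -1 if the setup itself failed.
 */
static int
write_to_protected_page(void)
{
        long    psize = sysconf(_SC_PAGESIZE);
        char   *buf;
        pid_t   pid;
        int     status;

        if (psize <= 0) {
                return -1;
        }
        buf = mmap(NULL, (size_t)psize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
                return -1;
        }

        /* Initialize while the page is still writable, then lock it */
        strcpy(buf, "initial value");
        if (mprotect(buf, (size_t)psize, PROT_READ) != 0) {
                (void) munmap(buf, (size_t)psize);
                return -1;
        }

        pid = fork();
        if (pid < 0) {
                (void) munmap(buf, (size_t)psize);
                return -1;
        }
        if (pid == 0) {
                /* Stands in for the buggy code: a stray write */
                ((volatile char *)buf)[0] = 'X';
                _exit(0);       /* only reached if the write was not trapped */
        }

        if (waitpid(pid, &status, 0) != pid) {
                (void) munmap(buf, (size_t)psize);
                return -1;
        }
        (void) munmap(buf, (size_t)psize);
        return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The child is used only so the demonstration survives the fault; in a real debugging session you would let the program itself die and read the stack trace from the core dump.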

Hope this helps