Knowing the size and location of variables in a C program

Cambria · July 17, 2013, 2:39pm

So I need some help with this. Pardon me if I'm posting in the wrong forum; after googling for an answer and finding nothing, I found this forum, and it seemed appropriate for what I was seeking. I just didn't find a forum that concerned the use of GDB. I'm learning to use the C language and GDB. What I don't understand is how the computer knows how big each piece of a program is in memory, and how I could find my variables in memory using GDB.

For example, how does the computer know that the disassembled instructions from main() are at <main+##>? Is there a flag between each variable in memory on the stack? Or does the CPU consult the text segment to know where a variable in memory begins and ends?

I mean, if all memory is numbered, how can anyone, including the CPU, know where a word or giant (or whatever) starts and ends?

If I wanted to find my variables in memory after setting a breakpoint and looking at the $esp register, how would I know where each variable began and ended?

When I use the examine command "x", I can't tell where my variable begins and ends. Would it be the $esp register minus the size of my variable? When you disassemble something, $eip shows how many bytes into main each instruction is, but everything on the stack is just numbers.

Any help would be much appreciated!

Corona688 · July 17, 2013, 4:30pm

To get nice debugging information like that, you have to build the executable with debugging information (e.g. -g or -ggdb). This embeds lots of offsets and labels inside the program file for gdb's convenience.

This is also why gdb has trouble when it steps into code outside your program, like libc... Libraries are probably not built with debugging information, so details about their insides will be very limited.

As for whether the CPU knows where each variable begins and ends: to put it bluntly, it doesn't. They all become hardcoded addresses and offsets in the end. Without debugging information, you're left with detective work.

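As for locating your variables relative to $esp: with debugging info, gdb can simply tell you (p &myvar). If your executable wasn't built with debugging info, that'd mean detective work, such as reading the disassembly to see which stack offsets each variable uses.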
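As a minimal sketch of what debugging information buys you, assume a hypothetical helper locate_locals() (not from the thread), compiled with gcc -g -O0. The sizes it records come from the compiler via sizeof; nothing stored in memory marks where counter or ratio begin or end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of where a variable lives and how big it is. */
struct var_info {
    uintptr_t addr; /* address of the variable */
    size_t size;    /* size in bytes, known only to the compiler */
};

/* Records the address and size of two local variables. The sizes are
   compile-time constants; memory itself holds no separators. */
void locate_locals(struct var_info out[2])
{
    int counter = 42;
    double ratio = 0.5;

    out[0].addr = (uintptr_t)&counter;
    out[0].size = sizeof counter;
    out[1].addr = (uintptr_t)&ratio;
    out[1].size = sizeof ratio;
}
```

After break locate_locals, run, and a few next commands, gdb can show the same facts directly: info locals lists the variables, p &counter and p sizeof(counter) give the address and size, and x/4xb &counter dumps exactly those bytes (assuming a 4-byte int).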

DGPickett · July 17, 2013, 4:44pm

Wow, many questions. More magic than mechanism, it turns out. The size of every variable is pretty predictable (it's what sizeof reports), and the packing can be discovered by comparing the addresses of the variables. Let's say you give main an automatic int variable. The compiler just pulls the stack pointer down 4 bytes (stacks usually grow down from high addresses) and calls that space the int. If it was static, it gets a fixed slot in the program's static data area (.data if initialized, .bss if not), with an address settled at link or load time. gdb finds the structures used for linking (the symbol table), which identify most variables, plus the debugging clues left by the compiler, if the executable was not stripped. These also contain the addresses of linkable subroutines like main().

There is no punctuation in modern computers; they go by count and size. The CPU does not know where variables begin and end, which is why you sometimes get a SEGV fault when you overreach, unless you just read or write adjacent data of your own. C is like an assembly language for a computer that does not exist but is close enough for everyone to adapt to. Stack and heap pointers might be in registers or in memory outside the CPU; it doesn't matter, as long as the compiler knows.
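To illustrate the "no punctuation" point, here is a small sketch with a hypothetical int_stride() function: consecutive array elements sit back to back, so the distance between them is exactly sizeof(int), and the compiler locates element i purely by arithmetic.

```c
#include <stddef.h>

/* Returns the byte distance between consecutive elements of an int
   array. With no separators in memory, this equals sizeof(int):
   element i lives at (start address + i * sizeof(int)). */
ptrdiff_t int_stride(void)
{
    int values[4] = {10, 20, 30, 40};
    return (const char *)&values[1] - (const char *)&values[0];
}
```

At a breakpoint inside the function, x/4dw &values[0] in gdb would print the four ints back to back with nothing between them.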

Some CPUs cannot access words that are not aligned, but the x86 lets words float free. For speed, it helps to align them on 2-, 4-, or 8-byte boundaries so that one RAM fetch does the trick; computing with a misaligned word may take extra processing. Compilers therefore often pack things with padding so that each item hits its boundary.
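A minimal sketch of that padding, using a hypothetical struct tagged: on common ABIs such as x86-64 Linux, the compiler inserts bytes after the char so that the int starts on its alignment boundary. The C standard guarantees the alignment, not the exact amount of padding, so treat the number as platform-specific.

```c
#include <stdalign.h>
#include <stddef.h>

/* A char followed by an int: the compiler may insert padding after
   'tag' so that 'value' is suitably aligned. */
struct tagged {
    char tag;
    int value;
};

/* Number of padding bytes between 'tag' and 'value'. */
size_t tagged_padding(void)
{
    return offsetof(struct tagged, value) - sizeof(char);
}
```

On a typical x86-64 gcc build, tagged_padding() returns 3 and sizeof(struct tagged) is 8. Recent gdb versions can show the same layout, holes included, with ptype /o struct tagged.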

Some machines are big-endian, meaning the most significant byte goes at the lowest address. x86 is little-endian; SPARC and the IP protocol (network byte order) are big-endian, although some SPARC processors can switch to little-endian (a slow process, I was told), perhaps to emulate an x86. Keep this in mind when interpreting raw bytes on the stack or heap.
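The byte order is easy to check from C. This sketch (is_little_endian() is a hypothetical helper) stores a known 32-bit value and looks at which byte landed at the lowest address; memcpy is used so the bytes are inspected portably.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte is stored first
   (little-endian, e.g. x86), 0 otherwise. */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    unsigned char bytes[sizeof word];

    memcpy(bytes, &word, sizeof word);
    return bytes[0] == 0x04;
}
```

On x86 it returns 1, and x/4xb &word in gdb would show the bytes in the order 0x04 0x03 0x02 0x01.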

Routines are relocatable, so the loader can place main wherever it wants and call it. Part of run-time linking is giving the code the right pointers to the actual code and data. Automatic variables on the stack are addressed relative to the stack (or frame) pointer, which is faster and simpler. The usual model is that all the code goes at the bottom, then the constants, then the initialized and uninitialized static variables, with the heap above them, but dynamic loading of libraries may layer in more code, constants, and variables. Memory is usually virtual, and code and data are often kept on different pages with different permissions, so code is not writable and data is not executable.
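To see the layout on your own system, a sketch like the following records one address from each region. The helper sample_regions() and struct region_addrs are hypothetical; actual values, and even the relative order, depend on the platform, the linker, and whether the executable is position-independent with address-space randomization, so the comments describe only the usual case.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one address per memory region. */
struct region_addrs {
    uintptr_t code;     /* a function: text segment */
    uintptr_t constant; /* a string literal: usually read-only data */
    uintptr_t data;     /* initialized static variable: usually .data */
    uintptr_t bss;      /* uninitialized static variable: usually .bss */
    uintptr_t heap;     /* a malloc() result (0 if malloc failed) */
    uintptr_t stack;    /* an automatic variable */
};

static int initialized_static = 7;
static int uninitialized_static;

void sample_regions(struct region_addrs *out)
{
    const char *literal = "hello";
    int automatic = 0;
    void *block = malloc(16);

    out->code = (uintptr_t)&sample_regions;
    out->constant = (uintptr_t)literal;
    out->data = (uintptr_t)&initialized_static;
    out->bss = (uintptr_t)&uninitialized_static;
    out->heap = (uintptr_t)block;
    out->stack = (uintptr_t)&automatic;

    free(block); /* only the address was needed */
}
```

On a running Linux process, gdb's info proc mappings (or /proc/<pid>/maps) lists the mapped regions, so you can check which one each recorded address falls in.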

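Data breakpoints (gdb calls them watchpoints) are managed either with the CPU's debug registers, where available, or by running the code a step at a time and checking the watched location. Code breakpoints are done by substituting a trap instruction (int3 on x86) at the breakpoint, saving the original code aside. Stepping uses similar tricks.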
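As a hypothetical target for trying a data breakpoint, consider sum_array() below (not from the thread). Compiled with -g -O0, you could stop in it with break sum_array and then type watch total; gdb stops each time total changes and prints the old and new values, using a hardware watchpoint if the CPU provides one.

```c
#include <stddef.h>

/* Sums an int array; 'total' changes once per element, which makes it
   a convenient variable to watch in gdb. */
long sum_array(const int *values, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Because total is a local variable, gdb deletes the watchpoint when the function returns and its frame goes away.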

I like to use where (gdb's synonym for backtrace) to examine the call stack. Mostly, though, I do not use GDB; I use code with careful formatting, structure, error checking, and logging. Sometimes I add debug printouts to narrow down a problem. Sometimes I use tusc/truss/strace to trace the running process (very educational about UNIX). Usually it debugs very quickly.
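For practising where, a recursive function is handy because every call adds a frame. factorial() below is a hypothetical example, called from any main: with break factorial if n == 1 and run, where should list one frame per pending call, down to main.

```c
/* Each call pushes a new stack frame until n reaches 1. */
unsigned long factorial(unsigned n)
{
    if (n <= 1)
        return 1;
    return n * factorial(n - 1);
}
```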

I do use GDB to find out where a program died, from its core file. I have even written cron scripts to pick up core files, gdb them, send mail, compress them, and stash them in /tmp so new core files can be detected. You never know how many core dumps happen in prod if you do not look!
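Core files only appear if the core-size resource limit allows them. As a sketch, a program can raise its own soft limit to the hard limit with the POSIX getrlimit() and setrlimit() calls, similar to running ulimit -c in the shell; enable_core_dumps() is a hypothetical name.

```c
#include <sys/resource.h>

/* Raises the soft core-file size limit to the hard limit so that a
   crash can leave a core file. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max; /* an unprivileged process may do this */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Where the file lands and what it is called are system-specific (on Linux, see /proc/sys/kernel/core_pattern). Once you have one, gdb ./prog core followed by where shows where the program died.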

So, as I said, not much mechanism, lots of smarts about how things work. Compilers may also have calls mark the stack (for example, with frame pointers) so it is easy to 'where'. Stacks may hold a mix of data laid out by the CPU hardware and the programmer's automatic variables, but some systems keep two stacks, one for the automatic variables and one for the CPU-defined data, because if the CPU starts loading registers with automatic data, anything can happen, usually a fault on the process that hands control to a signal handler. The amount of stuff on the stack can vary a lot, depending on whether arguments are passed in registers and on whether the stack frame is for a bigger transition than an ordinary call, such as a switch between threads or processes, or an interrupt (like one from a disk controller). Sometimes bits in registers or in the call instruction itself control how much CPU data goes onto the stack for a call. Good luck reading the stack barefoot.

Mostly, just remember that if an automatic variable gets overwritten, look for an array declared later being written past its end, or an array declared earlier being written before its beginning. Hackers make a great living finding programs that do not limit how much they read and feeding them carefully structured excess input. So never use gets(), use scanf() with great care, and make do with fgets(), getc(), and fread() when possible (in the FILE* world). Less to debug!
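Along those lines, here is a minimal replacement for gets() built on fgets(); read_line() is a hypothetical helper name. fgets() never writes more than size bytes (including the terminating '\0'), and the newline it keeps is stripped with strcspn().

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Reads at most size-1 characters of one line from 'in' into 'buf',
   never writing past the end, and strips the trailing newline.
   Returns buf, or NULL on EOF/error or an unusable size. */
char *read_line(char *buf, size_t size, FILE *in)
{
    /* fgets() takes an int size and needs room for at least '\0'. */
    if (size == 0 || size > INT_MAX)
        return NULL;
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

If a line is longer than the buffer, the rest stays in the stream and comes back on the next call instead of overflowing anything.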