gcc linker address

Hi,

I have two simple programs, Pgm1.c and Pgm2.c, which I compiled with gcc into two executables, Pgm1 and Pgm2. When I run nm Pgm1 and nm Pgm2, the address listed for main is the same in both programs (08048344 T main), and it is the same at run time too.

Questions:

1) What is this address (08048344 T main)?
2) If both addresses are the same, how are the programs loaded into memory?
3) Will gcc generate the same address for all programs?

main's location is probably a coincidence from main being the first (or maybe even the only?) function in your code. It doesn't need to be anywhere in particular.

There is a tiny bit of code that's at a fixed location. When the executable loads, it doesn't call any functions -- it just jumps immediately to 0x08048000 (for Linux x86 anyway; I won't speak for other architectures) and begins executing whatever's there, which in 99.9% of cases is going to be a tiny bit of hand-crafted assembly language code. That code does a little bit of setup before calling your main() function, then calls exit() after main returns.

$ readelf --syms /usr/lib/crt1.o

Symbol table '.symtab' contains 18 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 SECTION LOCAL  DEFAULT    1
     2: 00000000     0 SECTION LOCAL  DEFAULT    2
     3: 00000000     0 SECTION LOCAL  DEFAULT    4
     4: 00000000     0 SECTION LOCAL  DEFAULT    5
     5: 00000000     0 SECTION LOCAL  DEFAULT    6
     6: 00000000     0 SECTION LOCAL  DEFAULT    7
     7: 00000000     0 SECTION LOCAL  DEFAULT    8
     8: 00000000     0 SECTION LOCAL  DEFAULT    9
     9: 00000000     4 OBJECT  GLOBAL DEFAULT    4 _fp_hw
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND __libc_csu_fini
    11: 00000000     0 FUNC    GLOBAL DEFAULT    2 _start
    12: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND __libc_csu_init
    13: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND main
    14: 00000000     0 NOTYPE  WEAK   DEFAULT    6 data_start
    15: 00000000     4 OBJECT  GLOBAL DEFAULT    5 _IO_stdin_used
    16: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND __libc_start_main
    17: 00000000     0 NOTYPE  GLOBAL DEFAULT    6 __data_start
$

Note the 'UND main': the library doesn't actually supply a main, it just demands that one exists. It also doesn't seem to define the start point here; that's decided later and is probably adjustable, but by tradition it is kept at 0x08048000 for Linux x86. I don't know why it doesn't link _exit; maybe it's doing a raw system call.

As for why these things are always kept at the same place, that's the beauty of virtual memory. Instead of having to torture your code into running from any possible location and working around whatever holes are left in global memory, the memory layout itself is adjustable. Each process gets a private little universe with the same startup location and memory layout as everything else. Any x86 processor since the 80386 can do this memory translation in hardware. The kernel's left in charge of assigning what memory truly belongs to each process...
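To make the "private little universe" idea concrete, here is a minimal sketch. It uses fork() rather than two separately compiled programs, purely so it fits in one file; that substitution is mine, not something from the posts above. The function name same_address_private_copy and the variable counter are made up for illustration. The child changes a global and reports the variable's address back over a pipe; the parent then checks that the address matches its own while its own value is untouched.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical global used only for this demonstration. */
static int counter = 1;

/* Returns 1 if the child saw `counter` at the same virtual address as the
 * parent while the parent's copy kept its old value, 0 if not, and -1 if
 * a system call failed. */
int same_address_private_copy(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: modify its own copy and send the address to the parent. */
        close(fds[0]);
        counter = 42;
        uintptr_t addr = (uintptr_t)&counter;
        ssize_t n = write(fds[1], &addr, sizeof addr);
        _exit(n == (ssize_t)sizeof addr ? 0 : 1);
    }

    /* Parent: read the address the child reported, then reap the child. */
    close(fds[1]);
    uintptr_t child_addr = 0;
    ssize_t n = read(fds[0], &child_addr, sizeof child_addr);
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    if (n != (ssize_t)sizeof child_addr)
        return -1;

    /* Same virtual address, but the child's write never reached us. */
    return child_addr == (uintptr_t)&counter && counter == 1;
}
```

Called from a main() on Linux, this should return 1: both processes use the same virtual address, but the kernel backs it with different physical memory once the child writes to it.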

You may find A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux an illuminating description of the really low-level inner workings of an executable.


However, I couldn't find this 0x08048000 location in the readelf output of your example.

On my Linux machine, an executable gave main's location as follows:

000000000040069a 82 FUNC GLOBAL DEFAULT 12 main

However, this is a 64-bit address. So do we assume that main, in any ELF binary on 64-bit x86, would have a start address of 000000000040069a, whereas on 32-bit it was at 0x08048000?

Shouldn't it have been 0x0000000008048000 on a 64-bit system?

I think that's because crt1.o is an object file that hasn't been linked yet. Addresses get decided when the final executable is linked, not before.

That's because:

The bit that ends up at the start location is called _start.

The start location might have been even less fixed than I thought, too. 0x08048000 is just where Linux x86 begins loading code, not where _start has to be.
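If you want to check where a particular executable really starts, the address is recorded in the e_entry field of the ELF header. Below is a rough sketch that reads it using the structures from Linux's <elf.h>. The function name read_elf_entry is invented for this example, and it assumes the file's byte order matches the machine running the code.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores the entry point recorded in the ELF header
 * of `path` into *entry. Returns 0 on success, -1 on failure.
 * Assumes the file has the same byte order as the host. */
int read_elf_entry(const char *path, uint64_t *entry)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    /* Need at least a 32-bit header, starting with the "\177ELF" magic. */
    if (n < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;

    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS64 && n == sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
        return 0;
    }
    return -1; /* unknown class or truncated header */
}
```

Comparing the value this reports with the address nm prints for _start in the same executable should show them matching, which is the point made above: the entry address is whatever the linker recorded, not a hard-wired constant.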