A question regarding the gcc compiler ...

It might be a simple one, but this question has been bothering me for some time.

When we search for a symbol inside a library directory (e.g. /usr/lib/*) with tools like nm, it takes a while to give us the results. However, it's very quick when gcc is invoked to compile a program that uses the very same symbol (usually a function/API name) and we include the relevant headers. It's not only the case with well-known headers: I've observed gcc picking up the correct libraries when such libraries were copied manually into /usr/lib/ and their corresponding headers into /usr/include/.

How is it possible that gcc can find symbols so quickly whereas nm can't? Even if gcc uses a database or some configuration, why is that not available to nm? After all, both are standard open-source system tools.

The short answer is that gcc builds symbol tables on the fly: it compiles your code into an object file ("myfile".o) that records the symbols it defines and the symbols it needs, then invokes ld to resolve those symbols against crt1.o (where _start() lives) and the libraries it was given.

Try nm on a .o file.

nm has to read an ELF file, then resolve the symbols. Every entry point and variable (except those optimized away) is something nm has to decode from the file. If you are on Linux or another system that supports ELF image files, you can use the readelf utility. Try it.

Or download readelf.c and check out what nm has to deal with.

rpm: file/src/readelf.c Source File
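To give a feel for the very first step a tool like nm takes, here is a minimal sketch in C that only checks whether a file starts with the ELF magic bytes and reports its class (32- or 64-bit). It is an illustration, not how nm is implemented; nm and readelf go on to parse the section headers, symbol table and string table. The function name elf_class is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 32 or 64 for an ELF file, 0 if the file
 * is not ELF or cannot be read. Only the start of e_ident[] is checked:
 * the 4 magic bytes 0x7f 'E' 'L' 'F', then the class byte
 * (1 = ELFCLASS32, 2 = ELFCLASS64). */
int elf_class(const char *path)
{
    unsigned char ident[5];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    /* "\x7f" "ELF" is split so the hex escape does not swallow the 'E'. */
    if (n != sizeof ident || memcmp(ident, "\x7f" "ELF", 4) != 0)
        return 0;
    switch (ident[4]) {
    case 1:  return 32;
    case 2:  return 64;
    default: return 0;
    }
}
```

Note that a static library such as libc.a is not itself an ELF file but an ar archive (it starts with "!<arch>"), so elf_class() returns 0 for it; nm has to walk every archive member and parse each ELF object inside.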

If you ever build large database programs (for example, with Oracle Pro*C), you will find that it takes a while to build one.


Hey Jim, thanks for your reply; I really appreciate your time.

However, Jim, I feel I'm still where I started: your answer restates my question.

If gcc can build the symbol table on the fly, how is it so fast at locating the correct library?

The nm utility does the same thing, yet it is very slow and takes considerable time to return the libraries matching the symbol being searched.

If gcc has a much more efficient algorithm, why hasn't nm adopted gcc's search logic? After all, both have been around for about the same time and both are open-source tools.

So, you're trawling the entire /usr/lib/ directory? By default, gcc uses these files and only these:

# Compiling with -static to make it more obvious which things it uses
$ echo "int main(void) { return 42; }" > 42.c
$ strace -f gcc -static 42.c -o 42 2> 42.log
$ grep "\.[ao]" 42.log | grep open | grep -Ev "ENOENT|/tmp"
[pid  7741] open("/usr/lib/gcc/i686-pc-linux-gnu/4.3.4/../../../crt1.o", O_RDONLY|O_LARGEFILE) = 4
[pid  7741] open("/usr/lib/gcc/i686-pc-linux-gnu/4.3.4/../../../crti.o", O_RDONLY|O_LARGEFILE) = 5
[pid  7741] open("/usr/lib/gcc/i686-pc-linux-gnu/4.3.4/crtbeginT.o", O_RDONLY|O_LARGEFILE) = 6
[pid  7741] open("/usr/lib/gcc/i686-pc-linux-gnu/4.3.4/libgcc.a", O_RDONLY|O_LARGEFILE) = 8
[pid  7741] open("/usr/lib/gcc/i686-pc-linux-gnu/4.3.4/libgcc_eh.a", O_RDONLY|O_LARGEFILE) = 9
[pid  7741] open("/usr/lib/gcc/i686-pc-linux-gnu/4.3.4/../../../libc.a", O_RDONLY|O_LARGEFILE) = 10
[pid  7741] open("/usr/lib/gcc/i686-pc-linux-gnu/4.3.4/crtend.o", O_RDONLY|O_LARGEFILE) = 11
[pid  7741] open("/usr/lib/gcc/i686-pc-linux-gnu/4.3.4/../../../crtn.o", O_RDONLY|O_LARGEFILE) = 12

Try a library it'll have to look for, like pthread:

# Yes, I'm aware the code is nonsense.  Nothing better's needed
$ echo "main() { return(pthread_create(42)); }" > 42.c
$ gcc 42.c
/tmp/cckAe1dw.o: In function `main':
42.c:(.text+0x19): undefined reference to `pthread_create'
collect2: ld returned 1 exit status

...It didn't find it. gcc only looks inside what it's told to look inside: by default, a very small set consisting of libc (which already contains stdio), libgcc, the startup objects listed above and a few math builtins, and that's it. It has to be told explicitly, with -l, to check libpthread.so (or libpthread.a when linking statically), like:

$ gcc 42.c -lpthread
$
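The 42.c above is deliberately nonsense. For comparison, here is a minimal sketch of a valid program that really does call pthread_create; run_worker and worker are hypothetical names. Build it with the library after the source file, e.g. gcc worker.c -lpthread (or gcc's -pthread option). Be aware that since glibc 2.34 pthread_create lives in libc itself, so on newer systems the link may succeed even without -lpthread; on older ones it fails with the same undefined reference shown above.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread body: hands 42 back through the void * result. */
static void *worker(void *arg)
{
    (void)arg;
    return (void *)(intptr_t)42;
}

/* Hypothetical helper: start one thread, wait for it, and return the
 * value it produced, or -1 if the thread could not be created/joined. */
int run_worker(void)
{
    pthread_t tid;
    void *result;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

With int main(void) { return run_worker(); } added, the program exits with status 42, like the earlier example.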

So gcc never had any special, superfast library-finding trick. It's just told where to look. When you start linking in dozens of libraries, you'll indeed find gcc having to do a lot more work...


Yes, -l is required for all the shared libraries (only), and gcc never looks into any *.so by default. We also provide the -L flag in our makefiles for gcc, but that is a directory path, which could contain many library archives (i.e. *.a).

But we also provide a path for nm to look into, don't we?

There were times when I used completely new APIs just by looking at the declarations in header files, then searched for the relevant *.a file path to include in the makefile. gcc was extremely quick to locate the relevant libxxx.a file (out of a number of such archives) and add the correct library without significantly affecting the overall build time.

I suspected that gcc maintained some kind of global symbol table (global for a given system), with entries added on the fly (or perhaps deleted when a symbol/path combination produces an error) whenever a new symbol is encountered during a build. That wouldn't be dependable all the time, since it would hamper the portability of the source, but I wanted to confirm it :).

If you don't know the library you want, gcc doesn't either. -L only tells gcc where to look for files; it still needs names given with -l before it will link any library at all. It never trawls. Of course trawling with nm is slower.

It's still not psychic. It doesn't know that -lz means zlib, and doesn't know what symbols -lz is supposed to supply. It just knows that -lz means a file named libz.so or libz.a (or another platform-specific suffix) in one of its search directories. There is a 1:1 correspondence between library names and library files; it doesn't have to hunt. It already knows what it wants, which is precisely what you told it.
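The name-to-file mapping described above can be sketched in C. The function below is a simplified imitation of what the GNU ld manual describes for -lNAME on ELF systems: for each search directory in order, look for libNAME.so before libNAME.a and take the first file that exists. It is an illustration under those assumptions, not ld's actual code; find_library is a hypothetical name, and a real linker also honors -Bstatic, sysroots, linker scripts and other details. Note that no symbol table is read at all.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sketch of -lNAME resolution: walk the directories in the
 * order given (like -L paths followed by the default ones) and, in each,
 * try libNAME.so first, then libNAME.a. On success the path is written
 * to out and 1 is returned; otherwise 0. Only file names are checked;
 * the files themselves are never opened or parsed. */
int find_library(const char *name, const char *const dirs[], size_t ndirs,
                 char *out, size_t outlen)
{
    static const char *const suffixes[] = { ".so", ".a" };
    const size_t nsuffixes = sizeof suffixes / sizeof suffixes[0];

    for (size_t d = 0; d < ndirs; d++) {
        for (size_t s = 0; s < nsuffixes; s++) {
            int n = snprintf(out, outlen, "%s/lib%s%s",
                             dirs[d], name, suffixes[s]);
            if (n < 0 || (size_t)n >= outlen)
                continue;               /* path did not fit; skip it */
            if (access(out, F_OK) == 0)
                return 1;               /* first match wins */
        }
    }
    return 0;
}
```

For -lz with a single -L/opt/zlib/lib, this makes at most a handful of existence checks, which is why the cost barely depends on how many other libraries sit in those directories.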

In other words, if you can compile with gcc, you already know exactly which file you want, and opening that one file is of course far more efficient than trawling hundreds of files hunting for a symbol.

Thank you very much for your replies. I learned a lot.

No problem. It also occurs to me that the pkg-config command might be an effective way to find the libraries you want, if your system has it. It keeps a collection of small text (.pc) files on your system and prints the matching compiler and linker flags on demand.

$ pkg-config --libs sdl
-lSDL -lpthread
$ gcc program-using-sdl.c $(pkg-config --libs --cflags sdl)
$

Some GNU/Linux distributions include the compiler cache ccache, which can significantly speed up repeated GCC compilations under certain conditions. You may not even be aware that it is installed and active.

Gentoo won't let you report bugs unless you disable ccache. It can cause problems. I'd be surprised if a distro used it by default.

pkg-config is present on my system; however, what does this command mean?
The part inside the $( ) parentheses?

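It uses the output of pkg-config --libs --cflags sdl as parameters for gcc: the shell runs the command inside $( ) first and substitutes its output into the gcc command line (this is called command substitution). The first command shows what pkg-config prints; the second shows how to feed those command-line parameters directly into gcc.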
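As an analogy only, the core of what $( ) does can be mimicked in C with POSIX popen(): run a command, read what it writes to standard output, and strip the trailing newlines, roughly as the shell does before splicing the text into the gcc command line. capture_output is a hypothetical name, and unlike the shell this sketch does not split the result into separate words.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: run cmd through /bin/sh, copy its standard output
 * into buf (NUL-terminated, truncated to fit), then strip trailing
 * newlines the way $(cmd) does. Returns pclose()'s wait status
 * (0 when the command exits successfully), or -1 if it cannot start. */
int capture_output(const char *cmd, char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;
    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    size_t len = fread(buf, 1, buflen - 1, p);
    buf[len] = '\0';
    while (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';
    return pclose(p);
}
```

On the system shown above, capture_output("pkg-config --libs sdl", buf, sizeof buf) would leave -lSDL -lpthread in buf, the same text the shell pastes into the gcc command.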