How do I use someone else's C library that is not installed system-wide (Ubuntu/Linux)?

I have downloaded and installed a library called htslib for bioinformatics use, but locally rather than system-wide (I'm using Ubuntu 18.04). I only need part of the library for my exercise: parsing data in VCF format (basically a tab-delimited file with a lot of information packed into its columns).

  1. My test code (vcf_parser01.c) is attached. It uses the VCF API from htslib; the vcf part seems to be what I need, so the vcf.h/vcf.c files are attached as well. The whole of htslib may be needed, which is one of the things I want to confirm.
  2. htslib was compiled and installed locally, without errors, using the Makefile that ships with the package. The resulting folder looks like this:
.
├── aclocal.m4
├── bgzf.c
├── bgzf.o
├── bgzip.c
├── bgzip.o
├── config.h
......
├── cram
│   ├── cram_codecs.c
│   ├── cram_codecs.h
│   ├── cram_codecs.o
......
├── header.c
├── header.h
├── header.o
......
├── hts.c
├── htsfile.c
├── htsfile.o
├── hts_internal.h
├── htslib
│   ├── faidx.h
│   ├── hfile.h
│   ├── hts_defs.h
│   ├── hts_endian.h
│   ├── hts.h
......

├── kstring.c
├── kstring.o
├── vcf.c
├── vcf.o
......

 

  3. To compile my test code, I used:

gcc -Wall -O3 -I ./htslib-1.10.2/htslib  -o vcf_parser01 vcf.c  vcf_parser01.c

  4. I get many errors that I believe come from the linker, such as:

vcf_parser01.c:(.text.startup+0x19): undefined reference to `hts_open'
vcf_parser01.c:(.text.startup+0x73): undefined reference to `hts_close'
/tmp/cc05JReL.o: In function `bcf_hdr_add_sample_len':
vcf.c:(.text+0x989): undefined reference to `hts_log'
/tmp/cc05JReL.o: In function `bcf_hdr_set_idx.isra.10':
vcf.c:(.text+0x116b): undefined reference to `hts_log'
vcf.c:(.text+0x11d5): undefined reference to `hts_resize_array_'
/tmp/cc05JReL.o: In function `bcf_subset_format.part.19':
vcf.c:(.text+0x18e5): undefined reference to `hts_realloc_or_die'
/tmp/cc05JReL.o: In function `bcf_hdr_format.constprop.30':
vcf.c:(.text+0x1e35): undefined reference to `ksprintf'
vcf.c:(.text+0x1e73): undefined reference to `ksprintf'
vcf.c:(.text+0x1ed3): undefined reference to `ksprintf'

In total there were 196 "undefined reference to" lines covering 39 functions, which seems to involve other libraries!
My question is:
How do I debug this linker problem?

I think I understand .o, .so and .a files and the -I/-L options for gcc, but I do not have a firm grasp of them.
The bigger picture is that I want to learn how to use any non-standard C library written by others, especially when the API documentation is not very clear. I am limiting the topic to ANSI/GNU C on Linux.
I may need a full course on this, but I know there are C experts on the forum. I would really appreciate any help.

First, I suggest that when troubleshooting you should:

Use the full path name to each file, not a relative path name. This will ensure there are no strange, unseen path issues. For example:

gcc -Wall -O3 -I ./htslib-1.10.2/htslib  -o vcf_parser01 vcf.c  vcf_parser01.c

I would change this to:

gcc -Wall -O3 -I /FULL/PATH/TO/HERE/htslib-1.10.2/htslib  -o vcf_parser01 vcf.c  vcf_parser01.c

Second, you should confirm that these objects really are in the search paths gcc uses (the -I directories for headers, the -L directories for libraries), and that you have read / access permission for them.

Sometimes even the best sys admins install code under one userid, then work as another userid and do not have permission to access the files. Happens all the time (at least to me, LOL).

These are generally the first two steps I always take (back to basics, before back to the future).

  • Ensure your search paths are correct and the objects / symbols are actually in them.
  • Check file and directory permissions for the userid you are using to build.

UNIX and Linux are funny things; they generally do what they are told to do and report back the "facts" as they find them. In your case, gcc cannot find required symbols and objects. This is generally because they are not in the search paths, or because the file/directory permissions are "not as required".

Please post back and let me know whether you are certain your paths are correct, and why, using the FULL PATH names to files and directories in your gcc command line.

Since we are not "standing behind you, watching you type", we cannot "see" what directory you are in, so to be sure, it is always best to use FULL PATH names when troubleshooting a problem like this.

Thanks.

Undefined references mean the linker can't find compiled code for the functions being called. If you want to use a subset of the library, you have to compile and link all the files used by the code you do want.
In other words, if you want to use function a in a.c, which uses function b in b.c, which uses function c in c.c, then you need to compile and link a.c, b.c and c.c. If you have an archive file (.a) you can link directly against it, and the library won't be dynamically loaded at runtime. If you only have a shared library and you are on a Linux system, you can investigate rpath as a means of telling the runtime loader where the library is to be found.

-Greg.

Thanks Neo and Greg!

$ pwd
/home/yifangt/Study/C/VCF
$ ls ${PWD}
config.h  for_post  hts_internal.h  htslib  htslib-1.10.2  htslib-1.10.2.tar.bz2  README  textutils_internal.h  tmp.err  vcf.c  vcf.o  vcf_parser01.c
$ gcc -Wall -O3 -I /home/yifangt/Study/C/VCF/htslib-1.10.2  -o vcf_parser01 vcf.c vcf_parser01.c  2>err.txt

htslib-1.10.2.tar.bz2 is what I downloaded from the website, and the library was compiled from that package using its Makefile.
By the way, I tried the -L option instead of -I in the gcc command line, and it gave the same errors.
The complete actual log file is attached here too for your reference.
Thank you very much again!

You can compile and link using compiler options. The library is referenced using -l<its_name_here> and the path to the library using -L/path/to/lib.

-I is for the path to include directories.

However, if a shared library isn't in a directory on the loader's search path when you run the executable, you'll get an error. One way to solve this is to use the rpath option.

Something like:

gcc -Wall -O3 -o vcf_parser01 vcf.c vcf_parser01.c -Wl,-rpath,/home/yifangt/Study/C/VCF -L/home/yifangt/Study/C/VCF -lhtslib-1.10.2  2>err.txt
 

Note: I'm guessing what you need and what each item is. -I takes the path to the header files, -l is prepended to the library name (without the lib prefix or the file extension), and -L precedes the path to the directory containing the library.

Thanks!

Does the -L option ensure the sub-directories are searched recursively?

The -lhtslib-1.10.2 option in your command line may be wrong in this case, as htslib-1.10.2 is a folder containing several sub-folders and many *.c, *.h, *.o and *.pico files, but I understand your point. Two files, libhts.so and libhts.a, may be the ones I need (I checked that the permissions are correct). So I tried:

$ gcc -Wall -O3 -o vcf_parser01 vcf.c vcf_parser01.c  -Wl,-rpath,/home/yifangt/Study/C/VCF  -L/home/yifangt/Study/C/VCF/htslib-1.10.2 -lhts 

/tmp/ccSYd2K4.o: In function `bcf_hdr_set_idx.isra.10':
vcf.c:(.text+0x11d5): undefined reference to `hts_resize_array_'
/tmp/ccSYd2K4.o: In function `vcf_hdr_read':
vcf.c:(.text+0x66ad): undefined reference to `tbx_index_load3'
/tmp/ccSYd2K4.o: In function `vcf_write':
vcf.c:(.text+0xd57a): undefined reference to `hts_idx_tbi_name'
/tmp/ccSYd2K4.o: In function `bcf_index_load3':
vcf.c:(.text+0xdd55): undefined reference to `hts_idx_load3'
/tmp/ccSYd2K4.o: In function `bcf_idx_save':
vcf.c:(.text+0xe181): undefined reference to `sam_idx_save'
collect2: error: ld returned 1 exit status

Under the htslib-1.10.2 folder, there are three sub folders cram/, htslib/ and m4/ that may be involved.

The htslib/ sub-folder contains *.h files only, so that is for the headers;
The cram/ and m4/ sub-folders contain *.c, *.h and *.o files (and some other types, which seem to be for portability and are very likely not needed);
I have also tried archiving the *.o files into a library of my own, as mylibx.a or mylibx.so, but the errors stayed the same. I have attached the link to download the library and my source code. Would it be possible for you to download the library and have a look at the code? That may be easier than my explanation.
Thanks a lot.

No. Each folder requires a -L entry.

I presumed htslib-1.10.2 was your library. The -l option is for the library itself. If the library is libhts.so then use -lhts.

I'll have a look.

--- Post updated at 09:38 PM ---

Ok. I didn't see a link but grabbed it from github.

If you use the archive, it's simplest. I'm going to call the root path of the directory where you built the library LIBROOT. You should replace this with the actual path.

gcc -Wall -O3 -o vcf_parser01 vcf.c vcf_parser01.c -I LIBROOT/htslib LIBROOT/libhts.a

should work.

Compiling worked with:

$ LIBROOT="/home/yifangt/Study/C/VCF/htslib-1.10.2"
$ gcc -Wall -O3 -o vcf_parser01 vcf_parser01.c vcf.c  -I ${LIBROOT}/htslib -I ${LIBROOT} -L ${LIBROOT}  -lhts
$ ./vcf_parser01
./vcf_parser01: error while loading shared libraries: libhts.so.3: cannot open shared object file: No such file or directory
$ ldd ./vcf_parser01
    linux-vdso.so.1 (0x00007ffdde13d000)
    libhts.so.3 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa2c16bd000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa2c1cb0000) 

It seems I'm getting closer now. The shared library libhts.so.3 is located in ${LIBROOT}.
I am not trying to be lazy by skipping ld_config(?) and related subjects; I simply want to stay with this one example to avoid confusion.
How do I make sure the dynamic library is available to the newly compiled executable vcf_parser01?

Thanks!

If you link to a shared library instead of the archive it's a little more complicated. If the shared library isn't installed in a directory known to the loader then you need to tell it where to find it. The rpath option I mentioned is passed to the linker, which records the path in the executable. As long as the shared library stays in the same place, the loader will find it. You can also set LD_LIBRARY_PATH to the directory where it is. One way to do this (in a bash shell) is:

export LD_LIBRARY_PATH=/home/yifangt/Study/C/VCF/htslib-1.10.2; ./vcf_parser01

"If you link to a shared library instead of the archive it's a little more complicated."
Thank you for pointing that out. Actually I meant the archive here.
I happened to find an article that uses a special variable, $ORIGIN, and that seems to make everything work now.

$ LIBROOT="/home/yifangt/Study/C/VCF/htslib-1.10.2" 
$ gcc -Wall -O3 -o vcf_parser01 vcf_parser01.c vcf.c  -I ${LIBROOT}/htslib -I ${LIBROOT} -L ${LIBROOT} -Wl,-rpath,"\$ORIGIN"  -lhts
$ ./vcf_parser02 
Usage: print-ctg <in.vcf>

However, I could not find any reference to this variable, or to -rpath, in the GCC manual, so I am not sure this is the correct way to do the job.
In particular, what is the standard way?
Thank you very much again.

rpath is actually a linker option rather than a compiler option; gcc passes it to ld. That is why it is documented in the ld manual rather than the GCC one, and $ORIGIN is described in the ld.so(8) manual page.

Well, the standard way to use shared objects (.so) is to install them in the system directories used for that purpose (/usr/lib etc.), but that's not always possible. The archive (.a) lets you link the objects into your executable, which produces a larger executable but gets around the problem. If the library is updated you must relink to pick up the changes, whereas with a shared object you only need to recompile if the update breaks your code.

For the loader to load a shared object at runtime it must know where to find it. rpath tells the linker to embed the path in the executable, but if the .so is moved the loader won't be able to load it. $ORIGIN resolves to wherever the binary is at runtime, so as long as the library stays with the executable it works.

-Greg.

Just because a lib is updated ... you do not have to recompile at all if your static binary is working fine.

In addition, shared libs can introduce major IT security issues.

There is "No Free Lunch"

In engineering and design, everything is a kind of "trade off" ... never fall into the trap of "falling in love" with "one half of the trade".

Every Rose Has Its Thorns.

Thanks Greg and Neo, although your input is more than I expected.
I see the topic has drifted from my original question, because there are many aspects behind it that I do not know. However, it is easier for me to learn by example, especially with code.
I usually find there is one layer missing for me when someone else's library is installed on the system. Simply running sudo apt-get install / sudo yum install etc. does not help me when I write C code from scratch.

Here I want to stick to the technical part only, i.e. "In this example, what is the correct/standard way to use the downloaded htslib when it is not installed system-wide?"

1) I want to confirm that the options in my command line are correct in general, while I look for the official reference (which normally has no real hands-on examples, e.g. for $ORIGIN);

2) On top of point 1), I am trying to grasp the standard way to use non-system *.a and *.so files in C. Here "standard way" can be replaced with "common way" if there is no one-size-fits-all solution, which is normally the case.
Two examples: (1) "the standard way to use shared objects (.so) is to install them in the system directories". Thanks Greg! This cleared up my confusion. The unanswered part is how to use a *.so file when it is not installed system-wide. (2) I see people usually say "static library" for *.a files and "dynamic library" for *.so files, but Greg consistently says "archives" for *.a files and "shared objects" for *.so ones. It seems to me that these are different names for the same things, but I might be wrong.
The wording here was really confusing, and I owe an apology to everyone who reads this post!
3) My point is: what is the right way (not necessarily the best way) to use them?
Can I ask it another way:
What is the best practice for using others' libraries (static *.a and shared *.so) that are not installed system-wide in C programming? I may need to start a new thread before the topic veers too far off.

Thank you so much for your time!

No, they are not at all the same thing. An archive is just that: a collection of object files that can be statically linked into your executable. Shared objects are dynamically linked at runtime. They are compiled with a flag that tells the compiler to generate position-independent code. Here's what the gcc docs have to say:

-fpic   Generate position-independent code (PIC) suitable for use in a shared library, if supported for the target machine.  Such code accesses all constant addresses through a global offset table (GOT).  The dynamic loader resolves the GOT entries when the program starts (the dynamic loader is not part of GCC; it is part of the operating system).  If the GOT size for the linked executable exceeds a machine-specific maximum size, you get an error message from the linker indicating that -fpic does not work; in that case, recompile with -fPIC instead.  (These maximums are 8k on the SPARC, 28k on AArch64 and 32k on the m68k and RS/6000.  The x86 has no such limit.) 

Since archives are statically linked into your code, the executable is no longer dependent on the library file. With shared objects you remain dependent on the library.

I meant that "static library" and "archive" are different names for *.a files; what is still not clear to me is which options to use for the *.a and *.so files in the command line for my sample code.
I'm sorry for the confusing wording. I had better open another thread with a new example.
Thank you so much for your discussion!