Fixes

Shared Library Issues In Linux

Video

Audio

Download

Quick Start

If you just want enough information to fix your problem quickly, you can read the How-To section of this post and skip the rest. I would highly recommend reading everything though, as a good understanding of the concepts and commands outlined here will serve you well in the future. We also have Video and Audio included with this post that may be a good quick reference for you. Don’t forget that the man and info pages of your Linux/Unix system can be an invaluable resource as well when you’re trying to solve problems.

Preface

To make things easier on you, all of the black command line and script areas are set up so that you can copy the text from them. This does make using the commands easier, but if you’re not already familiar with the concepts presented here, typing the commands yourself and working through why you’re typing them will help you learn more. If you hit problems along the way, take a look at the Troubleshooting section near the end of this post for help.

There are formatting conventions that are used throughout this post that you should be aware of. The following is a list outlining the color and font formats used.

Command Name or Directory Path
Warning or Error
Command Line Snippet With Commands/Options/Arguments
Command Options and Their Arguments Only
Hyperlink

Where listings on command options are made available, anything with square brackets around it ("[" and "]") is an argument to the option, and a pipe ("|") means that you can choose one of two alternatives ([4|6] means choose 4 or 6).

Overview

This post is geared more toward system administrators than software developers, but anyone can make good use of the information that you’re going to see here. The Resources section holds links to take your study further, even into the developer realm. I’m going to start off by giving you a brief background on shared libraries and some of the rules that apply to their use. Listing 1 shows an example of an error you might see after installing PostgreSQL via a bin installer file. In this post, I’m going to step through some commands and techniques to help you deal with this type of shared library problem. I’ll also work through resolving the error in Listing 1 as an example, and give you some tips and tricks as well as items to help you if you get stuck.

Listing 1

$ ./psql
./psql: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory

Background

Shared libraries are one of the many strong design features of Linux, but they can lead to headaches for inexperienced users, and even experienced users in certain situations. Shared libraries allow software developers to keep the size of their application storage and memory footprints down by using common code from a single location. The glibc library is a good example of this. There are two standardized locations for shared libraries on a Linux system, and these are the /lib and /usr/lib directories. On some distributions /usr/local/lib is included, but check the documentation for your specific distribution to be sure. These are not the only locations that you can use for libraries though, and I’ll talk about how to use other library directories later. According to the Filesystem Hierarchy Standard (FHS), /lib is for the shared libraries and kernel modules that are required to boot the system and run the commands in the root filesystem (/bin and /sbin), and /usr/lib holds most of the internal libraries that are not meant to be executed directly by users or shell scripts. The /usr/local/lib directory is not defined in the latest version of the FHS, but if it exists on a distribution it normally holds libraries that aren’t a part of the standard distribution, including libraries that the system administrator has compiled/installed after the initial setup. There are other directories as well, such as /lib/security, which holds PAM modules, but for our discussion we’ll focus on /lib and /usr/lib.

The counterpart to the dynamically linked (shared) library is the statically linked library. Whereas dynamically linked libraries are loaded and used as they are needed by the applications, statically linked libraries are either built into, or closely associated with, a program at the time it is compiled. A couple of the situations where static libraries are used are when you’re trying to work around an odd/outdated library dependency, or when you’re building a self-contained rescue system. Static linking typically makes the resulting application faster and more portable, but increases the size (and thus the memory and storage footprint) of the binary. A static library’s footprint is also multiplied when more than one program uses it. For instance, one program using a library that is 10 MB in size just consumes 10 MB of memory (1 program x 10 MB), but if you run 10 programs with the same library compiled into them, you end up with 100 MB of memory consumed (10 programs x 10 MB). Also, when programs are statically linked, they can’t take advantage of updates made to the libraries that they depend on. They are locked into whatever version of the library they were compiled with. Programs that depend on dynamically linked libraries refer to a specific file on the Linux file system, and so when that file is updated, the program can automatically take advantage of the new features and fixes the next time it loads.

Shared libraries typically have the extension .so, which stands for Shared Object. Library file names also carry a version numbering scheme that can include major and minor version numbers. A system of symbolic links is used to point the majority of programs to the latest and greatest library version, while still allowing a minority of programs to use older libraries. Listing 2 shows output that I modified to illustrate this point.

Listing 2

$ ls -l | grep libread
lrwxrwxrwx 1 root root     18 2010-03-03 11:11 libreadline.so.5 -> libreadline.so.5.2
-rw-r--r-- 1 root root 217188 2009-08-24 19:10 libreadline.so.5.2
lrwxrwxrwx 1 root root     18 2010-02-02 09:34 libreadline.so.6 -> libreadline.so.6.0
-rw-r--r-- 1 root root 225364 2009-09-23 08:16 libreadline.so.6.0

You can see in the output that there are two versions of libreadline installed side-by-side (5.2 and 6.0). The version numbers are in the form major.minor, so 5 and 6 are major version numbers, with 2 and 0 being minor version numbers. You can usually mix and match libraries with the same major version number and differing minor numbers, but it can be a bad idea to use libraries with different major numbers in place of one another. Major version number changes usually represent significant changes to the interface of the library, which are incompatible with previous versions. Minor version numbers are only changed when an update such as a bug fix is added without significantly changing how the library interacts with the outside world. Another thing that you’ll notice in Listing 2 is that there are links created from libreadline.so.5 to libreadline.so.5.2 and from libreadline.so.6 to libreadline.so.6.0. This is so that programs that depend on the 5 or 6 series of the libraries don’t have to figure out where the newest version of the library is. If an application works with major version 6 of the library, it doesn’t care if it grabs 6.0, 6.5, or 6.9 as long as it’s compatible, so it just looks at the base name of the library and takes whatever that’s linked to. There are also a couple of other situations that you’re likely to encounter with this linking scheme. The first is that you may see a link file name containing no version numbers (libreadline.so) that points to the actual library file (libreadline.so.6.0). Also, even though I said that libraries with different major version numbers are risky to mix, there are situations where you will see an earlier major version number (libreadline.so.5) linked to a newer version number of the library (libreadline.so.6.0). This should only happen when your distribution maintainers or system administrators have made sure that nothing will break by doing this. Listing 3 shows an example of the first situation.

Listing 3

$ ls -l | grep ".*so "
lrwxrwxrwx 1 root root 18 2010-02-02 14:48 libdbus-1.so -> libdbus-1.so.3.4.0

All things considered, the shared library methodology and numbering scheme do a good job of ensuring that your software can maintain a smaller footprint, make use of the latest and greatest library versions, and still have backwards compatibility with older libraries when needed. With that said, the shared library model isn’t perfect. There are some disadvantages to using them, but those disadvantages are typically considered to be outweighed by the benefits. One of the disadvantages is that shared libraries can slow the load time of a program. This is only a problem the first time that the library is loaded though. After that, the library is in memory and other applications that are launched won’t have to reload it. One of the most potentially dangerous drawbacks of shared libraries is that they can create a central point of failure for your system. If there is a library that a large set of your programs rely on and it gets corrupted, deleted, overwritten, etc., all of those programs are probably going to break. If any of the programs that were just taken down are needed to boot your Linux system, you’ll be dead in the water and in need of a rescue CD.

While I would argue that dependency chains are not really a “problem”, they can work hardships on a system administrator. A dependency chain happens when one library depends on another library, then that one depends on another, and another, and so on. When dealing with a dependency chain, you may have satisfied all of the first level dependencies, but your program still won’t run. You have to go through and check each library in turn for a dependency chain, and then follow that chain all the way through, filling in the missing dependencies as you go.

One final problem with shared libraries that I’ll mention again is version compatibility issues. You can end up with a situation where two different applications require different versions of the same library that aren’t compatible with each other. That is the reason for the version numbering system that I talked about above. Robust package management systems have also helped ease shared library problems from the user’s perspective, but the problems still exist in certain situations. Any time that you compile and/or install an application/library yourself on your Linux system, you have to keep an eye out for problems, since you don’t have the benefit of a package manager ensuring library compatibility.

Introducing ld-linux.so

ld-linux.so (or ld.so for older a.out binaries) is itself a library, and is responsible for managing the loading of shared libraries in Linux. For the purposes of this post, we’ll be working with ld-linux.so, and if you need or want to learn more about the older style a.out loading/linking, have a look at the Resources section. The ld-linux.so library reads the /etc/ld.so.cache file, which is a non-human-readable file that is updated when you run the ldconfig command. When loading shared libraries, ld-linux.so decides where to look for them by checking the value of the LD_LIBRARY_PATH environment variable first, then the contents of the /etc/ld.so.cache file, and finally the default path of /lib followed by the /usr/lib directory.

The LD_LIBRARY_PATH environment variable is a colon-separated list of directories that preempts all of the other library paths in the loader’s search order. This means that you can use it to temporarily alter library paths when you’re trying to test a new library before rolling it out to the entire system, or to work around problems. This variable is typically not set by default on Linux distributions, and should not be used as a permanent fix. Use it with care, and give preference to the other library search path configuration methods. A handy thing about the LD_LIBRARY_PATH variable is that since it’s an environment variable, you can set it on the same line as a command and the new value will only affect that command, and not the parent environment. So, you would issue a command line like LD_LIBRARY_PATH="/home/user/lib" ./program to run program and force it to use the experimental shared libraries in /home/user/lib in preference to any others on the system. The shell that you run program in never sees the change to LD_LIBRARY_PATH. Of course you can also use the export command to set this variable, but be careful, because doing that affects every program you launch from that shell afterward. One final thing about the LD_LIBRARY_PATH variable is that you don’t have to run ldconfig after changing it. The changes take effect immediately, unlike changes to /lib, /usr/lib, and /etc/ld.so.conf. I’ll explain more about ldconfig later.

You can use the ld-linux.so library by itself to list which libraries a program depends on. Its behavior is very much like the ldd command that we’ll talk about next, because ldd is actually a wrapper script that adds more sophisticated behavior to ld-linux.so. In most cases ldd should be your preferred command for listing required shared libraries. To use ld-linux.so.2 to get a listing of the libraries that the ls command depends on, you would type /lib/ld-linux.so.2 --list /bin/ls, swapping the 2 out for whatever major version of the library your system is running. I’ve shown some of the command line options for ld-linux.so in Listing 4.

Listing 4

--list                 Lists all library dependencies for the executable
--verify               Verifies that the program is dynamically linked and that the ld-linux.so linker can handle it
--library-path [PATH]  Overrides the LD_LIBRARY_PATH environment variable and uses PATH instead

You can start a program directly with ld-linux.so by using the command line form /lib/ld-linux.so.2 --library-path FULL_LIBRARY_PATH FULL_EXECUTABLE_PATH, where you replace 2 with whatever version of the library you are using. An example would be /lib/ld-linux.so.2 --library-path /home/user/lib /home/bin/program, which would run program using /home/user/lib as the location to look for required libraries. This should be used for testing purposes only though, not as a permanent fix on a production system.

Introducing ldd

The name of the ldd command comes from its function, which is to “List Dynamic Dependencies”. As mentioned in the previous section, by default the ldd command gives you the same output as issuing the command line /lib/ld-linux.so.2 --list FULL_EXECUTABLE_PATH. Each library entry in the output includes a hexadecimal number which is the load address of the library, and can change from run to run. Chances are that system administrators will never even need to know what this value is, but I’ve mentioned it here because some people may be curious. Listing 5 shows a few of the options for ldd that I use the most.

Listing 5

-d --data-relocs      Perform data relocations and report any missing objects
-r --function-relocs  Perform relocations for both data objects and functions, and report any missing objects or functions
-u --unused           Print unused direct dependencies
-v --verbose          Print all information, including e.g. symbol versioning information

Keep in mind that you have to give ldd the full path to the binary/executable for it to work. The only way to work around giving ldd the full path is to use cd to change into the directory where the binary is. Otherwise you get an error like ldd: ./ls: No such file or directory. The only time that you would need to run ldd with root privileges would be if the binary has restrictive permissions placed on it.

As I mentioned in the Background section, you need to be aware of dependency chains when using shared libraries. Just because you’ve run the ldd command on an executable and satisfied all of its top level dependencies doesn’t mean that there aren’t more dependencies lurking underneath. If your program still won’t run, you should check each of the top level libraries to see if any of them have their own library dependencies that are unmet. You continue that process, running ldd on each library in each layer until you’ve satisfied all of the dependencies.
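For example, to check one of those top level libraries for unmet dependencies of its own, you can point ldd at the library file just as you would at an executable. The path below is only an illustration, so substitute whichever library you’re chasing:

    $ ldd /usr/lib/libxml2.so.2

Any "not found" lines in that output are second level dependencies that still need to be satisfied before your original program will run.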

Introducing ldconfig

Any time that you make changes to the installed libraries on your system, you’ll want to run the ldconfig command with root privileges to update your library cache. ldconfig rebuilds the /etc/ld.so.cache file of currently installed libraries based on what it first finds in the directories listed in the /etc/ld.so.conf file, and then in the /lib and /usr/lib directories. The /etc/ld.so.cache file is formatted in binary by ldconfig and so it’s not designed to be human readable, and should not be edited by hand. Formatting the ld.so.cache file in this way makes it more efficient for the system to retrieve the information. The ld.so.conf file may include a directive that reads include /etc/ld.so.conf.d/*.conf that tells ldconfig to check the ld.so.conf.d directory for additional configuration files. This allows the easy addition of configuration files to load third-party shared libraries such as those for MySQL. On some distributions, this include directive may be the only line you find in the ld.so.conf file.

You often need to run ldconfig manually because a Linux system cannot always know when you have made changes to the currently installed libraries. Many package management systems run ldconfig as part of the installation process, but if you compile and/or install a library without using the package management system, the system software may not know that there is a new library present. The same applies when you remove a shared library.
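As a quick sketch of that workflow, after copying or compiling a library into place you might rebuild the cache and then confirm that the loader can see the new library with ldconfig’s -p option (the library name here is just an example):

    $ sudo ldconfig
    $ ldconfig -p | grep libpq

If the grep comes back empty, the cache still doesn’t know about your library, and you’ll need to check the directory it was installed into against ld.so.conf and the standard library directories.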

Listing 6 holds several options for the ldconfig command. This is by no means an exhaustive list, so be sure to check the man page for more information.

Listing 6

-C [file]         Specifies an alternate cache file other than ld.so.cache
-f [file]         Specifies an alternate configuration file other than ld.so.conf
-n                Rebuilds the cache using only directories specified on the command line, skipping the standard directories and ld.so.conf
-N                Only updates the symbolic links to libraries, skipping the cache rebuilding step
-p --print-cache  Lists the shared library cache, but needs to be piped to the less command because of the amount of output
-v --verbose      Gives output information about version numbers, links created, and directories scanned
-X                Opposite of -N, it rebuilds the library cache and skips updating the links to the libraries

ldconfig is not the only method used to rebuild the library cache. Gentoo handles this task in a slightly different way, which I’ll talk about next.

Introducing env-update

Gentoo takes a slightly different path to updating the cache of installed libraries which includes the use of the env-update script. env-update reads library path configuration files from the /etc/env.d directory in much the same way that ldconfig reads files from /etc/ld.so.conf.d via the ld.so.conf include directive. env-update then creates a set of files within /etc, including ld.so.conf. After this, env-update runs ldconfig so that it reloads the cache of libraries into the /etc/ld.so.cache file.
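On a Gentoo system the usual sequence after adding or changing a file under /etc/env.d is something like the following; check the Gentoo documentation for your release to be sure:

    # env-update && source /etc/profile

The second half of that line just pulls the regenerated environment into your current shell so that you don’t have to log out and back in.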

How-To

Hopefully by the time you’re reading this section you either have, or are beginning to get, a pretty good understanding of the commands used when dealing with shared libraries. Now I’m going to take you through a sample scenario of a PostgreSQL installation running on Red Hat 5.4 to demonstrate how you would use these commands.

I have downloaded a bin installer to use on my CentOS installation instead of the PostgreSQL Yum repository because I wanted to install a specific older version of Postgres outside of the package management system. In most cases you’ll want to use a repository with your package management system though, as you’ll get a more integrated installation that can be kept up to date more easily. That’s assuming that your Linux distribution offers the repository mechanism for installing and updating packages, and many distributions don’t.

After installing Postgres via the bin file, I take a look around and see that the majority of the PostgreSQL files are in the /opt/PostgreSQL directory. I decide to experiment with the binaries under the pgAdmin3 directory, and so I use the cd command to move to /opt/PostgreSQL/8.4/pgAdmin3/bin. Once I’m there, I try to run the psql command and get the output in Listing 7 (same as Listing 1).

Listing 7

$ ./psql
./psql: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory

There might be some of you reading this who will realize that I could have probably avoided the library error in Listing 7 by running the psql command from the /opt/PostgreSQL/8.4/bin directory. While this is true, for the sake of this example I’m going to forge ahead trying to figure out why it won’t run under the pgAdmin3 directory.

The main thing that I take away from the output in Listing 7 is that there is a shared library named libpq.so.5 that cannot be found by ld-linux.so. To dig just a little bit deeper, I use the ldd command and get the output in Listing 8.

Listing 8

$ ldd ./psql
        linux-gate.so.1 =>  (0x003fc000)
        libpq.so.5 => not found
        libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00845000)
        libpam.so.0 => /lib/libpam.so.0 (0x0054f000)
        libssl.so.4 => not found
        libcrypto.so.4 => not found
        libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x00706000)
        libz.so.1 => /usr/lib/libz.so.1 (0x003d5000)
        libreadline.so.4 => not found
        libtermcap.so.2 => /lib/libtermcap.so.2 (0x00325000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x004a3000)
        libdl.so.2 => /lib/libdl.so.2 (0x0031f000)
        libm.so.6 => /lib/libm.so.6 (0x0033f000)
        libc.so.6 => /lib/libc.so.6 (0x001d7000)
        libaudit.so.0 => /lib/libaudit.so.0 (0x00532000)
        libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x0079e000)
        libcom_err.so.2 => /lib/libcom_err.so.2 (0x0052d000)
        libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0x006f6000)
        libkeyutils.so.1 => /lib/libkeyutils.so.1 (0x005ae000)
        libresolv.so.2 => /lib/libresolv.so.2 (0x00518000)
        /lib/ld-linux.so.2 (0x001b9000)
        libselinux.so.1 => /lib/libselinux.so.1 (0x003bb000)
        libsepol.so.1 => /lib/libsepol.so.1 (0x00373000)

Notice that the error given in Listing 7 only gives you the first shared library that’s missing. As you can see in Listing 8, this doesn’t mean that other libraries won’t be missing as well.

My next step is to see if the missing libraries are already installed somewhere on my system using the find command. If the libraries are not already installed, I’ll have to use the package management system or the Internet to see which package(s) I need to install to get them. Listing 9 shows the output from the find command.

Listing 9

$ sudo find / -name libpq.so.5
/opt/PostgreSQL/8.4/lib/libpq.so.5
/opt/PostgreSQL/8.4/pgAdmin3/lib/libpq.so.5

After looking in both of the directories shown in the output, I notice that all of my other missing libraries are housed within them. If you were just temporarily testing some new features of the psql command, you could use the export command to set the LD_LIBRARY_PATH environment variable as I have in Listing 10.

Listing 10

$ export LD_LIBRARY_PATH="/opt/PostgreSQL/8.4/lib/"
bash-3.2$ ./psql
Password:
psql (8.4.3)
Type "help" for help.

postgres=#

You can see that once I’ve set the LD_LIBRARY_PATH variable, all I have to do is enter my PostgreSQL password and I’m greeted with the psql command line interface. I’ve used the /opt/PostgreSQL/8.4/lib/ library directory instead of the one beneath the pgAdmin3 directory as a matter of preference. In this case both directories include the same required libraries. For a permanent solution, we can add the path via the ld.so.conf file.

I could just add /opt/PostgreSQL/8.4/lib/ directly to the ld.so.conf file on its own line, but since the ld.so.conf file on my installation has the include ld.so.conf.d/*.conf directive, I’m going to add a separate conf file instead. In Listing 11 you can see that I’ve echoed the PostgreSQL library path into a file called postgres-i386.conf under the /etc/ld.so.conf.d directory. After checking to make sure the file has the directory in it, I run the ldconfig command to update the library cache.

Listing 11

$ sudo -s
Password:
[root@localhost bin]# echo /opt/PostgreSQL/8.4/lib > /etc/ld.so.conf.d/postgres-i386.conf
[root@localhost bin]# cat /etc/ld.so.conf.d/postgres-i386.conf
/opt/PostgreSQL/8.4/lib
[root@localhost bin]# /sbin/ldconfig
[root@localhost bin]# exit
exit

Make sure that you unset the LD_LIBRARY_PATH variable though, so that you can verify that it was your ld.so.conf configuration file change that fixed the problem, and not the environment variable. Issuing a command line such as unset LD_LIBRARY_PATH will accomplish this for you.

There are many scenarios beyond the one in this example, but it gives you the concepts used to work through the majority of shared library problems that you’re likely to come up against as a system administrator. If you’re interested in delving more deeply though, there are several links in the Resources section that should help you.

Tips and Tricks

  • I have read that running ldd on an untrusted program can open your system up to a malicious attack. This happens when an executable’s embedded ELF information is crafted in such a way that it will run itself by specifying its own loader. The man pages on the Ubuntu and Red Hat systems that I checked don’t mention anything about this security concern, but you’ll find a very good article by Peteris Krumins in the Resources section of this post. I would suggest at least skimming Peteris’ post so that you’re aware of the security implications of running ldd on unverified code.
  • Although it’s a little bit beyond the scope of this post, you can compile a program from source and manually control which libraries it links to. This is yet another way to work around library compatibility issues. You use the GNU C Compiler/GNU Compiler Collection (gcc) along with its -L and -l options to accomplish this. Have a look at item 13 (the YoLinux tutorial) in the Resources section for an example, the gcc man page for details on the options, and the short sketch just after this list.
  • Have a look at the readelf and nm commands if you want a more in-depth look at the internals of the binaries and libraries that you’re working with. readelf shows you some extra information on your ELF files by reading and parsing their internal information, and nm lists the symbols (functions, etc) within an object file.
  • You can temporarily preempt your current set of libraries and their functions with the LD_PRELOAD environment variable and/or the /etc/ld.so.preload file. Once these are set, the dynamic library loader will use the preload libraries/functions in preference to the ones that you have cached using ldconfig. This can help you work around shared library problems in a few instances.
  • If you run into a program that has its required library path(s) hard coded into it, you can create symbolic links from each one of the missing libraries to the location that’s expected by the executable. This technique can also help you work around incompatibilities in the naming conventions between what your system software expects, and what libraries are actually named. I talk about using symbolic links in this way a little more in the Troubleshooting section.
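To give you a feel for the gcc options mentioned in the second tip above, here is a hypothetical link line; program.c, the output name, and the choice of the pq library are only placeholders for whatever you’re actually building:

    $ gcc -o program program.c -L/opt/PostgreSQL/8.4/lib -lpq

The -L option adds a directory to the search path used at link time, and -lpq asks the linker to link against libpq. Keep in mind that -L only affects the link step; at run time the program still has to find libpq.so.5 through the mechanisms described earlier in this post (or through an rpath added with something like -Wl,-rpath).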

Scripting

These scripts are somewhat simplified and in most cases could be done other ways too, but they will work to illustrate the concepts. If you use these scripts, make sure you adapt them to your situation. Never run a script or command without understanding what it will do to your system.

The first script shown in Listing 12 can be used to search directory trees for binaries with missing libraries. It makes use of the ldd and find commands to do the bulk of the work, looping through their output. Since I have heavily commented the scripts in Listing 12 and Listing 13, I won’t explain the details of how they work in this text.

Listing 12

#!/bin/bash -

# These variables are designed to be changed if your Linux distro's ldd output
# varies from Red Hat or Ubuntu for some reason
iself="not a dynamic executable" # Used to see if executable is not dynamic
notfound="not.*found"            # Used to see if ldd doesn't find a library

# Step through all of the executable files in the user specified directory
for exe in $(find $1 -type f -perm /111)
do
    # Check to see if ldd can get any information from this executable. It won't
    # if the executable is something like a script or a non-ELF executable.
    if [ -z "$(ldd $exe | grep -i "$iself")" ]
    then
        # Step through each of the lines of output from the ldd command
        # substituting : for a delimiter instead of a space
        for line in $(ldd $exe | tr " " ":")
        do
            # If ldd gives us output with our "not found" variable string in it,
            # we'll need to warn the user that there is a shared library issue
            if [ -n "$(echo "$line" | grep -i "$notfound")" ]
            then
                # Grab the first field, ignoring the words "not" or "found".
                # If we don't do this, we'll end up grabbing a field with a
                # word and not the library name.
                library="$(echo $line | cut -d ":" -f 1)"
                printf "Executable %s is missing shared object %s\n" $exe $library
            fi
        done
    fi
done

When run on the /opt/PostgreSQL directory mentioned above, it finds all of the programs that exhibit our missing library problem. As it stands now, this script will only check the first layer of library dependencies. One way to improve it would be to make the script follow the dependency chain of every library to the end, making sure that there is not a library farther down the chain that is missing. Better yet, you could add a “max-depth” option so that the user could specify how deeply into the dependency chain they wanted the script to check before moving on. A max-depth setting of "0" would allow the user to specify that they wanted the script to follow the dependency chain to the very end.

In Listing 13, I have created a wrapper script that could be used when developing new software, or as a last ditch effort to work around a really tough shared library problem. It utilizes the shell’s feature of temporarily setting an environment variable for a command by placing the assignment on the same line as the command itself. That way we’re not setting LD_LIBRARY_PATH for the overall environment, which could cause problems for other programs if there are library naming conflicts.

Listing 13

#!/bin/bash -

# Set up the variables to hold the PostgreSQL lib and bin paths. These paths may
# vary on your system, so change them accordingly.
LIB_PATH=/opt/PostgreSQL/8.4/lib               # Postgres library path
BIN_FILE=/opt/PostgreSQL/8.4/pgAdmin3/bin/psql # The binary to run

# Start the specified program with the library path set for it alone, and have
# it replace this process. Note that this will not change LD_LIBRARY_PATH in
# the parent shell.
LD_LIBRARY_PATH="$LIB_PATH" exec "$BIN_FILE"

I’ve broken the library and binary paths out into variables to make it easier for you to adapt this script for use on your system. This script could easily serve as a template for other wrapper scripts as well, anytime that you need to alter the environment before launching a program. Remember though that this wrapper script should not be used for a permanent solution to your shared library problems unless you have no other choice.

Troubleshooting

In some cases, a program may have been hard coded to look for a specific library on your system in a certain path, thus ignoring your local library settings. In order to fix this problem, you can research what version/path of the library the program is looking for and then create a symbolic link between the expected library location and a compatible library. In some cases you can recompile the program with options set to change how/where it looks for libraries. If the programmer was really kind, they may have included a command line option to set the library location, but this would be the exception rather than the rule when library locations are hard coded.
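As a hypothetical illustration of the symbolic link approach, if a program insisted on loading /usr/lib/libpq.so.5 while the library actually lived under the PostgreSQL installation directory from our example, you might bridge the gap like this (only after confirming the library really is the version the program expects):

    $ sudo ln -s /opt/PostgreSQL/8.4/lib/libpq.so.5 /usr/lib/libpq.so.5
    $ sudo /sbin/ldconfig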

The ldd command will not work with older style a.out binaries, and will probably give output mentioning “DLL jump” if it encounters one. It’s a good idea not to trust what ldd tells you when you’re running it on these types of binaries because the output is unpredictable and inaccurate. Newer ELF binaries have support for ldd built into them via the compiler, which is why they work.

Just because the dynamic linker finds a library doesn’t mean that the library isn’t missing “symbols” (things like functions/subroutines). If this happens, you may be able to match the ldd command output to libraries that are installed, but your program will still have unpredictable behavior (like not starting or crashing) when it tries to access the symbol(s) that are missing. In this case the ldd command’s -d and -r options can give you more information on the missing symbols, and you’ll need to dig deeper into the software developer’s documentation to see if there are compatibility issues with the specific version of the library that you’re running. Remember that you can always use the LD_LIBRARY_PATH variable to temporarily test different versions of the library to see if they fix your problem.

There may be some rare cases where ldconfig may not be able to determine a library type (libc4, 5, or 6) from its embedded information. If this happens, you can specify the type manually in the /etc/ld.so.conf file with a directive like dirname=TYPE, where TYPE can be libc4, libc5, or libc6. According to the man page for ldconfig, you can also specify this information directly on the command line to keep the change on a temporary basis.

If you have stubborn library problems that you just can’t seem to get a handle on, you might try setting the LD_DEBUG environment variable. Try typing export LD_DEBUG="help" first and then run a command (like ls) so that you can see what options are available. I normally use "all", but you can be more selective on your choices. The next time that you run a program, you’ll see output that is like a stack trace for the library loading process. You can follow this output through to see where exactly your library problem is occurring. Issue unset LD_DEBUG to disable this debugging output again.
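Continuing with this post’s psql example, a debugging session might look something like the following; the output itself is too long to reproduce here and will differ from system to system:

    $ export LD_DEBUG="libs"
    $ ./psql --version
    $ unset LD_DEBUG

While LD_DEBUG is set to libs, every dynamically linked program you run prints the directories that the loader searches and the libraries it finds, or fails to find, in each one.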

Conclusion

I hope that this post has armed you with the knowledge that you need to solve any shared library problems that you might come up against. Work through shared library problems step by step: determine which libraries are needed, find out whether they’re already installed, install any that are missing, and make sure that your Linux distribution can find them. Do that, and you should have no problem fixing most of your dynamic library issues. If you have any questions, or have any information that should be added to this post, leave a comment or drop me an email. I welcome your feedback.

Resources

  1. IBM developerWorks Article On Shared Libraries By Peter Seebach
  2. Linux Foundation Reference On Statically And Dynamically Linked Libraries (Developer Oriented)
  3. LPIC-1 : Linux Professional Institute Certification Study Guide By Roderick W. Smith
  4. LPIC-1 In Depth By Michael Jang
  5. How-To On Shared Libraries From Linux Online
  6. Stack Overflow Post Giving An Example Of A Shared Library Problem
  7. OpenGuru Post On Shared Library Problem Caused By Not Having /usr/local/lib in /etc/ld.so.conf
  8. An Introduction To ELF Binaries By Eric Youngdale (Linux Journal)
  9. Short Explanation Of How To Tell a.out and ELF Binaries Apart
  10. Post On ldd Arbitrary Code Execution Security Issues By Peteris Krumins
  11. The Text Version Of The Filesystem Hierarchy Standard (Version 2.3)
  12. A Linux Documentation Project gcc Reference Covering Shared Libraries
  13. YoLinux Library Tutorial Including A gcc Linking Example
  14. Article By Johan Petersson Explaining What linux-gate.so.1 Is And Why You’ll Never Find It

Device Or Resource Busy Errors In Linux

Video

Part 1

Part 2

Audio

Download

Quick Start

If you just want enough information to fix your problem quickly, you can read the How-To section of this post and skip the rest. I would highly recommend reading everything though, as a good understanding of the concepts and commands outlined here will serve you well in the future. We also have Video and Audio included with this post that may be a good quick reference for you. Don’t forget that the man and info pages of your Linux/Unix system can be an invaluable resource as well when you’re trying to solve problems.

Preface

To make things easier on you, all of the black command line and script areas are set up so that you can copy the text from them. This does make using the commands easier, but if you’re not already familiar with the concepts presented here, typing the commands yourself and working through why you’re typing them will help you learn more. If you hit problems along the way, take a look at the Troubleshooting section near the end of this post for help.

There are formatting conventions that are used throughout this post that you should be aware of. The following is a list outlining the color and font formats used.

Command Name or Directory Path
Warning or Error
Command Line Snippet With Commands/Options/Arguments
Command Options and Their Arguments Only
Hyperlink

Overview

When you try to access an object on a Linux file system that is in use, you may get an error telling you that the device or resource you want is busy. When this happens, you may see a message like the one in Listing 1.

Listing 1

$ sudo umount /media/4278-62C2/
umount: /media/4278-62C2: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Notice that there are 2 commands specified at the end of the output – lsof and fuser, which are the two commands that this post will be focused on.

Introducing lsof

lsof is used to LiSt Open Files, hence the command’s name. It’s a handy tool normally used to list the open files on a system along with the associated processes or users, and can also be used to gather information on your system’s network connections. When run without options, lsof lists all open files along with all of the active processes that have them open. To get a full and accurate view of what files are open by what processes, make sure that you run the lsof command with root privileges.

To use lsof on a specific file, you have to specify the full path to the file. Remember that everything in Linux is a file, so you can use lsof on anything from directories to devices. This makes lsof a very powerful tool once you’ve learned it.
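For instance, to see whether anything currently has your system log open, you could run something like the following; the log file path is just an example and varies between distributions:

    $ sudo lsof /var/log/syslog

If a process has the file open, you’ll get one line of output for it showing the command name, PID, and owner, much like the listings later in this post.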

There are many options for lsof, and I have listed summaries for the ones that I find most useful in Listing 2. Anything with square brackets around it ("[" and "]") is an argument to the option, and a pipe ("|") means that you can choose one of two alternatives ([4|6] means choose 4 or 6).

Listing 2

+d [directory]   Scans the specified directory and all directories/files in its top level to see if any are open.
+D [directory]   Scans the specified directory and all directories/files in it recursively to see if any are open.
-F [characters]  Allows you to specify a list of characters used to split the output up into fields to make it easier to process. Type lsof -F ? for a list of characters.
-i [address]     Shows the current user's network connections and the processes associated with them. Connection types can be specified via an argument: [4|6][protocol][@hostname|hostaddr][:service|port]
-N               Enables the scanning/listing of files on NFS mounts.
-r [seconds]     Causes lsof to repeat its scan indefinitely or every so many seconds.
+r [seconds]     A variation of the -r option that will exit on the first iteration when no open files are listed. It uses seconds as a delay value.
-t               Strips all data out of the output except the PIDs. This is good for scripts and piping data around.
-u [user|UID]    Allows you to show the open files for the user or user ID that you specify.
-w               Causes warning messages to be suppressed. Make sure that the warnings are harmless before you suppress them.

If you are extra security conscious, have a look at the SECURITY section of the lsof man page. There are three main issues that the developers of lsof call out as potential security concerns. Many distributions have addressed at least some of these concerns already, but it doesn’t hurt to understand them yourself.

Introducing fuser

By default fuser just gives you the PIDs of processes that have a file open on a system. The PIDs are accompanied by a single character that represents the type of access that the process is performing on that file (f=open file, m=memory mapped file or shared library, c=current directory, etc). If you want output that’s somewhat similar to the lsof command, you can add the -v option for verbose output. According to the man page, this formats the output in a “ps-like” style. To get a full and accurate view of what files are open by all processes, make sure that you run fuser with root privileges. Listing 3 holds some of the fuser options that I find most useful.

Listing 3

-i  Used with the -k option, it prompts the user before killing each process.
-k  Attempts to kill all processes that are accessing the specified file.
-m  Shows the users and processes accessing any file within a mounted file system.
-s  Silent mode where no output is shown. This is useful if you only want to check the exit code of fuser in a script to see if it was successful.
-u  Appends the user name associated with each process to each PID in the output.
-v  Gives a "ps-like" output format that is somewhat similar to the default lsof output.

fuser is supposed to be a little lighter weight than lsof when it comes to using your system resources. To get an idea of what “a little” meant, I ran some very quick tests on both of the commands. I found that fuser consistently took only 30% – 50% of the time that it took lsof to run the same scan, but used about the same amount of RAM (within 5%). My tests were quick and dirty using the ps and time commands, so your mileage may vary. In any event very few users, if any, will notice a performance difference between the two commands because they use such a small amount of system resources.
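If you want to repeat that kind of quick and dirty comparison yourself, something along these lines will do. The directory is arbitrary, the two commands aren’t a perfectly matched pair (fuser’s -m option works on the whole mounted file system rather than just the directory), and your numbers will certainly differ from mine:

    $ time sudo lsof +D /usr/lib > /dev/null
    $ time sudo fuser -m /usr/lib > /dev/null 2>&1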

How-To

Hopefully by the time you’re reading this section you either have, or are beginning to get, a pretty good understanding of both the lsof and fuser commands. Either one of them can be used to solve device and/or resource busy errors in Linux. Let’s take a look at a few scenarios.

Say that I have mounted a CD to /media/cdrom0, used it for a while copying files from it, and now want to unmount it. The problem is that Linux won’t let me unmount the CD. I get the familiar error in Listing 4, but you can see that I then use lsof and fuser to track down what’s going on.

Listing 4

$ sudo umount /media/cdrom0
umount: /media/cdrom0: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
$ sudo lsof -w /media/cdrom0
COMMAND  PID    USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
bash    2238 jwright cwd    DIR   11,0     4096 1600 /media/cdrom0/boot
$ sudo fuser -mu /media/cdrom0
/media/cdrom0:        2238c(jwright)

Both commands tell me what PID is accessing the file system mounted on /media/cdrom0 (2238). Each of the two commands also tells me that the process is using a directory within the /media/cdrom0 file system as its current working directory. This is shown as the cwd specifier in the lsof output, and the letter c in the output of fuser (appended to the PID). Finally, each of the commands tells me that a process I (jwright) started is using the directory, and lsof goes one step further in telling me the exact directory the process (listed as bash in the COMMAND column) is using as its current working directory.

Armed with this information, I start searching around and find that I have a virtual terminal open in which I used the cd command to descend into the /media/cdrom0/boot directory. I have to change to a directory outside of the mounted file system or exit that virtual terminal for the umount command to succeed. This example uses a simple oversight on my part to illustrate the point, but many times the process holding the file open is going to be outside of your direct control. In that case you have to decide whether or not to contact the user who owns the process and/or kill the process to release the file. Be careful when killing processes without contacting your users though, as it can cause the user who is accessing the file/directory some major problems.
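In this case the fix is as simple as moving that stray shell somewhere outside of the mount point (or closing it) and then trying the unmount again:

    $ cd ~
    $ sudo umount /media/cdrom0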

Another scenario is something that has happened to me when running Arch Linux. At seemingly random intervals, MPlayer (run from the command line) would refuse to output sound, complaining that the resource /dev/dsp was busy and that it couldn’t open /dev/snd/pcmC0D0p. Listing 5 shows an excerpt from the error MPlayer was giving me, and Listing 6 is the output that I got from running the lsof command on /dev/snd/pcmC0D0p.

Listing 5

[AO OSS] audio_setup: Can't open audio device /dev/dsp: Device or resource busy
[AO_ALSA] alsa-lib: pcm_hw.c:1325:(snd_pcm_hw_open) open /dev/snd/pcmC0D0p failed: Device or resource busy

Listing 6

$ lsof /dev/snd/pcmC0D0p
COMMAND  PID    USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
firefox 4398 jwright mem    CHR  116,6          5818 /dev/snd/pcmC0D0p
firefox 4398 jwright  84u   CHR  116,6      0t0 5818 /dev/snd/pcmC0D0p
exe     4534 jwright mem    CHR  116,6          5818 /dev/snd/pcmC0D0p
exe     4534 jwright  37u   CHR  116,6      0t0 5818 /dev/snd/pcmC0D0p

After doing some research, I found that the exe process was associated with the version of the Google Chrome browser that I was running and with its use of Flash player. I closed Firefox and Chrome and then tested MPlayer again, but still didn’t have any sound. I then ran the same lsof command again and noticed that the exe process was still there, apparently hung. I killed the exe process and was then able to get sound out of MPlayer immediately.

Through this investigation I found that the problem was not truly random, but occurred whenever Chrome came in contact with a Flash movie with sound. The silent MPlayer problem only seemed random because I was not accessing Flash movies with sound at consistent intervals. Now I’m not meaning to pick on Arch Linux here, because the problem seems to have been present in other distributions as well. Also, I have been unable to reproduce this problem on newer versions of Google Chrome running on Arch Linux, telling me that the issue has probably been resolved.

Listing 7 shows a basic example of how you might use the lsof command to track what services/processes are using the libwrap (TCP Wrappers) library. Keep in mind that the | head -4 text at the end of the command line just selects the first 4 lines of output.

Listing 7

$ lsof /lib/libwrap.so.0 | head -4
COMMAND    PID    USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
pulseaudi 1690 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6
gconf-hel 1693 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6
gnome-set 1703 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6

If you wanted to get a full system-wide view of the processes using libwrap, you would run the command with sudo, or after issuing the su command (I recommend using sudo instead though).

Carrying this example further, we could add the -i option to display the network connection information as well (Listing 8). The TCP argument to the option tells lsof that we want to only look at TCP connections, excluding other connections like UDP. This is a good way to study the services that are currently being protected by the TCP Wrappers mechanism. Please note that this command may take some time to complete.

Listing 8

$ lsof -i TCP /lib/libwrap.so.0 | head -10
COMMAND    PID    USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
pulseaudi 1675 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6
gconf-hel 1678 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6
gnome-set 1690 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6
metacity  1719 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6
gnome-vol 1770 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6
firefox   1909 jwright mem    REG    8,1    30960  668 /lib/libwrap.so.0.7.6
chrome    1992 jwright  59u  IPv4 101427      0t0  TCP topbuntu.local:42427->iy-in-f83.1e100.net:https (ESTABLISHED)
chrome    1992 jwright  61u  IPv4 124360      0t0  TCP topbuntu.local:40761->208.69.36.231:https (CLOSE_WAIT)
chrome    1992 jwright  68u  IPv4  12636      0t0  TCP topbuntu.local:35689->iy-in-f18.1e100.net:https (ESTABLISHED)

By using the -t option, you receive output from lsof that can then be passed to another command like kill. Listing 9 shows that I have opened a file with two instances of tail -f so that tail will keep the file open and update me on any data that is appended to it. Listing 10 shows a quick way to terminate both of the tail processes in one shot using the -t option and back-ticks.

Listing 9

$ lsof /tmp/testfile.txt
COMMAND   PID    USER  FD   TYPE DEVICE SIZE/OFF  NODE NAME
tail    10784 jwright   3r   REG    8,1        0 16282 /tmp/testfile.txt
tail    10792 jwright   3r   REG    8,1        0 16282 /tmp/testfile.txt

Listing 10

$ kill `lsof -t /tmp/testfile.txt`

If you haven’t seen back-ticks (`) used before in the shell, it probably looks a little strange to you. The back-ticks in this instance tell the shell to execute the command between them, and then replace the back-ticked command with the output. So, for Listing 10 the section of the line within the back-ticks would be replaced by the list of PIDs that are accessing /tmp/testfile.txt. These PIDs are passed to the kill command which sends SIGTERM to each instance of tail, causing them to exit.
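If you prefer, the newer $( ) form of command substitution does exactly the same thing and is a little easier to read, especially when substitutions end up nested:

    $ kill $(lsof -t /tmp/testfile.txt)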

An alternative to killing the processes through lsof is shown in Listing 11, where the -k and -i options of the fuser command are used to interactively kill both instances of tail.

Listing 11

$ fuser -ki /tmp/testfile.txt
/tmp/testfile.txt:   11106 11107
Kill process 11106 ? (y/N) y
Kill process 11107 ? (y/N) y

Tips and Tricks

  • Don’t use the -k option with fuser without checking to see which processes it will kill first. The easiest way to do this is by using the -ki option combination so that fuser will prompt you before killing the processes (see Listing 11). You can specify a signal other than SIGKILL to send to a process with the -SIGNAL argument to the -k option.
  • As mentioned above, the -r option of lsof causes it to repeat its scan every so many seconds, or indefinitely. This can be very useful when you are writing a script that may need to call lsof repeatedly because it avoids the wasted overhead of starting the command from scratch each time.
  • lsof functionality is supposed to be fairly standard across the Linux and Unix landscape, so using lsof in your scripts can be an advantage when you’re shooting for portability.
  • When you are using fuser to check who or what is using a mounted file system, add the -m option to your command line. By doing this, you tell fuser to list the users/processes that have files open in the entire file system, not just the directory you specify. This will prevent you from being confused when fuser doesn’t give you any information even though you know the mounted file system is in use. So, you would issue a command that’s something like

    sudo fuser -mu /media/cdrom

    to save you that trouble. You still don’t know which subdirectory or file is being held open, but this is easily solved by using the +D option with lsof to search the mounted file system recursively.

    sudo lsof +D /media/cdrom/

Scripting

These scripts are somewhat simplified and in most cases could be done other ways too, but they will work to illustrate the concepts. If you use these scripts, make sure you adapt them to your situation. Never run a script or command without understanding what it will do to your system.

For the first scripting example, let’s say that it’s 5:00 and you need to leave for the day, but you also have to delete a shared configuration file that’s still being used by several people. Presumably the configuration file will be automatically recreated when someone needs it next. The script in Listing 12 shows one way of taking care of the file deletion while still leaving on time, and it uses lsof. This assumes for the sake of the example that every system that has access to the shared configuration file releases it when users are done and log out for the night. Make sure to run this script with root privileges or it might not see everyone that’s using the file before deleting it, causing a mess.

Listing 12

#!/bin/bash -

# Check every 30 seconds to see if everyone is done with the file
lsof +r 30 /tmp/testfile.txt > /dev/null 2>&1

# We've made it past the lsof line, so we must be ok to delete the file
rm /tmp/testfile.txt

You end up with a very quick and simple script that doesn’t require a continuous while loop, or a cron job to finish its task.

Another example would be using fuser to make a decision in a script. The script could check to see if a preferred resource is in use and move on to the next one if it is. Listing 13 shows an example of such a script.

Listing 13

#!/bin/bash -

# Make sure to run this script with root privileges or it
# may not work.

# Set up a counter to track which console we are checking
COUNTER=0

# Loop until we find an unused virtual console or run out of consoles
while true
do
    # Check to see if any user/process is using the virtual console
    fuser -s /dev/tty$COUNTER

    # Check to see if we've found an unused virtual console
    if [ $? -ne 0 ]
    then
        echo "The first unused virtual console is" /dev/tty$COUNTER
        break
    fi

    # Get ready to check the next virtual console
    COUNTER=$((COUNTER+1))

    # Try to get a listing of the virtual console we are checking
    ls /dev/tty$COUNTER > /dev/null 2>&1

    # Check to see if we've run out of virtual consoles to check.
    # The ls command won't return anything if the file doesn't exist.
    if [ $? -ne 0 ]
    then
        echo "No unused virtual console was found."
        break
    fi
done

This script loops through all of the virtual console device files (/dev/tty*) and looks for one that fuser says is unused. Notice that I’m checking the exit code of both fuser and ls via the built-in variable $?, which holds the exit status of the last command that was run.
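If you’d like to see that exit status behavior for yourself before relying on it in a script, you can check $? by hand. On my systems fuser returns a non-zero status when nothing is using the file, but it never hurts to verify it on yours:

    $ fuser -s /tmp/testfile.txt
    $ echo $?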

That’s just a small sampling of what you can do with lsof and fuser within scripts. There are any number of ways to improve and expand upon the scripts that I’ve given in Listing 12 and Listing 13. Having an in-depth knowledge of the commands will open up a lot of possibilities for your scripts and even for your general use of the shell.

Troubleshooting

Every time that I try to run the lsof command on my Ubuntu 9.10 machine with administrative privileges, I get the following warning:

lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/jwright/.gvfs

This warning occurs when lsof tries to access the Gnome Virtual File System (gvfs), which is (among other things) a foundational part of Gnome’s Nautilus file manager. lsof is warning you that it doesn’t have the ability to look inside of the virtual file system, and so its output may not contain every relevant file. This warning should be harmless, and can be suppressed with the -w option.

Listing 14

$ sudo lsof | head -3
lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/jwright/.gvfs
      Output information may be incomplete.
COMMAND PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
init      1 root cwd   DIR    8,1     4096    2 /
init      1 root rtd   DIR    8,1     4096    2 /

becomes something like this…

Listing 15

$ sudo lsof -w | head -3
COMMAND PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
init      1 root cwd   DIR    8,1     4096    2 /
init      1 root rtd   DIR    8,1     4096    2 /

If lsof seems to hang for a long time, you might need to use some of the “Precautionary Options” listed in the Apple Quickstart Guide in the Resources section. The lsof man page also has a group of sections, starting at BLOCKS AND TIMEOUTS, that may help you.

Conclusion

There’s a whole host of possibilities for the lsof and fuser commands beyond what I’ve mentioned here, but hopefully I’ve given you a good start. As with so many other things, the time you put into mastering your Linux system will pay you back again and again. If you have any information to add to what I’ve said here, feel free to drop a line in the comments section or send us an email.

Resources

  1. Apple’s Quickstart Guide For lsof
  2. A Good Practical lsof Reference By Philippe Hanrigou
  3. Undeleting Files With lsof and cp
  4. Using fuser To Deal With Device Busy Errors
  5. A Good Reference On fuser (Geared Toward Solaris) By Sandra Henry-Stocker
  6. What To Do With An lsof Gnome Virtual File System (gvfs) Error
  7. LPIC-1 : Linux Professional Institute Certification Study Guide By Roderick W. Smith