This document describes the binary utilities (binutils), the set of tools used for compiling, inspecting, and transforming binaries (machine code files). Some of them you’ve probably heard of or used before. Other, more specialized ones, might be new to you. Learning these tools and what they can do will greatly improve your productivity as a programmer. They’ll let you quickly and easily figure out what is wrong and why with a large number of otherwise very difficult problems.
This guide refers to each tool with its short name (e.g., ld, as). Typically, on a UNIX/Linux/Darwin machine, these names refer to the native tools, tools for that computer. For example, gcc on a myth machine is a verion of the Gnu C Compiler for Linux on an x86 processor. Since we’re using a laptop to cross-compile for the Raspberry Pi, you want to use the binary utilities that are designed to run on your machine but compile for the Raspberry Pi. These tools are prefixed with
arm-none-eabi. So you want to use
This section discusses the core utilities that you use very often:
This is the assembler. It takes assembly code (human readable text of machine code) and turns it into a binary that a processor can execute. When you invoke
gcc and tell it to generate a binary (e.g., an object file), gcc first compiles the C code to assembly then invokes the assembler to generate machine code. The output of
as is a file format called ELF, or Executable and Linkable Format. Almost all executables you run in Linux, for example, are ELF. ELF includes not only machine code (the E part) but also symbols, names that provide information on where different variables and functions reside (see
objdump below). When you compile a library, for example, the resulting ELF file has not only the library’s code, but also a set of symbols that say where the code for each library function starts. That way, when you compile a program that invokes the library function, binutils can find where it is and link the two files together correctly (see
Just as the input to
as might not be a executable program, the output of
as might not be executable. For example, a library is a set of functions that other programs can call, but does not have any entry point (
main() function). It can also be that the compiled assembly code references functions that are in another binary file. For example, your program that calls a library, when compiled into machine code, does not have the actual instructions of those library calls. The tool that takes multiple binary files and links them together is
ld, described below.
The most common use of
as is as follows:
% as code.s -o code.o
This tells the assembler that
code.s is its input of assembly code, and to output object code to
code.o. If you do not include
-o code.o then it will default to outputting to
a.out. If you do not include
code.s it will default to taking its input from standard input. These two invocations are therefore equivalent:
% cat code.s | as % as code.s -o a.out
ld takes object files, libraries, and other binary files as input and links them together into another binary file. For example, when you have a library written in multiple source files, you typically compile each source file independently into an ELF object file, then link all of those files together into a larger ELF object file for the library. When you compile a program that uses the library, you compile your program code into one or more object files, then link those against the library, producing a final executable.
The linker operates by resolving symbols. Whenever your code creates or uses a name that is not on the stack (e.g., defines a function, defines a variable, declares a function and calls it), this creates a symbol. Some symbols, such as those your program defines, are resolved and exist. For example, when you define a variable
int a; in global scope, this creates a symbol for
a, which says at what memory address
a resides. When you call a function
f() that’s defined in a library, this creates an unresolved symbol for
f. When the linker links your program to the library, it sees that
f is an unresolved symbol, sees that
f is defined in the library, then fixes the binary code in your program so that when it calls
f it jumps to where
The basic use for
ld is as follows:
% ld code1.o code1.o -o code.bin
This takes two object files, links them together, and outputs the result as
When you invoke GCC without the
-c option, it typically runs
ld as its last step. For example,
% gcc -o test test.c
is mostly equivalent (I’m leaving out a lot of options that are added by default) to:
% gcc -S test.S test.c # Generate assembly % as -o test.o test.S # Generate object file % ld -o test test.o
The most commonly used options for
-l option tells
ld to link a library, which is basically just an object file. But libraries are typically stable code that you don’t update often and are used by many programs, so they live in separate directories for that purpose (e.g,
-L option tells
ld what directories to look for libraries in. So, for example,
% ld -o test test.o -lgcc -L/usr/lib
ld to link a file name
libgcc.so, but we will not be using shared objects in this class, so don’t worry about them) which it should search for not only in the local directory but also
/usr/lib. You can pass multiple
-L options. For example,
gcc will typically automatically pass
-L/usr/lib options when it invokes
ld, which are in addition to any other library loads or library search path entries you add.
Displays the symbol table of a binary file. It takes one or more file names as parameters and outputs their symbol tables. The output looks like this:
00010074 B __bss_end__ 00010070 B __bss_start 00010070 B __bss_start__ 00010070 T __data_start 00010074 B __end__ 00010074 B _bss_end__ 00010070 T _edata 00010074 B _end 00080000 N _stack U _start 00010070 B a 00008040 T f 00008000 T main
The first column states the address of the symbol. For a function, this is the first instruction to jump to when you call the function. For a variable, this is its location in memory. The second column states what kind of symbol it is. The types are:
T: Text. This is a symbol for executable code (e.g., a function).
D: Data. This is a symbol for a variable that has an initializer (e.g.,
int a = 5;).
B: Data. This is a symbol for a variable that doesn’t have an initializer (e.g.,
U: Undefined. This a symbol the objects needs to be linked. It’s defined elsewhere, and hopefully the linker will resolve it later.
N: Symbol used for debugging.
A: Absolute. Later linking will not change this symbol.
So in the above example, the object file defines a function
f, a function
main, and a variable
a. The other symbols (generally, symbols that start with
_) are generated by the compiler for bookeeping and linking.
Lists the size of sections (and total size) of object files. Can be invoked on multiple files at once simply by listing the desired files,
% size test.o test2.o
which produces output like this:
text data bss dec hex filename 80 24 32 136 88 test.o 72 0 0 72 48 test2.o
text you see the size of the actual machine code that makes up your program. Similarly to the symbol types listed under
nm above, the
data section is the size of storing your initialized global variables, and
bss is the size of storing your uninitialized global variables. The
hex numbers indicate the total size (sum of text, data, & bss) in decimal and hexidecimal, respectively.
The options for the size command are mostly to change the format of the output. For example, to see the section sizes in hexidecimal, use the
-x option (
size -x test.o). You can also specify the file type if it is not automatically recognized. For example if you want to know the size of a raw binary file, you could use
% size --target=binary test.bin
Prints text strings embedded in the input file.
strings is useful for searching binary files, which are not readable using a text editor. For example, if you wanted to search an object file for a particular string, you could call strings and pipe the result to grep, like this:
% strings test.o | grep "my string"
strings looks for strings of at least 4 printable characters (followed by a NUL character indicating the end of a string). To set a minimum string length other than 4, use the
-n option. For example,
% strings -n 6 test.o
looks for strings of at least 6 characters.
Removes symbols from object files. Symbols indicate where in memory variables and functions reside, see
ld above. By stripping out symbol tables and debug information,
strip decreases the size of object files.
strip modifies the input file rather than creating a new, stripped output file. Compare the file before and after you strip it using
nm (above). You will find that the symbols have been removed.
Transforms binary objects between different formats. For example, you can use objcopy to transform an ELF executable (which has symbols and all kinds of other information) into a simple binary. The principal options to
-I, which specify the output and input formats. Example formats are elf32-bigarm, ihex, and binary. Sometimes
objcopy can tell what the format is and so doesn’t need to be told explicitly (e.g., ELF). In its most basic use,
objcopy just makes a copy of the file. For example,
% objcopy main main2
creates a simple of main in main2. In contrast,
% objcopy main -O binary main.bin
takes main as input (an ELF file), transforms it into a raw binary file and outputs that raw binary as main.bin.
Displays information about object files. To use
objdump, you must specify at least one of the many options, which indicate what type of information you would like to view. There are many different options, check out
man objdump to see what it can show.
A very useful option is the
-d option, which allows you to view the assembly instructions associated with the executable part of the binary file (
-d is for dissassemble):
% objdump -d test.o
Allows you to create, modify, and extract archives. Archives are single files holding collections of other (usually binary) files, similar to a zip or tar file. Archives have a .a extension and are usually used to hold libraries. The linker (see
ld above) is often used to link to functions in these archive library files.
ar has options for you to create an archive, add or remove files from an existing archive, and extract files from an archive. To create an archive from object files, use the
crs options, followed by your object files. The
c option means it will not warn you that it needs to create the library (since that is what you are trying to do). The
r option says to insert the new files (or replace existing ones). If you specify the
s option, the archive maintains an index to all symbols defined in files in the archive to allow for quicker linking to the library functions.
% ar cr libtest.a test.o test2.o
makes the library archive file libtest.a containing test.o and test2.o. Then you can link to this library by specifying
-ltest in the
ld command (see
For more options on how to modify archives, see
man ar or