ELF Internals and Binutils: Seeing What the Compiler Produced
1. TL;DR
- You’ll learn how an ELF (Executable and Linkable Format) file is structured and how that structure maps to runtime memory.
- You’ll practice using readelf, objdump, nm, objcopy, xxd, and hexdump to answer practical questions:
- “What is the entry point?”
- “Where is this function?”
- “Which bytes correspond to that instruction?”
- “Why does this address exist in disassembly but not in the file?”
- You’ll build a mental model of sections vs segments, symbols, and relocations.
If you can read ELF structure confidently, reverse engineering and debugging become dramatically easier. You stop guessing!
2. Prerequisites
- riscv64-unknown-elf-gcc
- readelf
- objdump
- nm
- objcopy
- xxd
- hexdump
3. ELF in one diagram
Think of an ELF as having two different “views” of the same data:
- Sections: developer / linker view (good for symbols, disassembly, and static analysis)
- Segments: loader view (what gets mapped into memory when the program runs)
| |
4. The “core” tools and what each is for
4.1. readelf (structure)
- Read headers, sections, segments, symbols, relocations.
- Best tool to answer “what is in this ELF?”
4.2. objdump (content)
- Disassemble code (-d)
- Dump section bytes (-s)
- Show symbol table (-t)
4.3. nm (symbols)
- Quick “list symbols and addresses”
4.4. objcopy (transform)
- Convert ELF → raw binary (-O binary)
- Extract a section
4.5. xxd / hexdump (raw bytes)
- Verify byte-level hypotheses (endianness, offsets)
5. Hands-on: inspect a bare-metal sample ELF
5.1. Build a sample ELF
We will inspect a small bare-metal program that writes over UART:
| |
5.2. Show the ELF header
riscv64-unknown-elf-readelf -h build/lab_rv32.elf
Look for:
- Class: ELF32
- Machine: RISC-V
- Entry point address: the first instruction the loader jumps to
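The three fields above sit at fixed positions in the ELF header, so they can be read without any toolchain. The following is a minimal sketch, assuming a little-endian ELF32 file; the header bytes are synthetic (built in the script, not taken from build/lab_rv32.elf), and the entry value 0x80000000 is a hypothetical stand-in.

```python
import struct

# ELF32 header layout (little-endian): 16-byte e_ident, then the fixed fields
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")
EM_RISCV = 243  # e_machine value assigned to RISC-V


def parse_elf32_header(data: bytes) -> dict:
    """Return the fields that readelf -h reports as Class, Machine, and Entry."""
    fields = ELF32_HEADER.unpack_from(data)
    ident, machine, entry = fields[0], fields[2], fields[4]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": {1: "ELF32", 2: "ELF64"}.get(ident[4], "unknown"),
        "machine": "RISC-V" if machine == EM_RISCV else hex(machine),
        "entry": entry,
    }


# Synthetic header for illustration only; the entry address is hypothetical.
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)  # ELF32, little-endian, version 1
sample = ELF32_HEADER.pack(ident, 2, EM_RISCV, 1, 0x80000000,
                           52, 0, 0, 52, 32, 1, 40, 0, 0)
info = parse_elf32_header(sample)
print(info["class"], info["machine"], hex(info["entry"]))
```

To try it on a real file, pass the first 52 bytes of the ELF instead of the synthetic sample. An ELF64 file uses wider fields and needs a different layout string.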
5.3. Show sections
riscv64-unknown-elf-readelf -S build/lab_rv32.elf
Key fields to understand:
- Name: e.g., .text, .rodata, .data, .bss
- Addr: virtual memory address (VMA) of the section when loaded
- Off: file offset (where the bytes live in the file)
- Size: section size
- Flags: AX (alloc + execute), WA (write + alloc)
When you need to map “this runtime address” → “which bytes in the file”, you use:
file_offset = section_off + (address - section_addr)
Example with build/lab_rv32.elf (from readelf -S):
- .text has Addr=0x80000000 and Off=0x001000
- If you want runtime address 0x80000124 (the start of uart_puthex32):
file_offset = 0x001000 + (0x80000124 - 0x80000000) = 0x001124
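The same formula as a small helper, using the .text values quoted above. This is a sketch; the function name is made up for illustration.

```python
def vma_to_file_offset(address: int, section_addr: int, section_off: int) -> int:
    """file_offset = section_off + (address - section_addr)"""
    if address < section_addr:
        raise ValueError("address lies before the start of the section")
    return section_off + (address - section_addr)


# .text: Addr=0x80000000, Off=0x001000; target: uart_puthex32 at 0x80000124
offset = vma_to_file_offset(0x80000124, 0x80000000, 0x1000)
print(hex(offset))  # 0x1124
```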
How do I prove it?
- Dump file bytes at that offset:
xxd -s 0x1124 -g 1 -l 16 build/lab_rv32.elf
- Compare with the disassembly at that runtime address:
riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf | rg -n '80000124'
Command syntax:
xxd -s 0x1124 -g 1 -l 16 build/lab_rv32.elf
- -s 0x1124: seek to byte offset 0x1124 from the start of the file
- -g 1: group bytes in 1-byte units
- -l 16: show 16 bytes
- -e (not used above): switch to little-endian mode, which displays each multi-byte group as a little-endian word
riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf
- -d: disassemble all executable sections
- -M numeric,no-aliases: show numeric registers and avoid pseudo-instruction aliases
rg -n '80000124' (rg is ripgrep; grep -n works the same way here)
- -n: include line numbers in the match output
5.4. Show segments (program headers)
riscv64-unknown-elf-readelf -l build/lab_rv32.elf
In the segment list, focus on:
- LOAD segments: these are mapped into memory
- VirtAddr / PhysAddr: where they appear at runtime
- FileSiz / MemSiz: file bytes vs in-memory size
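The FileSiz / MemSiz distinction can be sketched as a toy loader: it copies FileSiz bytes out of the file and zero-fills the rest up to MemSiz. This is an illustration with made-up bytes and sizes, not how any particular loader or bootloader is implemented.

```python
def load_segment(file_bytes: bytes, offset: int, filesz: int, memsz: int) -> bytes:
    """Build the in-memory image of one LOAD segment."""
    if memsz < filesz:
        raise ValueError("MemSiz must be at least FileSiz")
    image = bytearray(memsz)  # starts out all zeros
    image[:filesz] = file_bytes[offset:offset + filesz]
    return bytes(image)


# Hypothetical file: 4 header bytes, then 4 bytes of initialized data.
fake_file = b"HDR!" + bytes([0x11, 0x22, 0x33, 0x44])
image = load_segment(fake_file, offset=4, filesz=4, memsz=8)
print(image.hex(" "))  # 11 22 33 44 00 00 00 00
```

The trailing zero bytes exist only in memory; nothing in the file backs them.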
.bss usually has no bytes in the file (it’s “zero-initialized” in memory). That’s why MemSiz can be larger than FileSiz.
6. Where is main? (symbols)
6.1. Fast: nm
riscv64-unknown-elf-nm -n build/lab_rv32.elf
- -n sorts by address
- Symbol type letters matter (uppercase means global, lowercase means local):
  - T/t: text (code)
  - D/d: initialized data
  - B/b: BSS
6.2. Richer: readelf -s
riscv64-unknown-elf-readelf -s build/lab_rv32.elf
You’ll see:
- symbol value (address)
- size
- binding (local/global)
- section index
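Those four fields come straight from a 16-byte symbol table entry. Here is a minimal sketch that decodes one ELF32 entry, assuming little-endian layout. The address of main (0x80000280) is the one used elsewhere in this chapter; the size, binding, and section index in the sample are hypothetical.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx
ELF32_SYM = struct.Struct("<IIIBBH")
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def decode_symbol(entry: bytes) -> dict:
    """Split one symbol table entry into the fields readelf -s prints."""
    _name, value, size, info, _other, shndx = ELF32_SYM.unpack(entry)
    return {
        "value": value,
        "size": size,
        "binding": BINDINGS.get(info >> 4, "OTHER"),  # high 4 bits of st_info
        "type": TYPES.get(info & 0xF, "OTHER"),       # low 4 bits of st_info
        "shndx": shndx,
    }


# Synthetic entry: a global function at 0x80000280 (size and shndx are made up).
sample = ELF32_SYM.pack(0, 0x80000280, 0x40, (1 << 4) | 2, 0, 1)
sym = decode_symbol(sample)
print(hex(sym["value"]), sym["size"], sym["binding"], sym["type"], sym["shndx"])
```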
7. Disassembly you can trust
7.1. Basic disassembly
riscv64-unknown-elf-objdump -d build/lab_rv32.elf
7.2. Prefer: numeric registers + no pseudo-instruction aliases
Pseudo-instructions can hide what the CPU actually executes.
riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf
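To see why the unaliased form is the one the CPU actually decodes, the sketch below encodes jalr x0, x1, 0 by hand using the RV32I I-type field layout. The helper name is made up for illustration. The resulting word is the same one an assembler emits for the pseudo-instruction ret.

```python
import struct


def encode_i_type(opcode: int, rd: int, funct3: int, rs1: int, imm: int) -> int:
    """Pack the I-type fields: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


JALR_OPCODE = 0b1100111  # 0x67

# ret == jalr x0, x1, 0: jump to the address in x1 (ra), discard the link.
word = encode_i_type(JALR_OPCODE, rd=0, funct3=0, rs1=1, imm=0)
print(f"{word:08x}")                      # 00008067
print(struct.pack("<I", word).hex(" "))   # 67 80 00 00 (byte order in the file)
```

The second line shows the little-endian byte order, which is what a raw hex dump of the file displays.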
For example, ret is really jalr x0, x1, 0. Seeing the “real” form helps debugging.
7.3. Find a function in disassembly
| |
8. Match instructions to bytes (hexdump workflow)
This is a practical reverse engineering skill:
- Identify an instruction address in objdump (e.g. main).
- Convert that address → file offset using section info.
- Inspect raw bytes at that offset.
8.1. Step A: find .text mapping
You want .text Addr and Off.
| |
Output:
| |
8.2. Step B: compute a file offset
First, get the address of main:
| |
Output:
| |
On this ELF, main is at 0x80000280.
From the section headers, .text has:
- Addr = 0x80000000
- Off = 0x001000
Then:
| |
Or let bc do the math:
bc is the standard Unix command-line calculator (it supports arbitrary precision and different number bases).
echo 'obase=16; ibase=16; 1000 + (80000280 - 80000000)' | bc
Notes on bc:
- ibase=16 makes bc parse the input numbers as hex (hex digits A-F must be uppercase).
- obase=16 prints the result in hex.
- Set obase before ibase so 16 itself isn’t interpreted as hex (0x16, which is decimal 22).
- Result is 1280 in hex.
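The same arithmetic can be generalized so the containing section is looked up rather than assumed. This is a sketch: the .text Addr and Off are the values quoted above, while every size and the whole .bss entry are hypothetical and only there to show an address that has no bytes in the file.

```python
# .text Addr/Off come from the section headers quoted in the text;
# all sizes and the .bss entry are hypothetical, for illustration only.
SECTIONS = [
    {"name": ".text", "addr": 0x80000000, "off": 0x1000, "size": 0x400, "nobits": False},
    {"name": ".bss", "addr": 0x80002000, "off": 0x2000, "size": 0x100, "nobits": True},
]


def find_file_offset(sections, address):
    """Return (section name, file offset); offset is None if no file bytes exist."""
    for sec in sections:
        if sec["addr"] <= address < sec["addr"] + sec["size"]:
            if sec["nobits"]:
                return sec["name"], None  # occupies memory, not the file
            return sec["name"], sec["off"] + (address - sec["addr"])
    raise LookupError(f"no section contains {address:#x}")


name, offset = find_file_offset(SECTIONS, 0x80000280)  # address of main
print(name, format(offset, "X"))  # .text 1280
```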
8.3. Step C: view bytes
Verify that the objdump output is exactly what the CPU fetches.
- Dump .text bytes with objdump:
| |
Output:
| |
In this objdump -s output, the left‑hand address (for example 0x800000d0) is the runtime/VMA address of those bytes when .text is loaded into memory, not a file offset. It’s the section’s load address plus the offset within the section.
- To inspect the bytes that correspond specifically to main (at 0x80000280 → file offset 0x1280):
| |
Output:
| |
- What is the equivalent view in xxd/hexdump?
objdump -j .text finds the .text section by name in the ELF section table and dumps the bytes that belong to it. On the other hand, xxd and hexdump are section‑agnostic; they just dump raw bytes starting at a file offset. In this ELF, .text begins at file offset 0x1000 (as seen in the section headers), so these commands are equivalent views of the same bytes:
- View using xxd:
| |
- View using hexdump:
| |
- To verify the offset for your file, check the .text entry in the section headers:
| |
If the .text offset is different, replace 0x1000 with that value.
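What “section-agnostic” means can be sketched in a few lines: seek to a file offset, read some bytes, and print them one byte per group. The file here is a synthetic temporary file (0x1000 zero bytes followed by the encoding of jalr x0, x1, 0), not build/lab_rv32.elf.

```python
import os
import tempfile


def dump_at(path: str, offset: int, length: int = 16) -> str:
    """Roughly what a byte-grouped hex dump does: seek, read, format as hex."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read(length)
    return f"{offset:08x}: " + " ".join(f"{b:02x}" for b in chunk)


# Synthetic stand-in for an ELF whose .text starts at file offset 0x1000.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(0x1000) + bytes.fromhex("67800000"))
    path = tmp.name

line = dump_at(path, 0x1000, 4)
os.remove(path)
print(line)  # 00001000: 67 80 00 00
```

The function never consults section headers, which is why the offset has to be supplied by the caller.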
9. Relocations: “addresses not final yet”
A relocation is a note from the assembler to the linker that says: “I had to put something in this instruction or data slot, but I don’t know the final address yet. Please fix it later.”
This happens because .o files are built before the linker decides where everything lives in memory.
9.1. The basic idea (with a mental model)
When you write:
- a call to a function (call foo)
- a reference to a global (la t0, global_var)
The assembler can’t know the final address of foo or global_var.
So it:
- Emits a placeholder in the instruction/data,
- Adds a relocation entry that describes how to patch it later.
At link time, the linker reads those entries, computes the real addresses, and rewrites the bytes.
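The placeholder-then-patch idea can be sketched with a deliberately simplified model: a plain 32-bit absolute data word (comparable in spirit to R_RISCV_32). Real call and la sequences on RISC-V use PC-relative hi/lo pairs that split the value across instruction fields, which this sketch does not model. The symbol address is hypothetical.

```python
import struct


def apply_abs32(section: bytes, relocations: list, symbols: dict) -> bytes:
    """Patch each 32-bit little-endian slot with symbol address + addend."""
    patched = bytearray(section)
    for rel in relocations:
        value = (symbols[rel["symbol"]] + rel["addend"]) & 0xFFFFFFFF
        struct.pack_into("<I", patched, rel["offset"], value)
    return bytes(patched)


section = bytes(8)  # two zeroed 32-bit slots; the second is the placeholder
relocations = [{"offset": 4, "symbol": "global_var", "addend": 0}]
symbols = {"global_var": 0x80001000}  # hypothetical final address

patched = apply_abs32(section, relocations, symbols)
print(patched.hex(" "))  # 00 00 00 00 00 10 00 80
```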
9.2. Anatomy of a relocation entry
A relocation usually includes:
- offset: where in the section to patch
- type: how to patch it (absolute, PC‑relative, hi/lo pair, etc.)
- symbol: what the patch should point to
- addend: extra constant to add (RISC‑V typically uses RELA, which stores the addend explicitly)
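For ELF32, those four pieces are packed into a 12-byte RELA entry, with the symbol index and type sharing one 32-bit info word. The sketch below decodes one entry, assuming little-endian layout. The sample values (offset 0x10, symbol index 5, type 1, addend 4) are made up for illustration.

```python
import struct

# Elf32_Rela: r_offset (u32), r_info (u32), r_addend (signed 32-bit)
ELF32_RELA = struct.Struct("<IIi")


def decode_rela(entry: bytes) -> dict:
    """Split one RELA entry into offset, symbol index, type, and addend."""
    r_offset, r_info, r_addend = ELF32_RELA.unpack(entry)
    return {
        "offset": r_offset,
        "symbol_index": r_info >> 8,  # index into the symbol table
        "type": r_info & 0xFF,        # architecture-specific relocation kind
        "addend": r_addend,
    }


# Synthetic entry; type 1 is R_RISCV_32 in the RISC-V ELF psABI.
sample = ELF32_RELA.pack(0x10, (5 << 8) | 1, 4)
rela = decode_rela(sample)
print(rela)
```

ELF64 uses a different split (symbol index in the upper 32 bits of a 64-bit info word), so this layout applies only to 32-bit objects.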
9.3. See relocations in a real .o
Build an object file:
| |
Inspect relocation entries:
| |
Helpful companion views:
- Symbol table (names + addresses in .o)
| |
- Disassembly + relocations inline
| |
What to look for in readelf -r:
- Offset: the exact byte position to patch
- Info/Type: the relocation kind (architecture‑specific)
- Symbol: what it targets (foo, global_var, etc.)
- Addend: constant adjustment (if present)
9.4. What happens after linking?
- In a final, fully linked bare‑metal ELF, most relocations are resolved (the bytes are already patched).
- In a shared or dynamically linked ELF, some relocations remain for the loader to fix at runtime.
10. ELF → raw binary (and why addresses disappear)
Convert to a flat binary:
| |
Now check:
| |
Why .bin is smaller:
- It’s only loadable bytes; no symbol tables, no section headers.
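A rough model of what flattening does, as a sketch under simplifying assumptions: take the loadable chunks with their addresses, lay them out relative to the lowest address, and zero-fill any gap. The real objcopy works from section load addresses and has more options; the chunk contents and addresses here are hypothetical.

```python
def flatten(chunks):
    """Return (base address, flat image). The image itself carries no addresses."""
    base = min(addr for addr, _ in chunks)
    end = max(addr + len(data) for addr, data in chunks)
    image = bytearray(end - base)  # gaps stay zero-filled
    for addr, data in chunks:
        start = addr - base
        image[start:start + len(data)] = data
    return base, bytes(image)


chunks = [
    (0x80000000, bytes.fromhex("67800000")),  # some code
    (0x80000008, b"hi"),                      # some data, after a 4-byte gap
]
base, image = flatten(chunks)
print(hex(base), len(image), image.hex(" "))
```

The base address is returned separately because nothing inside the flat image records it, which is exactly the information you have to bring from elsewhere when loading a .bin.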
.bin has no inherent addresses. You must know (or guess) its load address from a bootloader, memory map, or surrounding firmware.
11. Exercises
- Use readelf -h to find the entry point of build/lab_rv32.elf.
- Use nm -n to find the address of add_u32 and locate it in objdump.
- Pick one instruction inside add_u32 and find the exact bytes in the ELF using the section offset method.
- Build lab.o and list relocations; explain in one sentence what each relocation is trying to fix.
11.1. How to test your answers
- Can you point to a specific file offset that contains the bytes for an instruction at a specific virtual address?
- Can you explain why .bss has size in memory but not in the file?
12. Summary
You learned to navigate ELF structure and use binutils to connect:
symbols → disassembly → raw bytes → runtime addresses
12.1. Read Next
In the readelf -l output, you might have noticed Align 0x1000. Why does the hardware care about this number? And does your code really live at 0x80000000?
Check out Chapter 7: Memory, Paging, and The Hardware Illusion to uncover the secrets of Virtual Memory.
Next: RV32 ABI + C types, where we’ll connect C-level data layouts (sizes, alignment, structs) to the exact loads/stores you see in assembly.