ELF Internals and Binutils: Seeing What the Compiler Produced

Author: Marcos Azevedo

Date: 2026-01-20

Last Modified: 2026-01-27

Reading Time: 10 mins

Section: Series

1. TL;DR

You’ll learn how an ELF (Executable and Linkable Format) file is structured and how that structure maps to runtime memory.
You’ll practice using readelf, objdump, nm, objcopy, xxd, and hexdump to answer practical questions:
- “What is the entry point?”
- “Where is this function?”
- “Which bytes correspond to that instruction?”
- “Why does this address exist in disassembly but not in the file?”
You’ll build a mental model of sections vs segments, symbols, and relocations.

If you can read ELF structure confidently, reverse engineering and debugging become dramatically easier. You stop guessing!

2. Prerequisites

riscv64-unknown-elf-gcc
readelf
objdump
nm
objcopy
xxd
hexdump
rg (ripgrep; plain grep works for the searches used here)
bc

3. ELF in one diagram

Think of an ELF as having two different “views” of the same data:

Sections: developer / linker view (good for symbols, disassembly, and static analysis)
Segments: loader view (what gets mapped into memory when the program runs)

ELF file
 ├─ ELF header
 ├─ Program header table  (segments: loader view)
 │    ├─ PT_LOAD (text)
 │    ├─ PT_LOAD (data)
 │    └─ ...
 ├─ Section header table  (sections: linker view)
 │    ├─ .text
 │    ├─ .rodata
 │    ├─ .data
 │    ├─ .bss
 │    ├─ .symtab / .strtab
 │    └─ ...
 └─ Raw section contents

Note

Not all ELFs include a section header table (e.g., some stripped or embedded images). Program headers are what loaders actually need.

4. The “core” tools and what each is for

4.1. readelf (structure)

Read headers, sections, segments, symbols, relocations.
Best tool to answer “what is in this ELF?”

4.2. objdump (content)

Disassemble code (-d)
Dump section bytes (-s)
Show symbol table (-t)

4.3. nm (symbols)

Quick “list symbols and addresses”

4.4. objcopy (transform)

Convert ELF → raw binary (-O binary)
Extract a section

4.5. xxd / hexdump (raw bytes)

Verify byte-level hypotheses (endianness, offsets)

5. Hands-on: inspect a bare-metal sample ELF

5.1. Build a sample ELF

We will inspect a small bare-metal program that writes over UART:

riscv64-unknown-elf-gcc -O0 -g -ffreestanding -nostdlib \
  -march=rv32im -mabi=ilp32 -T src/link.ld \
  src/start.s src/uart.c src/lab.c -o build/lab_rv32.elf

5.2. Show the ELF header

readelf -h build/lab_rv32.elf

Look for:

Class: ELF32
Machine: RISC-V
Entry point address: the first instruction the loader jumps to
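
As a cross-check on what readelf -h reports, the class and byte order can be read straight from the first bytes of the file: the ELF magic sits at offsets 0 to 3, EI_CLASS at offset 4, and EI_DATA at offset 5. The sketch below is only an illustration and not part of the lab sources; elf_ident is a hypothetical helper name, and the demo runs on a synthetic 7-byte header so it works without the toolchain.

```shell
# elf_ident FILE: hypothetical helper that decodes the ELF magic, EI_CLASS, and EI_DATA.
elf_ident() {
  # Bytes 0..3 must be 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not an ELF file"
    return 1
  fi
  # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
  class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \t\n')
  # Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian.
  data=$(od -An -tu1 -j5 -N1 "$1" | tr -d ' \t\n')
  case "$class" in 1) c=ELF32 ;; 2) c=ELF64 ;; *) c=unknown-class ;; esac
  case "$data" in 1) d=little-endian ;; 2) d=big-endian ;; *) d=unknown-order ;; esac
  echo "$c $d"
}

# Demo on a synthetic header (assumption: it stands in for the start of a real ELF32 file).
demo=$(mktemp)
printf '\177ELF\001\001\001' > "$demo"
elf_ident "$demo"
rm -f "$demo"
```

Running elf_ident build/lab_rv32.elf on the real file should agree with the Class and Data lines printed by readelf -h.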

5.3. Show sections

readelf -S build/lab_rv32.elf

Key fields to understand:

Name: e.g., .text, .rodata, .data, .bss
Addr: virtual memory address (VMA) of the section when loaded
Off: file offset (where bytes live in the file)
Size: section size
Flags: AX (alloc + execute), WA (write + alloc)
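
The flag letters are a rendering of bits in the section header's sh_flags field: SHF_WRITE is 0x1, SHF_ALLOC is 0x2, and SHF_EXECINSTR is 0x4. A minimal sketch of that mapping follows; shf_letters is a hypothetical helper name, and it covers only these three bits.

```shell
# shf_letters FLAGS: hypothetical helper mapping a numeric sh_flags value to readelf-style letters.
shf_letters() {
  flags=$(( $1 ))
  out=""
  if [ $(( flags & 0x1 )) -ne 0 ]; then out="${out}W"; fi   # SHF_WRITE
  if [ $(( flags & 0x2 )) -ne 0 ]; then out="${out}A"; fi   # SHF_ALLOC
  if [ $(( flags & 0x4 )) -ne 0 ]; then out="${out}X"; fi   # SHF_EXECINSTR
  echo "$out"
}

shf_letters 0x6   # AX, as shown for .text
shf_letters 0x3   # WA, as shown for writable data sections
```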

Tip

When you need to map “this runtime address” → “which bytes in the file”, you use:

file_offset = section_off + (address - section_addr)

Example with build/lab_rv32.elf (from readelf -S):

.text has Addr=0x80000000 and Off=0x001000
If you want runtime address 0x80000124 (the start of uart_puthex32):

file_offset = 0x001000 + (0x80000124 - 0x80000000) = 0x001124
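
The same formula can be wrapped in a small shell function so you don't redo the arithmetic by hand. This is a sketch using plain shell arithmetic; vma_to_off is a hypothetical name, and it assumes the address really lies inside the section whose Addr and Off you pass in.

```shell
# vma_to_off SECTION_OFF SECTION_ADDR ADDRESS
# Prints the file offset for a runtime address inside the given section.
vma_to_off() {
  printf '0x%x\n' $(( $1 + ($3 - $2) ))
}

vma_to_off 0x001000 0x80000000 0x80000124   # prints 0x1124
```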

How do I prove it?

Dump file bytes at that offset:

xxd -s 0x1124 -g 1 -l 16 build/lab_rv32.elf

Compare with the disassembly at that runtime address:

riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf | rg -n '80000124'

Command syntax:

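xxd -s 0x1124 -g 1 -l 16 build/lab_rv32.elf
- -s 0x1124: seek to byte offset 0x1124 from the start of the file
- -g 1: group bytes in 1-byte units
- -l 16: show 16 bytes
- -e (optional, not used above): little-endian word mode; with -g 4 it prints each 4-byte group as one 32-bit word, which matches the instruction encodings shown by objdump -d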
riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf
- -d: disassemble all executable sections
- -M numeric,no-aliases: show numeric registers and avoid pseudo-instruction aliases
rg -n '80000124'
- -n: include line numbers in the match output

5.4. Show segments (program headers)

readelf -l build/lab_rv32.elf

In the segment list, focus on:

LOAD segments: these are mapped into memory
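VirtAddr / PhysAddr: the address the segment runs at (VMA) and the address it is loaded at (LMA); on simple bare-metal layouts the two are often equal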
FileSiz / MemSiz: file bytes vs in-memory size

Important

.bss usually has no bytes in the file (it’s “zero-initialized” in memory). That’s why MemSiz can be larger than FileSiz.
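
The gap between the two fields is the number of bytes that must be zero-filled in memory (on bare metal, typically by the startup code). A tiny sketch of that subtraction follows; zero_fill is a hypothetical name, and the FileSiz and MemSiz values are illustrative, not taken from build/lab_rv32.elf.

```shell
# zero_fill FILESIZ MEMSIZ: bytes that exist in memory but not in the file.
zero_fill() {
  printf '%d\n' $(( $2 - $1 ))
}

# Illustrative values only: FileSiz=0x010, MemSiz=0x030.
zero_fill 0x010 0x030   # prints 32
```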

6. Where is main? (symbols)

6.1. Fast: nm

nm -n build/lab_rv32.elf | grep -E ' main$| add_u32$'

-n sorts by address
Symbol type letters matter:
- T/t: text (code)
- D/d: initialized data
- B/b: BSS
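
Because the type letter is the second column of nm output, it is easy to filter on. The sketch below keeps only code symbols; code_syms is a hypothetical helper, and in the sample input only the main line comes from this ELF (the other two addresses and the D type for mmio_fake are made up for illustration).

```shell
# code_syms: hypothetical filter that keeps only text (code) symbols from nm output on stdin.
code_syms() {
  awk '$2 == "T" || $2 == "t" { print $1, $3 }'
}

# Sample input: the main line is real, the other two lines are illustrative.
printf '%s\n' \
  '80000250 T add_u32' \
  '80000280 T main' \
  '80001000 D mmio_fake' | code_syms
```

On the real file the equivalent is nm -n build/lab_rv32.elf | code_syms.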

6.2. Richer: readelf -s

readelf -s build/lab_rv32.elf | grep -E ' main$| add_u32$| mmio_fake$'

You’ll see:

symbol value (address)
size
binding (local/global)
section index

7. Disassembly you can trust

7.1. Basic disassembly

riscv64-unknown-elf-objdump -d build/lab_rv32.elf | less

7.2. Prefer: numeric registers + no pseudo-instruction aliases

Pseudo-instructions can hide what the CPU actually executes.

riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf | less

Note

RISC-V assembly is often shown with pseudo-instructions like ret (which is really jalr x0, x1, 0). Seeing the “real” form helps debugging.
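
To see why ret and jalr x0, x1, 0 are the same thing, you can split the 32-bit encoding into its I-type fields by hand. The sketch below uses shell arithmetic; decode_itype is a hypothetical helper, and it handles only the I-type layout (opcode, rd, funct3, rs1, 12-bit immediate).

```shell
# decode_itype WORD: hypothetical helper that splits a 32-bit I-type instruction into fields.
decode_itype() {
  w=$(( $1 ))
  opcode=$(( w & 0x7f ))          # bits 6:0
  rd=$(( (w >> 7) & 0x1f ))       # bits 11:7
  funct3=$(( (w >> 12) & 0x7 ))   # bits 14:12
  rs1=$(( (w >> 15) & 0x1f ))     # bits 19:15
  imm=$(( (w >> 20) & 0xfff ))    # bits 31:20
  # Sign-extend the 12-bit immediate.
  if [ "$imm" -ge 2048 ]; then imm=$(( imm - 4096 )); fi
  printf 'opcode=0x%02x funct3=%d rd=x%d rs1=x%d imm=%d\n' \
    "$opcode" "$funct3" "$rd" "$rs1" "$imm"
}

decode_itype 0x00008067   # opcode=0x67 funct3=0 rd=x0 rs1=x1 imm=0
```

Opcode 0x67 is JALR, so the fields read as jalr x0, x1, 0: jump to the address in x1 (the return address register) and discard the link value.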

7.3. Find a function in disassembly

riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf | grep -n '<add_u32>'

8. Match instructions to bytes (hexdump workflow)

This is a practical reverse engineering skill:

Identify an instruction address in objdump (e.g. main).
Convert that address → file offset using section info.
Inspect raw bytes at that offset.

8.1. Step A: find .text mapping

You want .text Addr and Off.

readelf -S build/lab_rv32.elf | rg -n '\.text'

Output:

6:  [ 1] .text   PROGBITS   80000000 001000 000344 00  AX  0   0  4

8.2. Step B: compute a file offset

First, get the address of main:

nm -n build/lab_rv32.elf | rg ' main$'

Output:

80000280 T main

On this ELF, main is at 0x80000280.
From the section headers, .text has:

Addr = 0x80000000
Off = 0x001000

Then:

offset = 0x001000 + (0x80000280 - 0x80000000)
       = 0x001280

Or let bc do the math:

Tip

bc is the standard Unix command-line calculator (it supports arbitrary precision and different number bases).

bc -q <<< 'obase=16; ibase=16; 001000 + (80000280 - 80000000)'
1280

Notes on bc:

ibase=16 makes bc parse the input numbers as hex.
obase=16 prints the result in hex.
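Set obase before ibase: once ibase=16 is active, a later obase=16 would be read as hex 16 (decimal 22).
With ibase=16, the hex digits A-F must be uppercase; bc treats lowercase letters as variable names.
Result is 1280 in hex, i.e. file offset 0x1280.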
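
The offset formula is only valid when the address actually falls inside the section you used, that is, Addr <= address < Addr + Size. The .text line above gives Size = 0x344, so the check can be sketched in shell as follows; in_section is a hypothetical helper name.

```shell
# in_section SECTION_ADDR SECTION_SIZE ADDRESS
# Succeeds (exit status 0) if ADDRESS lies inside the section.
in_section() {
  [ $(( $3 )) -ge $(( $1 )) ] && [ $(( $3 )) -lt $(( $1 + $2 )) ]
}

if in_section 0x80000000 0x344 0x80000280; then
  echo "main is inside .text"
fi
```

If the check fails, look for the section that does contain the address and use that section's Addr and Off instead.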

8.3. Step C: view bytes

Important

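RISC-V instructions are stored little-endian in memory. Byte dumps (objdump -s, xxd, hexdump) show the bytes in memory order, lowest address first, while objdump -d prints each instruction as a single word value, so the same four bytes appear reversed there.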

Dump .text bytes with objdump:

riscv64-unknown-elf-objdump -s -j .text build/lab_rv32.elf | head -n 20

Output:

build/lab_rv32.elf:     file format elf32-littleriscv

Contents of section .text:
 80000000 17010001 13010100 93810100 97020000  ................
 80000010 93820235 13834180 63886200 23a00200  ...5..A.c.b.#...
 80000020 93824200 e3cc62fe ef008025 73005010  ..B...b....%s.P.
 80000030 6ff0dfff 130101fd 23268102 13040103  o.......#&......
 80000040 232ea4fc 8327c4fd 93f7f700 2326f4fe  #....'......#&..
 80000050 0327c4fe 93079000 63ece700 8327c4fe  .'......c....'..
 80000060 93f7f70f 93870703 93f7f70f 6f004001  ............o.@.
 80000070 8327c4fe 93f7f70f 93877705 93f7f70f  .'........w.....
 80000080 a305f4fe b7070010 0347b4fe 2380e700  .........G..#...
 80000090 13000000 0324c102 13010103 67800000  .....$......g...
 800000a0 130101fe 232e8100 13040102 93070500  ....#...........
 800000b0 a307f4fe b7070010 0347f4fe 2380e700  .........G..#...
 800000c0 13000000 0324c101 13010102 67800000  .....$......g...
 800000d0 130101fe 232e1100 232c8100 13040102  ....#...#,......
 800000e0 2326a4fe 6f00c001 8327c4fe 13871700  #&..o....'......
 800000f0 2326e4fe 83c70700 13850700 eff05ffa  #&............_.

In this objdump -s output, the left‑hand address (for example 0x800000d0) is the runtime/VMA address of those bytes when .text is loaded into memory, not a file offset. It’s the section’s load address plus the offset within the section.

To inspect the bytes that correspond specifically to main (at 0x80000280 → file offset 0x1280):

xxd -s 0x1280 -g 1 -l 16 build/lab_rv32.elf

Output:

00001280: 13 01 01 fe 23 2e 11 00 23 2c 81 00 13 04 01 02  ....#...#,......
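
The first four bytes of that dump, 13 01 01 fe, are one 32-bit instruction stored least-significant byte first. A small sketch of reassembling the word in shell arithmetic follows; le_word is a hypothetical helper name.

```shell
# le_word B0 B1 B2 B3: combine four hex bytes, given in file order,
# into the little-endian 32-bit word they encode.
le_word() {
  printf '0x%08x\n' $(( (0x$4 << 24) | (0x$3 << 16) | (0x$2 << 8) | 0x$1 ))
}

le_word 13 01 01 fe   # prints 0xfe010113
```

0xfe010113 is the encoding of addi x2, x2, -32, a stack-pointer adjustment of the kind you would expect at the start of a function compiled with -O0.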

What is the equivalent view in xxd/hexdump?

objdump -j .text finds the .text section by name in the ELF section header table and dumps the bytes that belong to it. xxd and hexdump, on the other hand, are section-agnostic; they just dump raw bytes starting at a file offset. In this ELF, .text begins at file offset 0x1000 (as seen in the section headers), so these commands are equivalent views of the same bytes:

View using xxd:

xxd -s 0x1000 -g 1 build/lab_rv32.elf | head -n 20

View using hexdump:

hexdump -C -s 0x1000 build/lab_rv32.elf | head -n 20

To verify the offset for your file, check the .text entry in the section headers:

readelf -S build/lab_rv32.elf | rg '\.text'

If the .text offset is different, replace 0x1000 with that value.

9. Relocations: “addresses not final yet”

A relocation is a note from the assembler to the linker that says: “I had to put something in this instruction or data slot, but I don’t know the final address yet. Please fix it later.”

This happens because .o files are built before the linker decides where everything lives in memory.

9.1. The basic idea (with a mental model)

When you write:

a call to a function (call foo)
a reference to a global (la t0, global_var)

The assembler can’t know the final address of foo or global_var.
So it:

Emits a placeholder in the instruction/data,
Adds a relocation entry that describes how to patch it later.

At link time, the linker reads those entries, computes the real addresses, and rewrites the bytes.
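
In the generic ELF notation, the value the linker computes is built from S (the symbol's final address), A (the addend), and P (the address of the place being patched): an absolute relocation stores S + A, a PC-relative one stores S + A - P. The sketch below shows only that arithmetic. The function names are hypothetical, the call-site address 0x80000290 is made up, and real RISC-V relocation types additionally split the value across instruction fields, which is not modeled here.

```shell
# reloc_abs S A: absolute relocation value, S + A.
reloc_abs() {
  printf '0x%x\n' $(( $1 + $2 ))
}

# reloc_pcrel S A P: PC-relative relocation value, S + A - P.
reloc_pcrel() {
  printf '%d\n' $(( $1 + $2 - $3 ))
}

# S = 0x80000124 (uart_puthex32 in this ELF), A = 0, P = 0x80000290 (hypothetical call site).
reloc_abs 0x80000124 0                 # prints 0x80000124
reloc_pcrel 0x80000124 0 0x80000290    # prints -364
```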

9.2. Anatomy of a relocation entry

A relocation usually includes:

offset: where in the section to patch
type: how to patch it (absolute, PC‑relative, hi/lo pair, etc.)
symbol: what the patch should point to
addend: extra constant to add (RISC‑V typically uses RELA, which stores the addend explicitly)

Note

Think of the relocation as a tiny recipe: “take the address of this symbol, apply this rule, and write it here.”

9.3. See relocations in a real `.o`

Build an object file:

riscv64-unknown-elf-gcc -O0 -g -ffreestanding -nostdlib -march=rv32im -mabi=ilp32 \
  -c src/lab.c -o build/lab.o

Inspect relocation entries:

readelf -r build/lab.o

Helpful companion views:

Symbol table (names + addresses in .o)

readelf -s build/lab.o

Disassembly + relocations inline

riscv64-unknown-elf-objdump -dr build/lab.o

What to look for in readelf -r:

Offset: the exact byte position to patch
Info/Type: the relocation kind (architecture‑specific)
Symbol: what it targets (foo, global_var, etc.)
Addend: constant adjustment (if present)
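
The Info column packs two fields into one number. In an ELF32 object like this one, the upper 24 bits are the symbol table index and the low 8 bits are the relocation type (ELF64 uses a 32/32 split instead). A minimal sketch of the ELF32 split follows; r_info_split is a hypothetical helper, and the sample value is illustrative rather than copied from build/lab.o.

```shell
# r_info_split INFO: split an ELF32 r_info value into symbol index and relocation type.
r_info_split() {
  info=$(( $1 ))
  printf 'sym=%d type=%d\n' $(( info >> 8 )) $(( info & 0xff ))
}

# Illustrative value only.
r_info_split 0x00000c12   # prints sym=12 type=18
```

The symbol index can be looked up in the readelf -s output for the same .o, and the numeric type corresponds to the name readelf prints in the Type column.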

9.4. What happens after linking?

In a final, fully linked bare‑metal ELF, most relocations are resolved (the bytes are already patched).
In a shared or dynamically linked ELF, some relocations remain for the loader to fix at runtime.

Tip

If you ever wonder “how does the linker connect this call to that function?”, relocations are the answer.

10. ELF → raw binary (and why addresses disappear)

Convert to a flat binary:

riscv64-unknown-elf-objcopy -O binary build/lab_rv32.elf lab_rv32.bin

Now check:

ls -l build/lab_rv32.elf lab_rv32.bin

Why .bin is smaller:

It’s only loadable bytes; no symbol tables, no section headers.

Warning

A raw .bin has no inherent addresses. You must know (or guess) its load address from a bootloader, memory map, or surrounding firmware.
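
Once you do know the load address, mapping a runtime address to a position in the .bin is a single subtraction. The sketch below assumes the image starts at 0x80000000, which is the .text address seen earlier and only holds if .text is the lowest loaded section; bin_off is a hypothetical helper name.

```shell
# bin_off LOAD_BASE ADDRESS: offset of ADDRESS inside a flat binary loaded at LOAD_BASE.
bin_off() {
  printf '0x%x\n' $(( $2 - $1 ))
}

# Assumption: the flat image begins at 0x80000000.
bin_off 0x80000000 0x80000280   # prints 0x280
```

Under that assumption, xxd -s 0x280 -g 1 -l 16 lab_rv32.bin should show the same bytes as the ELF dump at file offset 0x1280.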

11. Exercises

Use readelf -h to find the entry point of build/lab_rv32.elf.
Use nm -n to find the address of add_u32 and locate it in objdump.
Pick one instruction inside add_u32 and find the exact bytes in the ELF using the section offset method.
Build lab.o and list relocations; explain in one sentence what each relocation is trying to fix.

11.1. How to test your answers

Can you point to a specific file offset that contains the bytes for an instruction at a specific virtual address?
Can you explain why .bss has size in memory but not in the file?

12. Summary

You learned to navigate ELF structure and use binutils to connect:

symbols → disassembly → raw bytes → runtime addresses

12.1. Read Next

In the readelf -l output, you might have noticed Align 0x1000. Why does the hardware care about this number? And does your code really live at 0x80000000? Check out Chapter 7: Memory, Paging, and The Hardware Illusion to uncover the secrets of Virtual Memory.

Next: RV32 ABI + C types. We’ll connect C-level data layouts (sizes, alignment, structs) to the exact loads/stores you see in assembly.
