ELF Internals and Binutils: Seeing What the Compiler Produced

1. TL;DR

If you can read ELF structure confidently, reverse engineering and debugging become dramatically easier. You stop guessing!


2. Prerequisites

3. ELF in one diagram

Think of an ELF as having two different “views” of the same data:

ELF file
 ├─ ELF header
 ├─ Program header table  (segments: loader view)
 │    ├─ PT_LOAD (text)
 │    ├─ PT_LOAD (data)
 │    └─ ...
 ├─ Section header table  (sections: linker view)
 │    ├─ .text
 │    ├─ .rodata
 │    ├─ .data
 │    ├─ .bss
 │    ├─ .symtab / .strtab
 │    └─ ...
 └─ Raw section contents

4. The “core” tools and what each is for

4.1. readelf (structure)

4.2. objdump (content)

4.3. nm (symbols)

4.4. objcopy (transform)

4.5. xxd / hexdump (raw bytes)

5. Hands-on: inspect a bare-metal sample ELF

5.1. Build a sample ELF

We will inspect a small bare-metal program that writes to a UART:

riscv64-unknown-elf-gcc -O0 -g -ffreestanding -nostdlib \
  -march=rv32im -mabi=ilp32 -T src/link.ld \
  src/start.s src/uart.c src/lab.c -o build/lab_rv32.elf

5.2. Show the ELF header

readelf -h build/lab_rv32.elf

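Look for:
  - Class: ELF32 (the rv32/ilp32 flags select 32-bit ELF)
  - Data: 2's complement, little endian
  - Type: EXEC (a linked executable; a .o shows REL)
  - Machine: RISC-V
  - Entry point address: where execution starts
  - Start of program headers / Start of section headers: the file offsets of the two tables in the diagram above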
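readelf -h is decoding fixed-offset fields at the start of the file. As a rough sketch (not how readelf itself is implemented), the same fields can be pulled out with od and shell arithmetic, assuming a little-endian ELF32 file. The helper names u8, u16le, u32le and elf_summary are made up for this example, and the demo runs on a synthetic header, not on lab_rv32.elf:

```sh
# Sketch: read a few ELF header fields with od(1) and shell arithmetic.
# Assumes a little-endian ELF32 file (EI_CLASS = 1, EI_DATA = 1).

# u8 FILE OFFSET: unsigned byte at a file offset
u8() { od -An -tu1 -j "$2" -N 1 "$1" | tr -d ' \n'; }

# u16le / u32le FILE OFFSET: little-endian integers assembled from single bytes
u16le() {
  b0=$(u8 "$1" "$2"); b1=$(u8 "$1" $(( $2 + 1 )))
  echo $(( b0 | (b1 << 8) ))
}
u32le() {
  h0=$(u16le "$1" "$2"); h1=$(u16le "$1" $(( $2 + 2 )))
  echo $(( h0 | (h1 << 16) ))
}

elf_summary() {
  # e_ident[0..3] must be 0x7f 'E' 'L' 'F' (decimal 127 69 76 70)
  if [ "$(u8 "$1" 0) $(u8 "$1" 1) $(u8 "$1" 2) $(u8 "$1" 3)" != "127 69 76 70" ]; then
    echo "not an ELF file" >&2
    return 1
  fi
  echo "class=$(u8 "$1" 4)"                  # EI_CLASS: 1 = ELF32, 2 = ELF64
  echo "data=$(u8 "$1" 5)"                   # EI_DATA: 1 = little endian, 2 = big endian
  echo "type=$(u16le "$1" 16)"               # e_type: 1 = REL (.o), 2 = EXEC
  echo "machine=$(u16le "$1" 18)"            # e_machine: 243 = EM_RISCV
  printf 'entry=0x%x\n' "$(u32le "$1" 24)"   # e_entry: ELF32 keeps it at offset 24
}

# Demo on a synthetic 28-byte header prefix (made-up values, not lab_rv32.elf)
hdr=$(mktemp)
printf '\177ELF\001\001\001\000\000\000\000\000\000\000\000\000' > "$hdr"  # e_ident (16 bytes)
printf '\002\000\363\000\001\000\000\000\000\000\000\200' >> "$hdr"        # e_type, e_machine, e_version, e_entry
elf_summary "$hdr"
rm -f "$hdr"
```

Pointed at the real file (elf_summary build/lab_rv32.elf), the numbers should match the Class, Data, Type, Machine and Entry point lines of readelf -h, which prints names where this sketch prints raw values.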

5.3. Show sections

readelf -S build/lab_rv32.elf

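Key fields to understand:
  - Name and Type: PROGBITS sections (.text, .rodata, .data) store their bytes in the file; NOBITS (.bss) only reserves zero-filled memory
  - Addr: the section’s run-time address (0 for sections that are never loaded, such as .symtab)
  - Off: where the section’s bytes start in the file
  - Size: length in bytes
  - Flg: A = occupies memory at run time, X = executable, W = writable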
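The Flg letters are just a rendering of each section’s sh_flags bit mask. A minimal sketch of that decoding for the most common bits (sh_flag_letters is a made-up helper; the bit values are the standard ELF ones, and readelf knows many more flags than these five):

```sh
# sh_flag_letters FLAGS: render common sh_flags bits the way readelf's Flg column does
sh_flag_letters() {
  f=$(( $1 ))
  s=
  if [ $(( f & 0x1 )) -ne 0 ]; then s="${s}W"; fi    # SHF_WRITE: writable at run time
  if [ $(( f & 0x2 )) -ne 0 ]; then s="${s}A"; fi    # SHF_ALLOC: occupies memory at run time
  if [ $(( f & 0x4 )) -ne 0 ]; then s="${s}X"; fi    # SHF_EXECINSTR: contains instructions
  if [ $(( f & 0x10 )) -ne 0 ]; then s="${s}M"; fi   # SHF_MERGE: duplicate entries may be merged
  if [ $(( f & 0x20 )) -ne 0 ]; then s="${s}S"; fi   # SHF_STRINGS: NUL-terminated strings
  echo "$s"
}

sh_flag_letters 0x6    # AX, typical for .text
sh_flag_letters 0x3    # WA, typical for .data and .bss
sh_flag_letters 0x2    # A,  typical for .rodata
```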

5.4. Show segments (program headers)

readelf -l build/lab_rv32.elf

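In the segment list, focus on:
  - PT_LOAD entries: the only parts of the file that end up in memory
  - Offset vs VirtAddr/PhysAddr: where the bytes are in the file and where they go in memory
  - FileSiz vs MemSiz: when MemSiz is larger, the remainder is zero-filled (this is how .bss gets its space)
  - Flg (R, W, E) and Align
  - Section to Segment mapping (printed after the table): which sections each segment contains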
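The Section to Segment mapping is essentially an address-range containment test: a section belongs to a PT_LOAD segment when its address range falls inside the segment’s [VirtAddr, VirtAddr + MemSiz) range. A simplified sketch (readelf’s real check also handles special cases such as TLS and empty sections; in_segment is a made-up helper, and the segment values in the demo are hypothetical):

```sh
# in_segment SEC_ADDR SEC_SIZE SEG_VADDR SEG_MEMSZ
# Exit status 0 if [SEC_ADDR, SEC_ADDR + SEC_SIZE) lies inside [SEG_VADDR, SEG_VADDR + SEG_MEMSZ)
in_segment() {
  [ $(( ($1) >= ($3) && ($1) + ($2) <= ($3) + ($4) )) -eq 1 ]
}

# .text from readelf -S (Addr 0x80000000, Size 0x344) against a hypothetical PT_LOAD of 0x1000 bytes
if in_segment 0x80000000 0x344 0x80000000 0x1000; then echo ".text -> this PT_LOAD"; fi
# A hypothetical section at 0x80001000 falls outside that segment
if ! in_segment 0x80001000 0x10 0x80000000 0x1000; then echo "0x80001000 -> some other segment"; fi
```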

6. Where is main? (symbols)

6.1. Fast: nm

nm -n build/lab_rv32.elf | grep -E ' main$| add_u32$'

6.2. Richer: readelf -s

readelf -s build/lab_rv32.elf | grep -E ' main$| add_u32$| mmio_fake$'

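You’ll see:
  - Value: the symbol’s address in the linked ELF
  - Size: its length in bytes (for a function, the size of its code)
  - Type: FUNC for functions such as main and add_u32, OBJECT for variables
  - Bind: GLOBAL, or LOCAL for static symbols
  - Ndx: the index of the section that contains the symbol (compare with the [Nr] column of readelf -S)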
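In the file, Type and Bind are not stored as words: both are packed into the one-byte st_info field of each symbol entry, Bind in the high four bits and Type in the low four. A sketch of the split readelf performs (st_info_decode is a made-up helper that covers only the common values):

```sh
# st_info_decode INFO: split an ELF symbol's st_info byte into Bind and Type names
st_info_decode() {
  info=$(( $1 ))
  case $(( info >> 4 )) in              # ELF32_ST_BIND: high 4 bits
    0) bnd=LOCAL ;; 1) bnd=GLOBAL ;; 2) bnd=WEAK ;; *) bnd=OTHER ;;
  esac
  case $(( info & 0xf )) in             # ELF32_ST_TYPE: low 4 bits
    0) typ=NOTYPE ;; 1) typ=OBJECT ;; 2) typ=FUNC ;;
    3) typ=SECTION ;; 4) typ=FILE ;; *) typ=OTHER ;;
  esac
  echo "$bnd $typ"
}

st_info_decode 0x12   # GLOBAL FUNC   (typical for main)
st_info_decode 0x11   # GLOBAL OBJECT (a global variable)
st_info_decode 0x04   # LOCAL FILE    (the source-file symbol)
```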

7. Disassembly you can trust

7.1. Basic disassembly

riscv64-unknown-elf-objdump -d build/lab_rv32.elf | less

7.2. Prefer: numeric registers + no pseudo-instruction aliases

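By default objdump prints pseudo-instruction aliases (nop, ret, li, mv) and ABI register names (zero, ra, sp, a0), which hide what the CPU actually executes. -M no-aliases shows the real instruction, and -M numeric shows registers as x0..x31.
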
riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf | less
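Two words from the objdump -s dump of .text in section 8.3 make this concrete: the bytes 13 00 00 00 at 0x80000090 and 67 80 00 00 at 0x8000009c are the little-endian instruction words 0x00000013 and 0x00008067. A tiny lookup sketch (alias_of is a made-up helper that only knows these two encodings, and its output only approximates objdump’s formatting) shows what each mode prints:

```sh
# alias_of WORD: canonical (no-aliases) form and default alias for two common RISC-V encodings
alias_of() {
  case $(printf '%08x' $(( $1 ))) in
    00000013) echo "no-aliases: addi x0,x0,0   default: nop" ;;  # adds 0 to x0; x0 is hard-wired to zero
    00008067) echo "no-aliases: jalr x0,0(x1)  default: ret" ;;  # jump to ra (x1), discard the link
    *)        echo "not in this sketch's table" ;;
  esac
}

alias_of 0x00000013
alias_of 0x00008067
```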

7.3. Find a function in disassembly

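riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/lab_rv32.elf | grep -n '<add_u32>:'

The trailing colon matches only the label line that starts the function, not call sites such as jal ... <add_u32>.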

8. Match instructions to bytes (hexdump workflow)

This is a practical reverse engineering skill:

  1. Identify an instruction address in objdump (e.g. main).
  2. Convert that address → file offset using section info.
  3. Inspect raw bytes at that offset.

8.1. Step A: find .text mapping

You need two numbers from the .text row: Addr (its run-time address) and Off (its offset in the file).

readelf -S build/lab_rv32.elf | grep -n '\.text'

Output:

6:  [ 1] .text   PROGBITS   80000000 001000 000344 00  AX  0   0  4

8.2. Step B: compute a file offset

First, get the address of main:

nm -n build/lab_rv32.elf | grep ' main$'

Output:

80000280 T main

On this ELF, main is at 0x80000280.
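From the section headers, .text has Addr = 0x80000000 and Off = 0x001000. Its Size is 0x344, so main (0x280 bytes into the section) lies inside .text.

Then:

offset = Off + (address - Addr)
       = 0x001000 + (0x80000280 - 0x80000000)
       = 0x001280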

Or let bc do the math:

bc -q <<< 'obase=16; ibase=16; 001000 + (80000280 - 80000000)'
1280

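Notes on bc:
  - Set obase before ibase: once ibase=16, the "16" in obase=16 would itself be read as hex (0x16 = 22).
  - With ibase=16, hex digits must be uppercase (A-F); lowercase letters are variable names in bc.
  - Numbers are written without the 0x prefix, and the result is printed the same way.
  - <<< is a bash here-string; in a plain POSIX sh, use echo '...' | bc instead.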
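The same offset calculation also fits in a small function using plain shell arithmetic, with no bc involved. This is a sketch: vaddr_to_off is a made-up name, you pass the section’s Addr, Off and Size by hand from readelf -S, and it refuses addresses outside the section. It only applies to sections with file contents; a NOBITS section such as .bss has no bytes in the file to point at.

```sh
# vaddr_to_off VADDR SH_ADDR SH_OFFSET SH_SIZE: file offset holding the byte at run-time address VADDR
vaddr_to_off() {
  v=$(( $1 )); addr=$(( $2 )); off=$(( $3 )); size=$(( $4 ))
  if [ "$v" -lt "$addr" ] || [ "$v" -ge $(( addr + size )) ]; then
    printf '0x%x is outside this section\n' "$v" >&2
    return 1
  fi
  printf '0x%x\n' $(( off + (v - addr) ))
}

# .text: Addr 0x80000000, Off 0x001000, Size 0x344 (from readelf -S)
vaddr_to_off 0x80000280 0x80000000 0x001000 0x344   # main -> 0x1280
```

Its output can go straight into xxd, for example xxd -s "$(vaddr_to_off 0x80000280 0x80000000 0x1000 0x344)" -g 1 -l 16 build/lab_rv32.elf.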

8.3. Step C: view bytes

  1. Dump .text bytes with objdump:
riscv64-unknown-elf-objdump -s -j .text build/lab_rv32.elf | head -n 20

Output:

build/lab_rv32.elf:     file format elf32-littleriscv

Contents of section .text:
 80000000 17010001 13010100 93810100 97020000  ................
 80000010 93820235 13834180 63886200 23a00200  ...5..A.c.b.#...
 80000020 93824200 e3cc62fe ef008025 73005010  ..B...b....%s.P.
 80000030 6ff0dfff 130101fd 23268102 13040103  o.......#&......
 80000040 232ea4fc 8327c4fd 93f7f700 2326f4fe  #....'......#&..
 80000050 0327c4fe 93079000 63ece700 8327c4fe  .'......c....'..
 80000060 93f7f70f 93870703 93f7f70f 6f004001  ............o.@.
 80000070 8327c4fe 93f7f70f 93877705 93f7f70f  .'........w.....
 80000080 a305f4fe b7070010 0347b4fe 2380e700  .........G..#...
 80000090 13000000 0324c102 13010103 67800000  .....$......g...
 800000a0 130101fe 232e8100 13040102 93070500  ....#...........
 800000b0 a307f4fe b7070010 0347f4fe 2380e700  .........G..#...
 800000c0 13000000 0324c101 13010102 67800000  .....$......g...
 800000d0 130101fe 232e1100 232c8100 13040102  ....#...#,......
 800000e0 2326a4fe 6f00c001 8327c4fe 13871700  #&..o....'......
 800000f0 2326e4fe 83c70700 13850700 eff05ffa  #&............_.

In this objdump -s output, the left-hand column (for example 800000d0) is the run-time address (VMA) of those bytes, not a file offset: it is the section’s start address (0x80000000 for .text) plus the offset of the bytes within the section.

  2. To inspect the bytes that belong to main (0x80000280 → file offset 0x1280):
xxd -s 0x1280 -g 1 -l 16 build/lab_rv32.elf

Output:

00001280: 13 01 01 fe 23 2e 11 00 23 2c 81 00 13 04 01 02  ....#...#,......

  3. What is the equivalent view in xxd or hexdump?

objdump -j .text looks up the .text section by name in the section header table and dumps only the bytes that belong to it. xxd and hexdump, on the other hand, know nothing about sections; they dump raw bytes starting at whatever file offset you give them. In this ELF, .text begins at file offset 0x1000 (the Off column in the section headers), so these two commands start with the same bytes as the objdump -s dump, labeled with file offsets instead of run-time addresses:

xxd -s 0x1000 -g 1 build/lab_rv32.elf | head -n 20
hexdump -C -s 0x1000 build/lab_rv32.elf | head -n 20

If your .text offset is different, look it up and replace 0x1000 with that value:

readelf -S build/lab_rv32.elf | grep '\.text'
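To read those bytes as an instruction, remember that RISC-V stores each 32-bit instruction little-endian: xxd shows 13 01 01 fe at 0x1280, which is the word 0xfe010113. Its low seven bits (0x13) identify an OP-IMM (I-type) instruction, so the rest splits into rd, funct3, rs1 and a 12-bit signed immediate. A sketch of that decoding (le_word and decode_itype are made-up helper names, and decode_itype assumes the word really is I-type):

```sh
# le_word B0 B1 B2 B3: bytes in file order (hex, as xxd prints them) -> 32-bit little-endian word
le_word() { printf '0x%08x\n' $(( 0x$1 | (0x$2 << 8) | (0x$3 << 16) | (0x$4 << 24) )); }

# decode_itype WORD: split an I-type instruction into its fields (assumes the format is I-type)
decode_itype() {
  w=$(( $1 ))
  imm=$(( (w >> 20) & 0xfff ))
  if [ "$imm" -ge 2048 ]; then imm=$(( imm - 4096 )); fi   # bit 11 is the sign bit
  printf 'opcode=0x%02x rd=x%d funct3=%d rs1=x%d imm=%d\n' \
    $(( w & 0x7f )) $(( (w >> 7) & 0x1f )) $(( (w >> 12) & 0x7 )) $(( (w >> 15) & 0x1f )) "$imm"
}

word=$(le_word 13 01 01 fe)   # first instruction of main, from xxd at 0x1280
echo "$word"                  # 0xfe010113
decode_itype "$word"          # opcode=0x13 rd=x2 funct3=0 rs1=x2 imm=-32
```

Opcode 0x13 with funct3 0 is addi, so this is addi x2,x2,-32 (addi sp,sp,-32 with ABI names): main’s prologue reserving 32 bytes of stack. The objdump -d -M numeric,no-aliases listing at 0x80000280 should agree.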

9. Relocations: “addresses not final yet”

A relocation is a note from the assembler to the linker that says: “I had to put something in this instruction or data slot, but I don’t know the final address yet. Please fix it later.”

This happens because .o files are built before the linker decides where everything lives in memory.

9.1. The basic idea (with a mental model)

When your code calls a function foo or reads a global variable global_var, the assembler can’t know the final address of either one.
So it:

  1. Emits a placeholder in the instruction/data,
  2. Adds a relocation entry that describes how to patch it later.

At link time, the linker reads those entries, computes the real addresses, and rewrites the bytes.

9.2. Anatomy of a relocation entry

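A relocation usually includes:
  - Offset (r_offset): where in the section the bytes to patch are
  - Type: how to compute the value and how to encode it (a plain 32-bit word, the 20-bit immediate of a lui, an auipc + jalr call pair, and so on)
  - Symbol: whose address is needed (foo, global_var, or a section symbol)
  - Addend: a constant added to the symbol’s address; RISC-V keeps it in the entry itself (RELA format, sections named .rela.*)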
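For the simplest relocation type, R_RISCV_32 (a plain 32-bit data word, for example a pointer stored in .data), the linker writes S + A, the symbol’s final address plus the addend, over the placeholder bytes at r_offset. A sketch of that patch on a list of bytes; the helper names, the section bytes, the offset and the address of global_var are all hypothetical:

```sh
# abs32_bytes S A: the 4 bytes (file order, little-endian) that R_RISCV_32 writes: S + A
abs32_bytes() {
  v=$(( ($1 + $2) & 0xffffffff ))
  printf '%02x %02x %02x %02x\n' \
    $(( v & 0xff )) $(( (v >> 8) & 0xff )) $(( (v >> 16) & 0xff )) $(( (v >> 24) & 0xff ))
}

# patch_word "BYTES" OFFSET "NEW4": overwrite the 4 bytes starting at OFFSET in a space-separated byte list
patch_word() {
  bytes=$1; off=$2; new=$3; i=0; out=
  for b in $bytes; do
    if [ "$i" -ge "$off" ] && [ "$i" -lt $(( off + 4 )) ]; then
      set -- $new             # reuse the positional parameters as a small array
      shift $(( i - off ))
      b=$1
    fi
    out="$out${out:+ }$b"
    i=$(( i + 1 ))
  done
  echo "$out"
}

# Hypothetical .data: a pointer slot at offset 4 still holds the assembler's placeholder (zeros)
data="aa bb cc dd 00 00 00 00"
# Hypothetical relocation: r_offset = 4, type R_RISCV_32, symbol global_var at 0x80000400, addend 0
patch_word "$data" 4 "$(abs32_bytes 0x80000400 0)"   # aa bb cc dd 00 04 00 80
```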

9.3. See relocations in a real .o

Build an object file:

riscv64-unknown-elf-gcc -O0 -g -ffreestanding -nostdlib -march=rv32im -mabi=ilp32 \
  -c src/lab.c -o build/lab.o

Inspect relocation entries:

readelf -r build/lab.o

Helpful companion views:

  1. Symbol table (names and values; in a .o, a defined symbol’s value is its offset within its section, not a final address)
readelf -s build/lab.o
  2. Disassembly with relocations shown inline
riscv64-unknown-elf-objdump -dr build/lab.o

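What to look for in readelf -r:
  - Offset: the position inside the section (usually .text) that will be patched; match it against the addresses in objdump -dr
  - Info and Type: the relocation type, for example R_RISCV_HI20 + R_RISCV_LO12_I for an address built with lui + addi, or R_RISCV_CALL / R_RISCV_CALL_PLT for a function call
  - Sym. Name + Addend: the symbol whose address goes in, plus any constant offset
  - R_RISCV_RELAX entries: they mark sequences the linker is allowed to shorten (linker relaxation)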
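The Info column packs two numbers into one word: for ELF32, the symbol-table index is Info >> 8 (ELF32_R_SYM) and the relocation type is Info & 0xff (ELF32_R_TYPE). readelf decodes both for you; here is a sketch of the split with a few RISC-V type numbers from the RISC-V ELF psABI (rinfo_decode is a made-up helper, and the Info values in the demo are hypothetical):

```sh
# rinfo_decode INFO: split an ELF32 r_info word into symbol index and RISC-V relocation type
rinfo_decode() {
  info=$(( $1 ))
  case $(( info & 0xff )) in                 # ELF32_R_TYPE: low 8 bits
    1)  rtype=R_RISCV_32 ;;
    18) rtype=R_RISCV_CALL ;;
    19) rtype=R_RISCV_CALL_PLT ;;
    26) rtype=R_RISCV_HI20 ;;
    27) rtype=R_RISCV_LO12_I ;;
    28) rtype=R_RISCV_LO12_S ;;
    51) rtype=R_RISCV_RELAX ;;
    *)  rtype="type $(( info & 0xff ))" ;;
  esac
  echo "sym=$(( info >> 8 )) $rtype"         # ELF32_R_SYM: remaining high bits
}

rinfo_decode 0x00000a1a   # sym=10 R_RISCV_HI20
rinfo_decode 0x00000513   # sym=5 R_RISCV_CALL_PLT
```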

9.4. What happens after linking?
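The linker resolves every relocation and rewrites the bytes, so a normally linked static executable such as build/lab_rv32.elf typically has no relocation entries left (readelf -r on it should report none). For a global accessed with lui + addi, the patch is split across two instructions: R_RISCV_HI20 fills the lui immediate with the upper 20 bits and R_RISCV_LO12_I fills the addi immediate with the lower 12. Because the 12-bit part is sign-extended, the upper part has to be rounded by adding 0x800 first. (With relaxation enabled, the linker may instead replace the pair with a single gp-relative instruction.) A sketch of that arithmetic; hi_lo is a made-up helper and the addresses are hypothetical:

```sh
# hi_lo ADDR: the immediates a linker writes for R_RISCV_HI20 (lui) + R_RISCV_LO12_I (addi)
hi_lo() {
  v=$(( $1 & 0xffffffff ))
  hi=$(( ((v + 0x800) >> 12) & 0xfffff ))   # +0x800 compensates for the sign-extended low part
  lo=$(( ((v & 0xfff) ^ 0x800) - 0x800 ))    # low 12 bits as a signed value (-2048..2047)
  printf 'lui 0x%x, addi %d\n' "$hi" "$lo"
}

hi_lo 0x80000400   # lui 0x80000, addi 1024
hi_lo 0x80000abc   # lui 0x80001, addi -1348   (0x80001000 - 1348 = 0x80000abc)
```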

10. ELF → raw binary (and why addresses disappear)

Convert to a flat binary:

riscv64-unknown-elf-objcopy -O binary build/lab_rv32.elf lab_rv32.bin

Now check:

ls -l build/lab_rv32.elf lab_rv32.bin

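Why .bin is smaller:
  - objcopy -O binary writes only the contents of loadable sections, as a memory image that starts at the lowest load address.
  - The ELF header, program and section header tables, .symtab/.strtab, and the .debug_* sections produced by -g are all dropped; debug info is often the largest part of the .elf.
  - .bss has no file contents (NOBITS), so it adds nothing; it must be zeroed at run time, typically by the startup code.
  - Addresses disappear because nothing in a flat binary records them: whoever loads it must place it at the right address (0x80000000 for .text here).
  - Caution: if loadable sections are far apart in memory, the gap is filled with zeros and the .bin can become much larger than the .elf.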
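The size of the .bin follows from the layout alone: it spans from the lowest load address of any section with contents to the highest end address, with any gaps zero-filled. A sketch of that calculation (bin_size is a made-up helper; NOBITS sections are simply left out of the argument list, and the .rodata and .data values are hypothetical):

```sh
# bin_size ADDR SIZE [ADDR SIZE ...]: byte size of the flat image covering all given sections
bin_size() {
  lo=; hi=
  while [ "$#" -ge 2 ]; do
    start=$(( $1 )); end=$(( $1 + $2 ))
    if [ -z "$lo" ] || [ "$start" -lt "$lo" ]; then lo=$start; fi
    if [ -z "$hi" ] || [ "$end" -gt "$hi" ]; then hi=$end; fi
    shift 2
  done
  echo $(( hi - lo ))
}

# .text from readelf -S (0x80000000, 0x344) plus a hypothetical .rodata right after it
bin_size 0x80000000 0x344 0x80000344 0x40      # 900 bytes (0x384)
# Same .text plus a hypothetical .data placed 64 KiB away: the gap is written out as zeros
bin_size 0x80000000 0x344 0x80010000 0x10      # 65552 bytes
```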

11. Exercises

  1. Use readelf -h to find the entry point of build/lab_rv32.elf.
  2. Use nm -n to find the address of add_u32 and locate it in objdump.
  3. Pick one instruction inside add_u32 and find the exact bytes in the ELF using the section offset method.
  4. Build lab.o and list relocations; explain in one sentence what each relocation is trying to fix.

11.1. How to test your answers


12. Summary

You learned to navigate ELF structure and use binutils to connect:

symbols → disassembly → raw bytes → runtime addresses

In the readelf -l output, you might have noticed Align 0x1000. Why does the hardware care about this number? And does your code really live at 0x80000000? Check out Chapter 7: Memory, Paging, and The Hardware Illusion to uncover the secrets of Virtual Memory.

Next: RV32 ABI + C types, where we’ll connect C-level data layouts (sizes, alignment, structs) to the exact loads and stores you see in assembly.