C → Assembly: Optimizations, Volatile, and What the Compiler Is Allowed to Do

Author: Marcos Azevedo

Date: 2026-01-20

Last Modified: 2026-01-20

Reading Time: 5 mins

Section: Series

Tags: c-lang programming risc-v

TL;DR

You’ll learn how the compiler transforms C into assembly and why the same C code can look wildly different under -O0 vs -O2.
You’ll build a practical mental model for:
- dead-code elimination,
- common subexpression elimination,
- inlining,
- register allocation,
- and how volatile constrains these optimizations.
You’ll run experiments and verify results with objdump and GDB.

1. The compiler pipeline (why there are multiple “translations”)

flowchart TD
  A["C source (.c)"] --> B["Frontend to IR (Intermediate Representation)"]
  B --> C["Optimizer (depends on -O level)"]
  C --> D["Backend to assembly (.s)"]
  D --> E["Assembler to object (.o)"]
  E --> F["Linker to ELF (.elf)"]

Two consequences:

- “The compiler” isn’t one step; it’s a sequence of stages.
- The -O level changes what the optimizer stage does, and therefore everything downstream of it.

Important

If you don’t understand optimization, you’ll misinterpret disassembly and debugging sessions, especially when variables “vanish” or control flow looks nothing like your C.

2. Hands-on lab: one program, many optimization levels

Create:

// src/opt.c
#include "types.h"
#include "uart.h"

volatile u32 sink;

u32 f(u32 x) {
  u32 a = x * 3u;
  u32 b = x * 3u;      // same expression as a
  u32 c = a + b;

  if ((c & 1u) == 0u) {
    // looks like it matters...
    c += 10u;
  }

  // store result somewhere observable
  sink = c;
  return c;
}

int main(void) {
  u32 r = f(7u);
  uart_puts("f(7)=");
  uart_puthex32(r);
  uart_putc('\n');
  return 0;
}

Build two variants:

riscv64-unknown-elf-gcc -g -ffreestanding -nostdlib -march=rv32im -mabi=ilp32 -O0 \
  -T src/link.ld src/start.s src/uart.c src/opt.c -o build/opt_O0.elf

riscv64-unknown-elf-gcc -g -ffreestanding -nostdlib -march=rv32im -mabi=ilp32 -O2 \
  -T src/link.ld src/start.s src/uart.c src/opt.c -o build/opt_O2.elf

Run both:

# Run, check the output, then press Ctrl+a x to exit.
qemu-system-riscv32 -M virt -nographic -bios none -kernel build/opt_O0.elf

# Run, check the output, then press Ctrl+a x to exit.
qemu-system-riscv32 -M virt -nographic -bios none -kernel build/opt_O2.elf

Disassemble both:

riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/opt_O0.elf | less

riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/opt_O2.elf | less

What to look for

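Under -O0:
- more stack usage,
- more loads/stores,
- every local variable has its own stack slot, so it is “live” exactly where the C says it is.
Under -O2:
- a and b are likely computed once (common subexpression elimination),
- the branch may disappear: c = a + b is always even, so the condition is always true,
- f may be inlined into main, and with the constant argument 7 the whole computation can fold to the constant 52 (0x34),
- the store to sink must still be there, because sink is volatile,
- code may be reordered.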
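
To make the -O2 expectations concrete, here is a hand-written sketch of what the optimizer is allowed to reduce f to. It is not actual compiler output, and your toolchain may produce something different. f_as_if is a hypothetical name, and the u32 typedef is an assumption standing in for types.h, which isn’t shown in this post.

```c
#include <stdint.h>

typedef uint32_t u32;   /* assumption: types.h defines u32 as a 32-bit unsigned type */

volatile u32 sink;

/* Original shape, as in src/opt.c. */
u32 f(u32 x) {
  u32 a = x * 3u;
  u32 b = x * 3u;
  u32 c = a + b;
  if ((c & 1u) == 0u) {
    c += 10u;
  }
  sink = c;
  return c;
}

/* Hypothetical "as-if" version: same observable behavior, fewer operations. */
u32 f_as_if(u32 x) {
  u32 c = x * 6u + 10u;  /* a and b merged; a + b is always even, so the test is gone */
  sink = c;              /* the volatile store has to stay */
  return c;
}
```

Both functions return the same value for every x, including values where the multiplication wraps around, which is exactly why the compiler may substitute one for the other.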

Tip

When learning assembly, start with -O0 and then learn to recognize the optimized forms.

3. Why variables disappear in optimized builds

Register allocation

At -O2, the compiler tries to keep values in registers and may never materialize them in memory.

Lifetime shrinking

If a variable’s value is used only briefly, it may never exist as a named location.

Inlining

Small functions are often replaced by their body.

Note

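This is why GDB can show <optimized out> for variables: at that point in the program, the value has no register or memory location that the debug info can describe.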
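
The three effects above can show up together in one small function. The sketch below is illustrative only; scale and caller are hypothetical names that are not part of the lab sources, and whether the compiler actually inlines or drops anything depends on its version and flags.

```c
#include <stdint.h>

/* static and small: a typical inlining candidate */
static uint32_t scale(uint32_t v) {
  return v * 4u;
}

uint32_t caller(uint32_t x) {
  uint32_t tmp = scale(x);     /* likely inlined; tmp may only ever live in a register */
  uint32_t unused = x + 99u;   /* never contributes to the result: dead code */
  (void)unused;                /* silences the unused-variable warning */
  return tmp + 1u;
}
```

Compile it with -S at -O0 and at -O2 and compare. In an optimized build, asking GDB to print tmp or unused inside caller is a good way to provoke <optimized out>.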

4. volatile means “must perform the access”

Declaring an object volatile tells the compiler:

- every read is a real load,
- every write is a real store,
- the compiler cannot remove or merge those accesses,
- the compiler cannot assume the value stays the same between accesses.

This is critical for:

- MMIO (Memory-Mapped I/O) registers,
- ISR (Interrupt Service Routine) shared state,
- externally-modified memory.
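
A polling loop is the classic MMIO case. The sketch below assumes a made-up device with a status register and a data register; TX_READY, mmio_send, and the register layout are hypothetical and do not describe the UART driver used in this series.

```c
#include <stdint.h>

#define TX_READY 0x1u   /* hypothetical "transmitter ready" bit */

/* Wait until the (hypothetical) status register reports ready, then write one byte. */
void mmio_send(volatile uint32_t *status, volatile uint32_t *data, uint8_t byte) {
  while ((*status & TX_READY) == 0u) {
    /* each iteration performs a fresh load because *status is volatile */
  }
  *data = byte;   /* volatile store: cannot be dropped or merged */
}
```

If status were a plain pointer, the compiler could load *status once, see nothing in the loop that changes it, and spin forever on the stale value.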

Hands-on: volatile vs non-volatile

Create:

// src/volatile_demo.c
#include "types.h"
#include "uart.h"

u32 nv_reg;
volatile u32 v_reg;

u32 demo(u32 x) {
  nv_reg = x;
  nv_reg = x;     // might be merged

  v_reg = x;
  v_reg = x;      // must not be merged

  return nv_reg + v_reg;
}

int main(void) {
  u32 r = demo(0x1234u);
  uart_puts("demo=");
  uart_puthex32(r);
  uart_putc('\n');
  return 0;
}

Build optimized and disassemble:

riscv64-unknown-elf-gcc -g -ffreestanding \
  -nostdlib -march=rv32im -mabi=ilp32 -O2 \
  -T src/link.ld src/start.s src/uart.c src/volatile_demo.c -o build/vol_O2.elf

riscv64-unknown-elf-objdump -d -M numeric,no-aliases build/vol_O2.elf | less

What you should observe:

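- The two non-volatile stores to nv_reg may collapse into one, and the final read of nv_reg may be replaced by the value already held in a register.
- The two volatile stores to v_reg should both remain, followed by a real load of v_reg for the return value.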
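
Written back as C, the optimized code is allowed to behave like the sketch below. This is a hand-written illustration rather than compiler output; demo_as_if is a hypothetical name, and the u32 typedef is an assumption standing in for types.h.

```c
#include <stdint.h>

typedef uint32_t u32;   /* assumption: types.h defines u32 as a 32-bit unsigned type */

u32 nv_reg;
volatile u32 v_reg;

/* Hypothetical "as-if" form of demo() from src/volatile_demo.c. */
u32 demo_as_if(u32 x) {
  nv_reg = x;        /* the two plain stores collapsed into one */
  v_reg = x;         /* first volatile store */
  v_reg = x;         /* second volatile store, still present */
  return x + v_reg;  /* nv_reg forwarded from x; v_reg really loaded */
}
```

Counting the store instructions (sw) that target each address in the disassembly is a quick way to check whether your build matches this shape.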

Warning

volatile is not a synchronization primitive. It does not create atomicity, ordering guarantees across cores, or memory barriers. For concurrency you want C11 atomics or explicit fences.
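
For contrast, here is a minimal sketch of handing a value from one thread or core to another with C11 atomics instead of volatile. The names payload, ready, publish, and try_consume are hypothetical, and the sketch assumes a toolchain and target that provide <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t payload;    /* plain data, protected by the flag below */
static atomic_bool ready;   /* zero-initialized: starts out false */

/* Producer: write the data, then publish it with a release store. */
void publish(uint32_t v) {
  payload = v;
  atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load that sees true also sees the payload write. */
bool try_consume(uint32_t *out) {
  if (!atomic_load_explicit(&ready, memory_order_acquire)) {
    return false;
  }
  *out = payload;
  return true;
}
```

The release/acquire pair is what orders the payload write before the flag becomes visible; a volatile flag alone gives no such guarantee.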

5. Mapping C back to assembly (a practical method)

When you see assembly, ask:

- Where are the inputs? (usually a0..a7)
- Where does the return value go? (usually a0)
- Which registers must survive calls? (the callee-saved s0..s11)
- Which memory accesses are observable? (volatile accesses, stores to globals that other code may read, and anything a called function might touch)
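
A small pair of functions makes these questions easy to practice on. The names add3, leaf, and across_call are hypothetical; the register comments follow the standard RISC-V calling convention, but the exact instructions and register choices depend on your compiler and flags.

```c
#include <stdint.h>

/* Arguments arrive in a0, a1, a2; the result is returned in a0. */
uint32_t add3(uint32_t a, uint32_t b, uint32_t c) {
  return a + b + c;
}

/* noinline (a GCC/Clang attribute) keeps a real call in the output. */
__attribute__((noinline)) uint32_t leaf(uint32_t v) {
  return v ^ 0x5au;
}

uint32_t across_call(uint32_t x) {
  uint32_t y = leaf(x);   /* x goes out in a0, y comes back in a0 */
  return x + y;           /* x is needed after the call, so it has to be kept
                             somewhere the call does not clobber */
}
```

In the disassembly of across_call, look for where x is parked before the call: a callee-saved register or a stack slot are the usual candidates.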

Use compiler-generated assembly as a “bridge”

Generate .s output:

riscv64-unknown-elf-gcc -S -O0 -ffreestanding \
  -nostdlib -march=rv32im -mabi=ilp32 -o build/opt_O0.s src/opt.c

riscv64-unknown-elf-gcc -S -O2 -ffreestanding \
  -nostdlib -march=rv32im -mabi=ilp32 -o build/opt_O2.s src/opt.c

Compare build/opt_O0.s and build/opt_O2.s.

Tip

The .s file is often easier to read than objdump because it preserves labels and structure.

Exercises

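- Modify opt.c so sink is not volatile. Predict what changes at -O2.
- Add a call to uart_putc (or any other external function) and observe how it “pins” values: the compiler cannot see into the callee, so it must assume the call may read or write globals and cannot move those accesses across it.
- Write two functions, one tiny and one large. Observe when the tiny one is inlined.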

How to test your answers

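- Use objdump -d -M numeric,no-aliases to compare instruction sequences.
- Use readelf -s to see whether a function still exists as a symbol. Inlining removes the symbol only when no out-of-line copy is needed, which in practice means static functions; a non-static function such as f is usually still emitted even if every call in the file was inlined.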

Summary

You learned what optimizations do, why debugging optimized code can be confusing, and what volatile truly guarantees.

Next: control flow and data access. You’ll learn how if/loops/switch become branches and jump tables, and how loads/stores encode addressing.