Firmware Triage and Reverse Engineering Workflow
TL;DR
- You’ll learn a practical workflow to go from an unknown firmware file (.bin, .img, .fw, sometimes .elf) to a structured understanding:
  - identify file type and architecture,
  - locate code/data boundaries,
  - recover load addresses and entry points,
  - and choose the right next tools (disassembly, decompilation, emulation, or QEMU tracing).
- You’ll practice repeatable triage steps that work well for embedded targets (including RISC-V).
1. Firmware file types: what you might get
Common formats
- ELF: best case (symbols, sections, entry point may exist)
- Raw binary (.bin): flat bytes, no addresses
- Container images: may embed file systems or multiple partitions (e.g., update bundles)
- Compressed blobs: LZMA, gzip, etc.
A core reality
A raw binary does not tell you:
- where it loads in memory
- where execution starts
- what architecture it is
You must infer these from context.
2. The triage checklist (do this every time)
Step 1: Identify the file type
Run file on the image first. If it reports ELF, you’re in a much easier situation.
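File-type identification is mostly magic-byte matching. A minimal Python sketch of the idea follows; the signature table is a small illustrative subset and the function name identify is hypothetical, so treat it as a complement to file, not a replacement.

```python
# Minimal magic-byte check. The table is a small illustrative subset,
# not a replacement for the `file` utility.
MAGICS = [
    (b"\x7fELF", "ELF"),
    (b"\x1f\x8b", "gzip"),
    (b"hsqs", "SquashFS (little-endian)"),
    (b"\x5d\x00\x00", "possible LZMA (weak signature, verify by decompressing)"),
]

def identify(data: bytes) -> str:
    """Return a best-guess label based on the leading bytes only."""
    for magic, label in MAGICS:
        if data.startswith(magic):
            return label
    return "unknown (possibly a raw binary)"

# Synthetic inputs, for illustration only.
print(identify(b"\x7fELF\x01\x01\x01" + bytes(9)))
print(identify(bytes(16)))
```

An "unknown" result is not a failure: it is the expected answer for a raw binary, and it tells you to move on to the structural checks in the next steps.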
Step 2: Quick entropy/structure sense
Dump the first bytes with hexdump and list the printable runs with strings.
Look for:
- ASCII strings (boot messages, paths, version)
- magic bytes (e.g., 7f 45 4c 46 for ELF)
- long runs of 00 or ff (often padding/erased flash)
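If you prefer to script this pass, the three things in the list can be approximated with the Python standard library. This is a rough sketch: the function names and the thresholds (6 characters, 64 bytes) are arbitrary assumptions, and the sample data is synthetic.

```python
import math
import re
from collections import Counter

def shannon_entropy(block: bytes) -> float:
    """Bits per byte: near 0 for constant padding, near 8 for compressed data."""
    if not block:
        return 0.0
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())

def ascii_strings(data: bytes, min_len: int = 6):
    """Yield (offset, text) for printable ASCII runs, similar in spirit to `strings`."""
    for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield m.start(), m.group().decode("ascii")

def padding_runs(data: bytes, min_len: int = 64):
    """Yield (offset, length, byte_value) for long runs of 0x00 or 0xff."""
    for m in re.finditer(rb"\x00{%d,}|\xff{%d,}" % (min_len, min_len), data):
        yield m.start(), len(m.group()), m.group()[0]

# Synthetic sample: erased flash, one message, then high-entropy bytes.
sample = b"\xff" * 128 + b"Booting kernel v1.0\x00" + bytes(range(256))
print(f"entropy: {shannon_entropy(sample):.2f} bits/byte")
print(next(ascii_strings(sample)))   # (128, 'Booting kernel v1.0')
print(next(padding_runs(sample)))    # (0, 128, 255)
```

Computing the entropy per fixed-size window, rather than over the whole file, usually gives a more useful picture, because code, padding, and compressed regions then show up as distinct bands.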
If strings returns lots of readable paths like /etc/ or /bin/, you may be looking at an embedded Linux filesystem image.
Step 3: Search for signatures
Even without specialized tools you can search for known byte patterns, for example the ELF magic with grep -aob.
A hit at a nonzero offset tells you that an ELF may be embedded inside a larger blob.
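The same signature search can be sketched in a few lines of Python. The helper name find_all and the sample blob are illustrative assumptions; a four-byte match can also occur by chance, so each hit still needs validation.

```python
def find_all(data: bytes, signature: bytes):
    """Return every offset at which `signature` occurs in `data`."""
    offsets = []
    pos = data.find(signature)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(signature, pos + 1)
    return offsets

ELF_MAGIC = b"\x7fELF"

# Synthetic blob: 0x40 bytes of erased flash followed by an ELF-like header.
blob = b"\xff" * 0x40 + ELF_MAGIC + b"\x01\x01\x01" + bytes(32)
print([hex(o) for o in find_all(blob, ELF_MAGIC)])   # ['0x40']
```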
Step 4: If ELF: extract structure immediately
Use readelf to dump the ELF header and the program headers.
Key questions:
- Is it ELF32 or ELF64?
- Machine = RISC-V?
- What is the entry point?
- Which segments are loadable (PT_LOAD)?
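To make the key questions concrete, here is a minimal Python sketch that answers them directly from the ELF header and program header fields. It follows the standard ELF32/ELF64 layouts; the function name parse_elf and the returned dictionary shape are assumptions made for illustration, and there is no bounds checking for truncated files.

```python
import struct

EM_RISCV = 243   # e_machine value for RISC-V
PT_LOAD = 1      # program header type for loadable segments

def parse_elf(data: bytes) -> dict:
    """Extract class, machine, entry point, and PT_LOAD segments from an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is64 = data[4] == 2                    # EI_CLASS: 1 = ELF32, 2 = ELF64
    end = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    hdr_fmt = end + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
    (_type, machine, _version, entry, phoff, _shoff, _flags, _ehsize,
     phentsize, phnum, _shentsize, _shnum, _shstrndx) = struct.unpack_from(hdr_fmt, data, 16)

    loads = []
    for i in range(phnum):
        off = phoff + i * phentsize
        if is64:   # ELF64 places p_flags right after p_type
            p_type, _fl, p_offset, p_vaddr, _pa, p_filesz, p_memsz, _al = \
                struct.unpack_from(end + "IIQQQQQQ", data, off)
        else:
            p_type, p_offset, p_vaddr, _pa, p_filesz, p_memsz, _fl, _al = \
                struct.unpack_from(end + "IIIIIIII", data, off)
        if p_type == PT_LOAD:
            loads.append({"offset": p_offset, "vaddr": p_vaddr,
                          "filesz": p_filesz, "memsz": p_memsz})
    return {"class": "ELF64" if is64 else "ELF32",
            "is_riscv": machine == EM_RISCV,
            "entry": entry,
            "load_segments": loads}
```

A segment whose memsz is larger than its filesz typically covers zero-initialized data (.bss), which is worth noting because those bytes do not exist in the file.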
3. If you have a .bin: how to recover likely load address
Strategy A: From the platform memory map
If you know the target memory map (for example, QEMU virt), you often know typical RAM/flash addresses.
- Many RV32 bare-metal examples start around 0x80000000 for RAM on QEMU virt.
- Real SoCs vary wildly; use datasheets or boot logs.
Strategy B: From vector tables / reset patterns
On some architectures, the reset vector has a recognizable structure. On RISC-V, boot code often begins with a small prologue and jumps; patterns are less standardized than ARM vector tables, but you can still hunt for:
- plausible prologue sequences
- references to known MMIO regions
Strategy C: From absolute addresses in code
If the firmware includes absolute addresses (MMIO registers, RAM ranges), those addresses can reveal the platform.
- Scan for aligned 32-bit values that look like addresses (e.g., high bits consistent)
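One way to run that scan is to group every aligned 32-bit word by its high bits and see which prefixes dominate. The sketch below assumes little-endian data and groups by the top 8 bits; both choices, and the function name address_histogram, are assumptions. On real firmware expect noise from instruction words, so the signal is a prefix that clearly stands out.

```python
import struct
from collections import Counter

def address_histogram(data: bytes, top_bits: int = 8) -> Counter:
    """Count aligned little-endian 32-bit words, grouped by their top `top_bits` bits."""
    counts = Counter()
    usable = len(data) - len(data) % 4          # iter_unpack needs a whole number of words
    for (word,) in struct.iter_unpack("<I", data[:usable]):
        if word in (0x00000000, 0xFFFFFFFF):    # skip padding / erased flash
            continue
        counts[word >> (32 - top_bits)] += 1
    return counts

# Synthetic pointer table: three RAM-like values, one MMIO-like value, padding.
sample = struct.pack("<6I", 0x80001000, 0x80002004, 0x80003ABC,
                     0x10000000, 0x00000000, 0xFFFFFFFF)
for prefix, n in address_histogram(sample).most_common(3):
    print(f"0x{prefix:02x}xxxxxx: {n} words")
```

In this synthetic sample the 0x80 prefix wins, which would be consistent with a RAM base near 0x80000000; on a real image, treat such a result as a hypothesis to check against the memory map.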
4. A practical “first disassembly” approach (without committing too early)
Even without a GUI tool, you can do a sanity disassembly pass if you know the architecture.
Example: disassemble a raw binary as RV32
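If you have GNU binutils that support RISC-V, run objdump in raw-binary mode with the machine set explicitly (for example, -D -b binary -m riscv:rv32); the executable name depends on your toolchain prefix.
If the output is mostly illegal/garbage instructions, your assumptions might be wrong: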
- wrong arch (rv64 vs rv32)
- wrong endianness (rare for RISC-V)
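- wrong start offset or base address assumptions (a header or data before the code produces garbage; a wrong base leaves decoding intact but mislabels the targets of relative branches and breaks absolute references)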
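To see what the base address does and does not change, it helps to decode one relative jump by hand. The sketch below decodes the RISC-V JAL immediate (J-type encoding) and computes the target under two assumed bases; the function name jal_target is hypothetical, and the 32-bit wrap assumes an RV32 address space.

```python
import struct

def jal_target(word: int, pc: int):
    """Return the absolute target of a JAL located at `pc`, or None if `word` is not a JAL."""
    if word & 0x7F != 0x6F:                   # JAL major opcode
        return None
    imm = (((word >> 31) & 0x1) << 20         # imm[20]
           | ((word >> 12) & 0xFF) << 12      # imm[19:12]
           | ((word >> 20) & 0x1) << 11       # imm[11]
           | ((word >> 21) & 0x3FF) << 1)     # imm[10:1]
    if imm & (1 << 20):                       # sign-extend the 21-bit offset
        imm -= 1 << 21
    return (pc + imm) & 0xFFFFFFFF            # wrap to a 32-bit address space

# One instruction: "j +8" (jal x0, 8), stored little-endian as in a raw binary.
code = struct.pack("<I", 0x0080006F)
(word,) = struct.unpack_from("<I", code, 0)
for base in (0x00000000, 0x80000000):
    print(f"base 0x{base:08x} -> target 0x{jal_target(word, base):08x}")
```

The instruction bytes and the relative offset are identical in both runs; only the reported absolute target moves with the base. That is why a wrong base does not turn valid code into garbage, but it does make cross-references point at the wrong places.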
Note that objdump -b binary does not know the correct load address. Disassembly is “addressed” from 0 unless you compensate, for example with objdump’s --adjust-vma option or in your analysis tooling.
5. Carving: extracting sub-images from a blob
If you find an embedded ELF at offset O, extract it with dd by skipping the first O bytes of the blob.
If it’s a real ELF, you can now use all Chapter 2 methods.
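The extract-then-validate step can also be sketched in Python. The helper names carve and looks_like_elf are illustrative assumptions, and the validation is deliberately cheap: it checks only the magic and two identification bytes, so a pass means "worth handing to readelf", not "definitely a well-formed ELF".

```python
def carve(blob: bytes, offset: int, length=None) -> bytes:
    """Return bytes from `offset` to the end, or `length` bytes if a size is known."""
    if not 0 <= offset < len(blob):
        raise ValueError("offset outside blob")
    end = len(blob) if length is None else offset + length
    return blob[offset:end]

def looks_like_elf(image: bytes) -> bool:
    """Cheap validation: magic plus a sane class and data-encoding byte."""
    return (len(image) >= 16
            and image[:4] == b"\x7fELF"
            and image[4] in (1, 2)      # EI_CLASS: ELF32 or ELF64
            and image[5] in (1, 2))     # EI_DATA: little- or big-endian

# Synthetic blob: padding, then an ELF-like identification block, then a tail.
blob = b"\xff" * 0x40 + b"\x7fELF\x01\x01\x01" + bytes(9) + b"tail"
offset = blob.find(b"\x7fELF")
image = carve(blob, offset)
print(hex(offset), looks_like_elf(image))   # 0x40 True
```

Carving to the end of the blob usually works for ELF because loaders ignore trailing bytes, but if another image follows, it is cleaner to compute the real size from the section header table and pass it as length.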
If you find a filesystem or compression signature, you may need specialized tools (common in firmware work), but the workflow stays:
- identify
- extract
- validate
6. Turning findings into a map (the most underrated skill)
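Create a simple analysis note that records what you found and why you believe it: file type, architecture guess, suspected load address and entry point, and the offset of each embedded image along with the evidence for it.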
This makes your work reproducible and easier to share.
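If you want the note to be machine-readable as well, one possible shape is a pair of small Python dataclasses rendered to text. Every field name and every value below is a hypothetical example, not a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    offset: int        # file offset of the region
    what: str          # what you believe is there
    evidence: str      # why you believe it

@dataclass
class AnalysisNote:
    image: str
    arch_guess: str
    load_addr_guess: int
    findings: list = field(default_factory=list)

    def render(self) -> str:
        """Format the note as plain text, with findings sorted by offset."""
        lines = [f"image: {self.image}",
                 f"arch (guess): {self.arch_guess}",
                 f"load address (guess): 0x{self.load_addr_guess:08x}",
                 "findings:"]
        for item in sorted(self.findings, key=lambda x: x.offset):
            lines.append(f"  0x{item.offset:06x}  {item.what}  [{item.evidence}]")
        return "\n".join(lines)

# Made-up example values, for illustration only.
note = AnalysisNote("sample.bin", "rv32, little-endian", 0x80000000, [
    Finding(0x2000, "padding", "long run of ff"),
    Finding(0x0040, "embedded ELF", "7f 45 4c 46 magic"),
])
print(note.render())
```

Keeping the evidence next to each claim is the point of the structure: a reader can re-check any single line without redoing the whole triage.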
7. Minimal “firmware-style” practice lab (using your own sample)
- Take build/ld_demo.elf (from Chapter 7) and convert it to .bin.
- Pretend you don’t know what it is.
- Use only file, hexdump, strings, and objdump -b binary to identify it.
- Write down your best guess about:
  - architecture,
  - load address,
  - what the code does.
Then compare with the truth using readelf on the original ELF.
Exercises
- Embed an ELF into a larger blob (e.g., by concatenating with padding) and practice carving it out using grep -aob and dd.
- Create a raw binary with a known base address assumption (e.g., your linker origin) and see how your disassembly changes if you assume the wrong base.
- Pick 5 strings from a firmware image and write hypotheses about what subsystems they relate to.
How to test your answers
- You can produce a short “analysis map” that someone else could follow.
- Your extracted sub-images validate with file and readelf (when applicable).
Summary
You learned a repeatable firmware triage workflow: identify → extract → validate → map → choose next analysis step.
Next: dynamic analysis with Frida (Dynamic Instrumentation Toolkit), covering when it applies to IoT/firmware, what constraints exist, and how to do safe, reproducible hooking experiments.