Cracking Open the Black Box: A Practical Guide to IoT Firmware Analysis

Your router, your smart plug, your IP camera — they're all running code someone hoped you'd never read. Here's how to read it anyway.

Anindya Sankar Roy

~9 min read · April 19, 2026

Every one of these devices is a small computer running firmware someone hoped you wouldn't audit.

Why IoT Is Still a Goldmine

Web apps get patched in hours. Mobile apps get reviewed by stores. IoT devices? They ship with a Linux kernel from 2016, a BusyBox from before your last haircut, and an update mechanism that nobody at the vendor remembers how to trigger.

That's the opportunity. The same classes of bugs that got wiped out of mainstream web stacks years ago (hardcoded credentials, command injection in CGI scripts, unauthenticated admin endpoints, static encryption keys baked into the binary) are alive and thriving in the firmware of devices sitting in millions of homes and offices.

If you've done web and API bounty work, firmware analysis will feel familiar once you get past the unpacking step. The bug classes are almost the same. You just have to learn to open the box first.

This is a field guide to that unpacking. It covers the tools, the workflow, and the specific things I look for when a new firmware image lands on my desk.

The Workflow, Top to Bottom

Every firmware assessment I do follows roughly the same shape:

1. Acquire the firmware image
2. Identify what it actually is (header, compression, filesystem type)
3. Extract the filesystem and binaries
4. Enumerate what's inside: services, binaries, configs, credentials
5. Emulate or analyze statically to find exploitable behavior
6. Validate findings against a real device if you have one

Most public writeups skip straight to step 5. The interesting vulnerabilities often live in step 4.

Step 1: Getting the Firmware

Before you analyze anything, you need the image. Options, in order of how much hardware you want to touch:

Vendor download. Support portals, FTP servers, and CDN endpoints. Often the full image is there, sometimes only a delta update. Check robots.txt, predictable URL patterns (firmware_v1.2.3.bin → try v1.2.4), and archived versions on the Wayback Machine.
Update traffic capture. Proxy the device through Burp or mitmproxy. Many devices still pull updates over HTTP, or HTTPS with no certificate pinning.
UART/serial console. Find the debug header on the PCB, attach a USB-TTL adapter, and you often land in a U-Boot prompt or root shell.
Flash chip dump. Desolder (or clip onto, with a SOIC-8 clip) the SPI flash and read it with a CH341A programmer or Bus Pirate. Slower, but works on devices that lock down everything else.

For a first pass, start with the vendor-supplied download. Hardware teardowns are fun but the vendor almost always gives you enough to get started.
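
If you want to automate the predictable-URL trick, a rough Python sketch like the one below generates candidate filenames by bumping each component of the version string. The function name candidate_filenames and the bumps parameter are my own inventions for illustration, and the sketch assumes the version shows up in the filename as dotted integers.

```python
import re

# Matches a dotted version such as "1.2.3" inside a filename (assumed naming scheme).
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def candidate_filenames(filename, bumps=3):
    """Return filenames with each version component bumped by 1..bumps.

    Components to the right of the bumped one are reset to 0,
    so 1.2.3 produces 2.0.0, 1.3.0 and 1.2.4 for bumps=1.
    """
    match = VERSION_RE.search(filename)
    if match is None:
        return []
    parts = [int(p) for p in match.group(0).split(".")]
    candidates = []
    for index in range(len(parts)):
        for step in range(1, bumps + 1):
            bumped = (
                parts[:index]
                + [parts[index] + step]
                + [0] * (len(parts) - index - 1)
            )
            version = ".".join(str(p) for p in bumped)
            candidates.append(
                filename[: match.start()] + version + filename[match.end():]
            )
    return candidates
```

Calling candidate_filenames("firmware_v1.2.3.bin", bumps=1) gives firmware_v2.0.0.bin, firmware_v1.3.0.bin, and firmware_v1.2.4.bin. Zero-padded versions (1.02.3) lose their padding here, so adjust the formatting if the vendor uses them.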

A typical IoT PCB. The small 8-pin chip next to the SoC is usually the SPI flash — where the firmware lives. The unpopulated row of pads near the edge is almost always a UART debug header.

Step 2: Identification

Don't trust the file extension. A .bin file could be anything.

file firmware.bin
xxd firmware.bin | head -20
strings firmware.bin | head -40

file will sometimes identify common formats outright ("u-boot legacy uImage", "Squashfs filesystem"). xxd and strings give you the magic bytes and any human-readable header strings: often the vendor name, a version tag, and occasionally build paths that leak developer usernames.

Then fire up binwalk for a proper scan:

binwalk firmware.bin

What you're looking for in the output:

A header (TRX, uImage, DLOB, proprietary vendor magic)
One or more compressed blocks (gzip, LZMA, XZ, LZ4)
A filesystem (SquashFS by a long margin, then JFFS2, CramFS, UBIFS on UBI, YAFFS2)
Sometimes a second kernel + rootfs pair (dual-bank / A/B firmware)

A clean binwalk output looks something like this:

DECIMAL       HEXADECIMAL     DESCRIPTION
0             0x0             uImage header, image size: 1785792, ...
64            0x40            LZMA compressed data
1785856       0x1B4000        Squashfs filesystem, little endian, version 4.0

A real binwalk scan. Each row is a signature match — the offsets on the left tell you exactly where to carve.
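
To make "signature match" concrete, here is a minimal Python sketch of the idea: search the image for a handful of well-known magic byte sequences and report where they occur. This is a toy, not how binwalk is implemented. The SIGNATURES table covers only a few formats I'm sure of, and scan_signatures is a hypothetical helper name.

```python
# A few well-known magic values; real tools know hundreds and validate each hit.
SIGNATURES = {
    b"\x27\x05\x19\x56": "uImage header",
    b"hsqs": "SquashFS (little endian)",
    b"sqsh": "SquashFS (big endian)",
    b"\x1f\x8b\x08": "gzip",
    b"\xfd\x37\x7a\x58\x5a\x00": "XZ",
    b"UBI#": "UBI erase block",
    b"HDR0": "TRX header",
}


def scan_signatures(data):
    """Return sorted (offset, description) pairs for every magic match in data."""
    hits = []
    for magic, name in SIGNATURES.items():
        start = 0
        while True:
            offset = data.find(magic, start)
            if offset == -1:
                break
            hits.append((offset, name))
            start = offset + 1  # keep looking for later occurrences
    return sorted(hits)
```

Short magics like these produce false positives inside compressed data, which is why real scanners also sanity-check the header fields that follow each match.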

That's the easy case. Vendors who've read a blog post and panicked will do things like XOR the first 256 bytes with a static key, prepend a fake header, or split the filesystem across non-contiguous offsets. Those all have tells — entropy plots make them obvious.

binwalk -E firmware.bin

Entropy plot of a healthy firmware image. Low-entropy regions are header/config, the plateau on the right is the compressed filesystem. A uniformly high line across the whole file usually means encryption.

A healthy firmware image usually has distinct entropy zones: low (header, config, padding), medium (uncompressed code; a compressed kernel will read high instead), and high and flat (compressed filesystem). A firmware that's encrypted end-to-end will be uniformly high-entropy with no structure, and that's its own diagnostic signal.
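
If you'd rather see the numbers behind the plot, the calculation is just Shannon entropy over fixed-size blocks. The sketch below is my own minimal version, not binwalk's code; the 4096-byte block size is an arbitrary assumption.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(data, block_size=4096):
    """Return (offset, entropy) for each block_size chunk of data."""
    return [
        (offset, shannon_entropy(data[offset:offset + block_size]))
        for offset in range(0, len(data), block_size)
    ]
```

Values near 8.0 across the whole profile point at compression or encryption; long runs near 0 are padding. The structure in between is what you're reading off the plot.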

Step 3: Extraction

For anything non-exotic:

binwalk -Me firmware.bin

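The -e flag extracts whatever the scan matches, and -M tells binwalk to recurse: it'll extract, then re-scan each extracted blob, then extract again, until it stops finding things. With binwalk v2 the output lands in _firmware.bin.extracted/; v3 writes to an extractions/ directory by default.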

Modern binwalk (v3, written in Rust) is meaningfully faster and fixes a pile of bugs from the Python v2 branch. If you're still on v2, consider upgrading — or supplementing with unblob, which is often better at handling obscure or mangled headers.

unblob -e extracted/ firmware.bin

When binwalk and unblob both fail, you fall back to manual surgery with dd:

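# Carve everything from the SquashFS offset (0x1B4000) to the end of the image.
# iflag=skip_bytes (GNU dd) keeps skip= in bytes while allowing a fast block size.
dd if=firmware.bin of=rootfs.squashfs bs=1M iflag=skip_bytes skip=$((0x1B4000))
unsquashfs rootfs.squashfs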
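
The same carve is easy to script when you have many images to process. The sketch below is a Python stand-in for the dd call, plus an optional helper that reads the bytes_used field from a little-endian SquashFS 4.x superblock (a 64-bit value at byte 40) so you can trim trailing data. Both function names are hypothetical, and the superblock helper assumes the standard layout with no vendor modifications.

```python
import struct


def squashfs_bytes_used(src_path, offset):
    """Read bytes_used from a little-endian SquashFS 4.x superblock at offset."""
    with open(src_path, "rb") as src:
        src.seek(offset)
        superblock = src.read(48)
    if len(superblock) < 48 or superblock[:4] != b"hsqs":
        raise ValueError("no little-endian SquashFS superblock at this offset")
    # bytes_used is a little-endian u64 at byte 40 of the superblock.
    return struct.unpack_from("<Q", superblock, 40)[0]


def carve(src_path, dst_path, offset, length=None, chunk_size=1 << 20):
    """Copy length bytes (or everything to EOF) starting at offset.

    Returns the number of bytes written.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)
        while length is None or written < length:
            want = chunk_size if length is None else min(chunk_size, length - written)
            chunk = src.read(want)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written
```

For the example image that would be carve("firmware.bin", "rootfs.squashfs", 0x1B4000). Passing the bytes_used value as length is optional, since unsquashfs generally tolerates trailing data.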

For JFFS2:

jefferson rootfs.jffs2 -d extracted/

For UBI:

ubireader_extract_files firmware.ubi -o extracted/

You now have a filesystem on disk. This is where the real work starts.

Step 4: The Filesystem Walk

This is where bugs live. Treat the extracted filesystem like you've just landed a shell on a Linux box you've never seen before.

Configuration and credentials

cd squashfs-root/
cat etc/passwd etc/shadow
cat etc/inittab
ls etc/init.d/
ls etc/rc.d/
find . -name "*.conf" -o -name "*.cfg" -o -name "*.ini" 2>/dev/null

An /etc/shadow entry with a real hash instead of * or ! (or an /etc/passwd entry with a hash where the x placeholder should be) is a gift. Crack it with hashcat -m 1800 (sha512crypt), -m 500 (md5crypt), or -m 1500 (descrypt, still common on older devices) against a decent wordlist. Vendor root passwords are reused across entire product lines more often than you'd think: finding admin:$1$... in one device sometimes unlocks a shell on fifteen others.
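
Picking the right hashcat mode comes down to the prefix of the hash field. Here's a small Python sketch that walks a shadow-style file and suggests a mode per account. classify_shadow is a made-up helper name, the prefix table only covers the common crypt(3) formats, and the 13-character rule for descrypt is a heuristic.

```python
# (prefix, name, hashcat mode) for common crypt(3) formats.
HASHCAT_MODES = [
    ("$1$", "md5crypt", 500),
    ("$5$", "sha256crypt", 7400),
    ("$6$", "sha512crypt", 1800),
    ("$2a$", "bcrypt", 3200),
    ("$2b$", "bcrypt", 3200),
    ("$2y$", "bcrypt", 3200),
]


def classify_shadow(text):
    """Return (user, hash type, hashcat mode or None) for interesting entries."""
    findings = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        if len(fields) < 2 or not fields[0]:
            continue
        user, field = fields[0], fields[1]
        if field == "":
            findings.append((user, "empty password", None))
            continue
        if field[0] in "*!" or field == "x":
            continue  # locked account or placeholder, nothing to crack
        for prefix, name, mode in HASHCAT_MODES:
            if field.startswith(prefix):
                findings.append((user, name, mode))
                break
        else:
            if len(field) == 13:
                # Traditional DES crypt has no prefix and is 13 characters long.
                findings.append((user, "descrypt", 1500))
            else:
                findings.append((user, "unknown", None))
    return findings
```

An empty password field is reported too, since an account with no password at all is a finding in its own right.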

Keys and certificates

find . -name "*.pem" -o -name "*.key" -o -name "*.crt" -o -name "*.p12"
find . -name "authorized_keys"

Private keys shipped in firmware are how you end up with CVEs like the infamous "we used the same TLS key for every router in the fleet." Always check. Also grep for them directly:

grep -rE "BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY" .
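
When you're triaging a batch of images, the same search is worth scripting so the results come back with offsets. The Python sketch below mirrors the grep pattern (with ENCRYPTED added) and reads files as raw bytes, so it also catches keys embedded inside ELF binaries. find_private_keys is a hypothetical name, and skipping symlinks is my own choice: links in an extracted rootfs can point at absolute paths on your host.

```python
import os
import re

KEY_RE = re.compile(
    rb"-----BEGIN (?:RSA |EC |DSA |OPENSSH |ENCRYPTED )?PRIVATE KEY-----"
)


def find_private_keys(root):
    """Return sorted (relative path, byte offset) pairs for PEM private key headers."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so we never read files outside the extracted tree.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue
            for match in KEY_RE.finditer(data):
                hits.append((os.path.relpath(path, root), match.start()))
    return sorted(hits)
```

Run it as find_private_keys("squashfs-root") and follow up on anything that isn't in an obvious key file.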

Interesting strings in binaries

find . -type f -executable -exec file {} \; | grep ELF

Focus on binaries that are (a) unique to this vendor — not stock busybox/dropbear — and (b) involved in network-facing services. Common suspects:

httpd, mini_httpd, lighttpd, boa, or a custom web server in /usr/sbin/ or /bin/
upnpd, miniupnpd
telnetd, dropbear
Anything in /www/cgi-bin/ — these CGI binaries are usually the web UI backend
Anything with vendor-specific names like acsd, cfm, rc, or proprietary daemon names

For each interesting binary:

strings -n 8 bin/httpd | grep -iE "password|token|secret|admin|debug|/tmp/"

You're hunting for imported function names that hint at command execution (system, popen, the exec family), shell command templates with a %s in them, hardcoded paths that suggest debugging backdoors, and credentials or tokens the developer left in a printf.
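
The strings-and-grep pass can be approximated in a few lines of Python if you want to tag results instead of eyeballing them. This is a rough sketch: extract_strings only handles printable ASCII (roughly what strings -n does), the keyword list is copied from the grep above, and the "template" tag is my own heuristic for anything containing %s.

```python
import re

KEYWORDS = re.compile(r"password|token|secret|admin|debug|/tmp/", re.IGNORECASE)


def extract_strings(data, min_len=8):
    """Return printable ASCII runs of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group(0).decode("ascii") for m in pattern.finditer(data)]


def triage_strings(data, min_len=8):
    """Return (string, tags) pairs for strings worth a closer look."""
    results = []
    for text in extract_strings(data, min_len):
        tags = []
        if KEYWORDS.search(text):
            tags.append("keyword")
        if "%s" in text:
            # A %s inside what looks like a shell command is worth tracing
            # back to the code that formats it.
            tags.append("template")
        if tags:
            results.append((text, tags))
    return results
```

Feed it the raw bytes of a binary, for example the contents of bin/httpd, and start with the strings tagged "template".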

The CGI layer

If there's a cgi-bin, that's usually the highest-value target. The web UI's authentication logic, parameter parsing, and command dispatch all live there. Command injection in a CGI binary that formats a query-string parameter with sprintf and hands the result to system() is a canonical IoT bug and has not gone extinct.

Ghidra's decompiler turning raw MIPS or ARM assembly into readable pseudo-C. This is where you'll spend most of your time once the filesystem is unpacked.

Pull CGI binaries into Ghidra or IDA and look specifically for:

Calls to system, popen, execve, execl where any argument is influenced by user input
sprintf or strcpy into a fixed-size stack buffer from a request parameter (stack overflow)
Authentication checks implemented as strcmp against a hardcoded string, or as if (token) without actually validating the token
Debug endpoints gated by a URL parameter like ?debug=1 or ?test=1
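
Before opening anything in a decompiler, a crude pre-filter helps decide which CGI binaries to look at first: check which of the risky libc functions each one imports. The Python sketch below looks for NUL-delimited symbol names in the raw file, which is how they usually sit in an ELF string table. It's a heuristic of my own, not a proper ELF parser; linkers can share string suffixes, so a missing hit doesn't prove the import is absent. Both function names are hypothetical.

```python
# Functions named in the checklist above, plus the usual overflow suspects.
DANGEROUS = ("system", "popen", "execve", "execl", "sprintf", "strcpy")


def dangerous_imports(data):
    """Return the risky function names that appear as NUL-delimited strings."""
    return [
        name
        for name in DANGEROUS
        if b"\x00" + name.encode("ascii") + b"\x00" in data
    ]


def rank_binaries(binaries):
    """Rank {name: raw bytes} by number of risky imports, most first."""
    scored = [(name, dangerous_imports(data)) for name, data in binaries.items()]
    return sorted(scored, key=lambda item: (-len(item[1]), item[0]))
```

A binary that imports both system and sprintf isn't vulnerable by definition, but it goes to the top of the reading list.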

Hardcoded URLs and endpoints

grep -rE "https?://[a-zA-Z0-9./?=_-]+" . 2>/dev/null | sort -u

Cloud backend URLs, update servers, telemetry endpoints. Each one is a new attack surface — and if any of them are HTTP, or HTTPS with broken certificate validation in the client, you have a traffic manipulation primitive.
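
To separate the plain-HTTP endpoints from the rest, a short Python sketch like this one pulls URLs out of raw bytes and filters on the scheme. The character class is a slightly widened version of the grep pattern above, and both helper names are illustrative rather than taken from any tool.

```python
import re

URL_RE = re.compile(rb"https?://[a-zA-Z0-9./?=_:%&-]+")


def extract_urls(data):
    """Return the sorted, de-duplicated URLs found in a bytes blob."""
    urls = set()
    for match in URL_RE.finditer(data):
        # Strip trailing dots picked up from sentence-like strings.
        urls.add(match.group(0).decode("ascii").rstrip("."))
    return sorted(urls)


def plaintext_urls(urls):
    """Return only the URLs that use unencrypted HTTP."""
    return [url for url in urls if url.startswith("http://")]
```

Anything plaintext_urls returns is a candidate for the traffic manipulation primitive; the HTTPS ones still need a look at how the client validates certificates.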

Step 5: Emulation

Static analysis takes you far. Dynamic analysis takes you further. You want to actually run the vendor binaries.

Full-system emulation of an IoT device is hard — the binaries expect specific hardware, NVRAM contents, and sibling daemons. Firmadyne and its successor FirmAE automate enough of this to get a surprising number of routers booting in QEMU with a functioning web UI.

# FirmAE run mode; the argument after -r is a vendor/brand label for the image
sudo ./run.sh -r <brand> firmware.bin

When that works, you get a NATed VM running the real firmware, reachable on your host — at which point you've converted a firmware analysis problem into a regular web/network pentest, and every tool in your existing kit applies.

When full-system emulation doesn't work, user-mode QEMU still lets you run individual binaries:

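# For a little-endian MIPS binary (use qemu-mips-static for big-endian)
cp "$(which qemu-mipsel-static)" squashfs-root/usr/bin/qemu
# Paths are resolved inside the chroot, so point at where the emulator was copied
sudo chroot squashfs-root /usr/bin/qemu /bin/httpd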

This is scruffy and half the syscalls will fail, but it's enough to fuzz a single CGI binary or step through a suspected vulnerable path in a debugger.
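
Choosing the right user-mode emulator means knowing the binary's architecture and endianness, which file will tell you, and which you can also read straight from the ELF header. The Python sketch below does the latter: byte 4 is the class (32/64-bit), byte 5 is the endianness, and e_machine is a 16-bit field at offset 18. The mapping to emulator names assumes the Debian-style qemu-user-static naming and only covers a few common IoT targets; qemu_user_binary is a hypothetical helper.

```python
import struct

# (e_machine, EI_CLASS, little_endian) -> emulator name (Debian-style naming assumed)
QEMU_BINARIES = {
    (8, 1, True): "qemu-mipsel-static",
    (8, 1, False): "qemu-mips-static",
    (40, 1, True): "qemu-arm-static",
    (40, 1, False): "qemu-armeb-static",
    (20, 1, False): "qemu-ppc-static",
    (183, 2, True): "qemu-aarch64-static",
}


def qemu_user_binary(header):
    """Return the emulator name for an ELF header, or None if unmapped."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]          # 1 = 32-bit, 2 = 64-bit
    little = header[5] == 1       # EI_DATA: 1 = little endian, 2 = big endian
    fmt = "<H" if little else ">H"
    machine = struct.unpack_from(fmt, header, 18)[0]
    return QEMU_BINARIES.get((machine, ei_class, little))
```

Pass it the first 20 bytes (or more) of the target, for example the start of bin/httpd read in binary mode.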

For serious work, start the binary under user-mode QEMU with its GDB stub enabled (-g <port>) and attach gdb-multiarch or Ghidra's debugger, or load the binary in Ghidra statically and use the decompiler to reason about control flow without running anything. Ghidra is free, powerful, and handles the CPUs most IoT devices use (MIPS, ARM, PowerPC) well, often better than the commercial alternatives do.

Step 6: Quick Wins Checklist

Before you go deep on any single binary, run through this triage list on every firmware you open. Most of these checks take under ten minutes, and in my experience the list turns up something on perhaps 30% of images.

/etc/shadow — crackable hashes?
/etc/passwd — any uid-0 accounts beyond root?
Private keys (TLS, SSH, signing) shipped in the image?
Telnet, FTP, or debug services enabled by default in init scripts?
Hardcoded WPA/admin passwords in config templates?
Any binary with setuid bit set that isn't from busybox?
cgi-bin binaries — any that call system() with user input?
Update mechanism — does it verify signatures, or just check a URL?
Any backdoor, debug, test, or factory strings in binaries that look like hidden endpoints?
Vendor cloud URLs over plain HTTP?
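
A couple of these checks are simple enough to script. As a sketch, the Python below covers two of them: extra uid-0 accounts in a passwd file, and telnet or FTP daemons started from an init script. The function names and the RISKY_SERVICES list are my own assumptions, and the service check is a plain substring match on non-comment lines, so treat hits as leads rather than proof.

```python
RISKY_SERVICES = ("telnetd", "ftpd")


def uid0_accounts(passwd_text):
    """Return account names other than root that have uid 0."""
    extra = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) < 3:
            continue
        name, uid = fields[0], fields[2]
        if uid == "0" and name != "root":
            extra.append(name)
    return extra


def risky_services(init_script_text):
    """Return risky daemons mentioned on non-comment lines of an init script."""
    found = []
    for line in init_script_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        for service in RISKY_SERVICES:
            if service in stripped and service not in found:
                found.append(service)
    return found
```

Run uid0_accounts on the contents of etc/passwd and risky_services on each file under etc/init.d/.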

The Tools, Summarized

binwalk — first-pass identification and extraction. Use v3 if you can.
unblob — second opinion when binwalk falters.
unsquashfs / jefferson / ubireader — filesystem-specific unpackers.
Ghidra — free, capable disassembler and decompiler. IDA if you have the license.
QEMU — user-mode and system emulation.
Firmadyne / FirmAE — automated full-system firmware emulation.
hashcat / john — password hash cracking.
radare2 / rizin — scriptable reverse engineering when you need to automate.
Frida — instrumentation on devices where you have shell access.

A Word on Scope

Firmware analysis sits in a legal grey zone that varies by jurisdiction. A few principles that have kept me out of trouble:

Analyzing firmware you legally acquired, for your own understanding or for a bug bounty program that explicitly permits it, is usually fine.
Publishing working exploits for unpatched devices in deployment is not fine, regardless of how you feel about the vendor.
Responsible disclosure still applies. Vendor response times for IoT are brutal — 90 days is often not enough — but the process matters.

Know your program's scope. Know your country's laws. When in doubt, ask.

Closing

The thing I want you to take away from this is that firmware analysis isn't a separate, exotic discipline. It's web and API hacking with two extra steps at the front: get the code out of the binary, then read it like source. Once you've done that a few times, the shape of the work is identical to everything else.

The devices are out there. The bugs are out there. Someone's going to find them — might as well be you.

If you found this useful, follow me for more deep dives on offensive security, bug bounty methodology, and reverse engineering. Questions and corrections welcome in the responses.

