Part 1 — Reverse Engineering Fundamentals
Introduction
Reverse engineering is the process of analyzing compiled software to understand how it works without access to the original source code.
Security researchers use reverse engineering to:
- analyze malware
- discover vulnerabilities
- understand proprietary software
- bypass protections
- develop exploits
Instead of reading C or Python code, reverse engineers study machine instructions and memory behavior.
This article introduces the foundations required before learning binary exploitation.
—
What is a Binary?
A binary is a compiled program containing machine instructions executed directly by the CPU.
Most Linux programs use the ELF format (Executable and Linkable Format).
Check binary type:
file program
-------------------------------------------------------------------------
## Example output:
program: ELF 64-bit LSB executable, x86-64
This tells us:
- architecture → x86-64
- format → ELF
- endian → little-endian
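The same facts can be read straight from the first bytes of the file. Below is a minimal sketch, assuming a hand-built sample header (`SAMPLE_HEADER` is hypothetical, not taken from a real program) so that it runs without a binary on disk. The helper name `describe_elf` is also just for illustration, and it knows only a handful of machine codes.

```python
import struct

# Hypothetical first 20 bytes of a 64-bit little-endian x86-64 ELF file,
# built by hand so the example runs without a real binary.
SAMPLE_HEADER = (
    b"\x7fELF"                      # magic bytes
    + bytes([2])                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    + bytes([1])                    # EI_DATA: 1 = little-endian, 2 = big-endian
    + bytes([1])                    # EI_VERSION
    + bytes(9)                      # OS ABI, ABI version, padding
    + struct.pack("<HH", 2, 0x3E)   # e_type = ET_EXEC, e_machine = EM_X86_64
)

def describe_elf(header: bytes) -> dict:
    """Summarize format, word size, byte order and architecture."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: "32-bit", 2: "64-bit"}[header[4]]
    endian = {1: "little-endian", 2: "big-endian"}[header[5]]
    order = "<" if header[5] == 1 else ">"
    # e_type and e_machine follow the 16-byte identification block.
    _e_type, e_machine = struct.unpack_from(order + "HH", header, 16)
    machines = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}
    return {
        "format": "ELF",
        "class": bits,
        "endian": endian,
        "machine": machines.get(e_machine, hex(e_machine)),
    }

info = describe_elf(SAMPLE_HEADER)
print(info)
```

To try it on a real file, the first 20 bytes could be read with `open("program", "rb").read(20)` and passed to the same function.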
—
ELF Structure
ELF binaries contain multiple sections storing program data.
Important sections:
.text → executable instructions
.data → initialized global variables
.bss → uninitialized global variables
.plt → procedure linkage table
.got → global offset table
.rodata → read-only data
View sections using:
readelf -S program
—
Static vs Dynamic Analysis
Reverse engineering uses two main approaches.
Static Analysis
Static analysis examines the binary without executing it.
Common tools:
strings
objdump
readelf
binwalk
Ghidra
IDA
Binary Ninja
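Of these tools, strings is simple enough to approximate in a few lines, which helps show what it actually does: it scans the file for runs of printable characters. This is a rough sketch rather than a reimplementation. The `blob` bytes are a hypothetical stand-in for part of a compiled program, and the 4-character minimum mirrors the default of GNU strings.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len characters long."""
    # Printable ASCII is the range from space (0x20) to tilde (0x7e).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]

# Hypothetical bytes standing in for a slice of a binary: readable text
# separated by NUL terminators and non-printable bytes.
blob = (
    b"\x7fELF\x02\x01\x00Enter password\x00\x90\x90"
    b"Login failed\x00\x01ab\x00admin access\x00"
)

found = extract_strings(blob)
print(found)
```

Short runs such as "ELF" and "ab" are dropped because they fall below the minimum length, which is why real output contains less noise than a raw byte dump.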
--------------------------------------------------------------------
## Example:
strings binary
--------------------------------------------------------------------
## Example output:
Enter password
Login failed
admin access
Dynamic Analysis
Dynamic analysis observes the program while it runs.
Tools include:
gdb
pwndbg
gef
strace
ltrace
---------------------------------------------------------------------
## Example:
gdb ./binary
—
Machine Instructions
Computers execute machine code, not C.
Example machine instruction:
48 89 e5
Disassembled instruction:
mov rbp, rsp
Machine instructions are stored as bytes in memory.
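To see how those three bytes become `mov rbp, rsp`, here is a toy decoder. It is a sketch that handles only this one encoding form (a REX.W prefix, opcode 0x89, and a register-to-register ModRM byte). A real disassembler covers far more cases, and the function name `decode_mov_reg_reg` is purely illustrative.

```python
# Register numbers as encoded in the 3-bit fields of the ModRM byte.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_mov_reg_reg(code: bytes) -> str:
    """Decode a 3-byte 'mov r64, r64' instruction (Intel syntax)."""
    rex, opcode, modrm = code
    if rex != 0x48 or opcode != 0x89:
        raise ValueError("this sketch only supports REX.W + 0x89")
    mod = modrm >> 6              # top 2 bits: 0b11 means both operands are registers
    reg = (modrm >> 3) & 0b111    # middle 3 bits: source register
    rm = modrm & 0b111            # low 3 bits: destination register
    if mod != 0b11:
        raise ValueError("memory operands are not handled here")
    return f"mov {REG64[rm]}, {REG64[reg]}"

print(decode_mov_reg_reg(bytes.fromhex("48 89 e5")))  # mov rbp, rsp
```

The byte 0xe5 is 11 100 101 in binary: mode 11 (register), source 100 (rsp), destination 101 (rbp).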
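Instruction size varies on x86-64, from 1 byte up to a maximum of 15 bytes:
1 byte → simple instruction (for example, ret)
10+ bytes → complex instruction (for example, one with prefixes and a 64-bit immediate)
—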
Number Systems in Reverse Engineering
Binary
Binary uses base-2 numbers.
0 1
Example:
10101010
8 bits = 1 byte
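These conversions are easy to check interactively. Here is a minimal sketch using only Python built-ins. The values are the article's own examples (10101010 here, and 0x41 from the hexadecimal section that follows).

```python
value = int("10101010", 2)           # parse a base-2 string
print(value)                         # 170
print(hex(value))                    # 0xaa

# One byte splits into two 4-bit halves, and each half is one hex digit.
high_nibble = value >> 4
low_nibble = value & 0xF
print(high_nibble, low_nibble)       # 10 10

bits_of_0x41 = format(0x41, "08b")   # zero-padded to 8 binary digits
print(bits_of_0x41)                  # 01000001
print(chr(0x41))                     # A
```

Because each hex digit maps to exactly four bits, hex is the usual shorthand for bytes in debuggers and disassemblers.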
—
Hexadecimal
Hex uses base-16 numbers.
0 1 2 3 4 5 6 7 8 9 A B C D E F
Example:
0x41
Binary equivalent:
01000001
—
Units
1 bit = binary digit
1 nibble = 4 bits
1 byte = 8 bits
Example:
0x41 = 01000001
—
Endianness
Endianness describes how multi-byte values are stored in memory.
Big Endian
Most significant byte stored first.
Example:
0x12345678
Memory:
12 34 56 78
Little Endian
Least significant byte stored first.
Example:
0x12345678
Memory:
78 56 34 12
Modern x86 systems use little endian.
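Both layouts can be reproduced with Python's standard `struct` module. This is a small sketch, and the only assumption is that the value is treated as a 4-byte unsigned integer.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)       # ">" selects big-endian, "I" is 4 bytes unsigned
little = struct.pack("<I", value)    # "<" selects little-endian

print(big.hex(" "))                  # 12 34 56 78
print(little.hex(" "))               # 78 56 34 12

# Reading the little-endian bytes back with the right byte order
# recovers the original value.
restored = int.from_bytes(little, "little")
print(hex(restored))                 # 0x12345678
```

This matters in practice when reading hex dumps: an address seen in memory on x86 appears byte-reversed compared with how it is written in source or in a debugger's register view.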
—
CPU Registers
Registers are small storage locations inside the CPU.
Common registers:
RAX → return values / arithmetic
RBX → base register / preserved register
RCX → loop counter
RDX → data register / division remainder
RSI → source pointer
RDI → first function argument
RBP → stack frame base pointer
RSP → stack pointer
RIP → instruction pointer
Sub-Registers
Registers can be accessed partially.
Example register:
RAX
Sub-registers:
EAX → lower 32 bits
AX → lower 16 bits
AH → bits 8-15 (the high byte of AX)
AL → bits 0-7 (the low byte of AX)
Example value:
RAX = 0x1122334455667788
Breakdown:
EAX = 0x55667788
AX = 0x7788
AH = 0x77
AL = 0x88
Key idea
A 64-bit register can be viewed as:
- 1 × 64-bit value
- 2 × 32-bit halves
- 4 × 16-bit parts
- 8 × 8-bit bytes
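The breakdown above is just masking and shifting. Here is a minimal sketch that models RAX as a plain Python integer. Note that only the low parts (EAX, AX, AH, AL) have their own register names, while the upper pieces can only be reached by shifting.

```python
rax = 0x1122334455667788

eax = rax & 0xFFFFFFFF       # low 32 bits
ax = rax & 0xFFFF            # low 16 bits
ah = (rax >> 8) & 0xFF       # bits 8-15
al = rax & 0xFF              # bits 0-7

for name, val in [("EAX", eax), ("AX", ax), ("AH", ah), ("AL", al)]:
    print(f"{name} = {val:#x}")
```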
—
Let's expand on what each register does:
RAX — Accumulator Register
RAX is primarily used for:
- storing return values from functions
- arithmetic operations
- system call numbers
Example:
mov rax, 5
add rax, 3
Result:
rax = 8
For system calls:
rax → syscall number
Example:
rax = 60 → exit syscall
—
RBX — Base Register
RBX is a general-purpose register often used for:
- storing base addresses
- holding values across function calls
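Unlike caller-saved registers such as RAX or RCX, RBX is callee-saved in the System V AMD64 ABI used on Linux: a function that modifies it must restore the original value before returning.
Example:
mov rbx, 0x400000
—
RCX — Counter Register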
RCX is commonly used as a loop counter.
Example:
mov rcx, 10
loop_start:
dec rcx
jnz loop_start
RCX also serves as the repeat count for string instructions that carry a rep prefix (for example, rep movsb).
—
RDX — Data Register
RDX is commonly used for:
- multiplication and division operations
- passing function arguments
- storing remainder values in division
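Example division:
mov rax, 10 ; low 64 bits of the dividend
mov rdx, 0 ; high 64 bits of the dividend (div divides RDX:RAX)
mov rcx, 3 ; divisor
div rcx
Result:
rax → quotient (3)
rdx → remainder (1)
—
RSI — Source Index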
RSI is used as a source pointer for memory operations.
Common usage:
- pointer to input data
- source buffer in copy operations
Example:
mov rsi, buffer
—
RDI — Destination Index
RDI is commonly used as a destination pointer.
In the System V AMD64 calling convention used on 64-bit Linux, RDI holds the first function argument.
Example:
mov rdi, message
call puts
Here rdi stores the pointer to the string.
—
RBP — Base Pointer
RBP is used to reference stack frame locations.
It helps access:
- local variables
- function arguments
Typical function prologue:
push rbp
mov rbp, rsp
Example stack access:
[rbp+8] → return address
[rbp-4] → local variable
—
RSP — Stack Pointer
RSP always points to the top of the stack.
Stack operations modify RSP automatically.
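Here is a toy model of how instructions such as push and pop (shown just below) move RSP. It is a sketch under simplifying assumptions: every slot is 8 bytes, memory is a Python dictionary, and the starting address is an arbitrary made-up value. The class name `ToyStack` is hypothetical.

```python
class ToyStack:
    """Minimal model of a downward-growing 64-bit stack."""

    def __init__(self, rsp: int = 0x7FFFFFFFE000):
        self.rsp = rsp        # arbitrary starting address, for illustration only
        self.memory = {}      # address -> 8-byte value

    def push(self, value: int) -> None:
        self.rsp -= 8                     # the stack grows toward lower addresses
        self.memory[self.rsp] = value     # store at the new top of the stack

    def pop(self) -> int:
        value = self.memory[self.rsp]     # read from the current top
        self.rsp += 8                     # release the slot
        return value

stack = ToyStack()
start = stack.rsp
stack.push(0x41)
stack.push(0x42)
print(hex(start - stack.rsp))   # 0x10, two 8-byte slots in use
print(hex(stack.pop()))         # 0x42, last in, first out
print(hex(stack.pop()))         # 0x41
```

As on real hardware, pop in this model does not erase the old value. It only moves RSP, so stale data can remain below the stack pointer.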
Example:
push rax
Effect:
rsp decreases by 8
value stored at the new top of the stack
Example:
pop rax
Effect:
value copied from the top of the stack into rax
rsp increases by 8
—
RIP — Instruction Pointer
RIP stores the address of the next instruction to execute.
Example:
rip = 0x400540
The CPU fetches the instruction located at that address.
Control flow instructions modify RIP.
Examples:
jmp address
call function
ret
—
User Mode vs Kernel Mode
Operating systems use two execution modes.
User Mode
Normal programs run in user mode.
Restrictions:
- cannot access hardware directly
- cannot access kernel memory
Kernel Mode
Kernel mode has full system privileges.
Only the operating system kernel, including its drivers and loadable modules, runs here.
—
System Calls
Programs request kernel services using system calls.
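As a rough illustration of the x86-64 Linux convention described in this section, the sketch below shows which register would hold which value just before the syscall instruction executes. It only builds a table and does not perform a real system call. The helper name `syscall_registers` and the buffer address 0x402000 are hypothetical. The registers for arguments four to six (R10, R8, R9) go beyond the three listed in this article.

```python
# x86-64 Linux syscall numbers for the calls named in this article.
SYSCALL_NUMBERS = {"read": 0, "write": 1, "open": 2, "execve": 59, "exit": 60}

# Argument registers in order; the article lists the first three.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]

def syscall_registers(name: str, *args: int) -> dict:
    """Return the register values expected before a `syscall` instruction."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("a syscall takes at most six arguments")
    regs = {"rax": SYSCALL_NUMBERS[name]}
    regs.update(zip(ARG_REGISTERS, args))
    return regs

# write(1, buf, 14) with a made-up buffer address.
write_regs = syscall_registers("write", 1, 0x402000, 14)
print({reg: hex(val) for reg, val in write_regs.items()})

# exit(0)
exit_regs = syscall_registers("exit", 0)
print(exit_regs)
```

Reading a disassembly in the other direction works the same way: the values loaded into these registers just before a syscall instruction tell you which kernel service is being requested and with what arguments.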
Examples:
read
write
open
execve
Example syscall instruction:
syscall
Registers used:
RAX → syscall number
RDI → arg1
RSI → arg2
RDX → arg3
—
Parser Differential Attacks (Concept)
A parser differential happens when two different components interpret the same input differently.
Example:
Web Application Firewall → parses input one way
Web Server → parses it differently
Typical pairs where this occurs:
- frontend parser vs backend parser
- WAF parser vs application parser
- URL normalizer vs server
- JSON/XML/form-data parsing differences
This can create security issues because one layer may think input is safe while another layer interprets it dangerously.
Simple idea
If:
- Filter A sees the payload as harmless
- Target B parses the same payload differently
Then the attacker may bypass validation or trigger unexpected behavior.
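A small sketch makes this concrete using duplicated query parameters. Both "components" here are hypothetical: the filter is assumed to look at the first occurrence of a parameter, while the backend is assumed to keep the last. Only `urllib.parse.parse_qsl` is a real library call.

```python
from urllib.parse import parse_qsl

def filter_view(query: str) -> str:
    """Hypothetical filter: trusts the FIRST occurrence of 'file'."""
    for key, value in parse_qsl(query):
        if key == "file":
            return value
    return ""

def backend_view(query: str) -> str:
    """Hypothetical backend: the LAST occurrence of 'file' wins."""
    return dict(parse_qsl(query)).get("file", "")

def filter_allows(query: str) -> bool:
    # The filter rejects path traversal, but only in the value it sees.
    return ".." not in filter_view(query)

query = "file=report.txt&file=../../etc/passwd"
print(filter_allows(query))   # True
print(backend_view(query))    # ../../etc/passwd
```

Neither parser is wrong on its own. The problem comes from the two disagreeing about which value the request contains.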
This idea shows up in:
- request smuggling
- URL confusion
- encoding tricks
- deserialization edge cases
- content-type inconsistencies
—
Binary Exploitation Workflow
Typical exploitation workflow:
Analyze Binary
↓
Find Vulnerability
↓
Control Instruction Pointer
↓
Leak Memory
↓
Bypass Protections
↓
Build Exploit
↓
Gain Shell
—
Key Takeaways
In this article we introduced:
- binary structure
- assembly basics
- registers
- machine instructions
- number systems
- endianness
- system calls
- analysis techniques
These concepts form the foundation of reverse engineering and binary exploitation.
—
This series contains 5 parts. Continue with Part 2 after reading this article.
In Part 2, we will cover:
- memory addressing (rbp, rsp, offsets)
- stack behavior (push/pop)
- instruction tracing
- function prologue and epilogue
- 32-bit vs 64-bit calling conventions
- finding main() in stripped binaries