Part 1 — Reverse Engineering Fundamentals

Introduction

Reverse engineering is the process of analyzing compiled software to understand how it works without access to the original source code.

Security researchers use reverse engineering to:

  • analyze malware
  • discover vulnerabilities
  • understand proprietary software
  • bypass protections
  • develop exploits

Instead of reading C or Python code, reverse engineers study machine instructions and memory behavior.

This article introduces the foundations required before learning binary exploitation.

What is a Binary?

A binary is a compiled program containing machine instructions executed directly by the CPU.

Most Linux programs use the ELF format (Executable and Linkable Format).

Check binary type:

file program

Example output:

program: ELF 64-bit LSB executable, x86-64

This tells us:

  • architecture → x86-64 (64-bit)
  • format → ELF
  • byte order → little-endian (LSB = least significant byte first)
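The fields that file reports come from the first bytes of the ELF header. The following is a minimal sketch of that lookup, not a replacement for file or readelf. The function name describe_elf_header and the hand-built sample_header are assumptions made for illustration; the numeric constants are taken from the ELF specification.

```python
import struct

# Values from the ELF specification (e_ident and e_machine fields).
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf_header(header: bytes) -> dict:
    """Decode format, word size, byte order and architecture from an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = header[4]  # EI_CLASS
    elf_data = header[5]   # EI_DATA
    # e_machine is a 16-bit field at offset 18, stored in the file's own byte order.
    byte_order = "<" if elf_data == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return {
        "format": "ELF",
        "class": ELF_CLASSES.get(elf_class, "unknown"),
        "endian": ELF_DATA.get(elf_data, "unknown"),
        "architecture": ELF_MACHINES.get(machine, f"unknown ({machine})"),
    }


# Hypothetical 20-byte header prefix of a 64-bit little-endian x86-64 executable.
sample_header = (
    b"\x7fELF"                   # magic bytes
    + bytes([2, 1, 1])           # EI_CLASS=64-bit, EI_DATA=LSB, EI_VERSION=1
    + bytes(9)                   # OS ABI, ABI version and padding
    + struct.pack("<HH", 2, 62)  # e_type=ET_EXEC, e_machine=EM_X86_64
)
print(describe_elf_header(sample_header))
```

To try it on a real file, read the first 20 bytes with open(path, "rb").read(20) and pass them to the function.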

ELF Structure

ELF binaries contain multiple sections storing program data.

Important sections:

.text → executable instructions
.data → initialized global variables
.bss → uninitialized global variables
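.plt → procedure linkage table (stubs used to call dynamically linked functions)
.got → global offset table (addresses filled in at run time by the dynamic linker)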
.rodata → read-only data

View sections using:

readelf -S program

Static vs Dynamic Analysis

Reverse engineering uses two main approaches.

Static Analysis

Static analysis examines the binary without executing it.

Common tools:

strings
objdump
readelf
binwalk
Ghidra
IDA
Binary Ninja

Example:

strings binary

Example output:

Enter password
Login failed
admin access
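What strings does can be approximated in a few lines: scan the raw bytes and keep every run of printable ASCII characters above a minimum length. This is a rough sketch of the idea, not the real tool's implementation. The function name extract_strings and the sample blob are made up for illustration.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII characters at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical binary blob: readable text surrounded by non-printable bytes.
blob = (
    b"\x7fELF\x02\x01\x01\x00"
    + b"Enter password" + b"\x00\x01\x02"
    + b"Login failed" + b"\x00"
    + b"admin access" + b"\x00\xff\xfe"
)

for text in extract_strings(blob):
    print(text)
```

The "ELF" magic is skipped here because it is only three printable characters long, below the minimum length of 4.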

Dynamic Analysis

Dynamic analysis observes the program while it runs.

Tools include:

gdb
pwndbg
gef
strace
ltrace

Example:

gdb ./binary

Machine Instructions

Computers execute machine code, not C.

Example machine instruction:

48 89 e5

Disassembled instruction:

mov rbp, rsp

Machine instructions are stored as bytes in memory.

On x86-64, instruction size varies from 1 to 15 bytes:

1 byte → c3 (ret)
3 bytes → 48 89 e5 (mov rbp, rsp)
10 bytes → 48 b8 followed by an 8-byte immediate (mov rax, imm64)

Number Systems in Reverse Engineering

Binary

Binary uses base-2 numbers.

0 1

Example:

10101010

8 bits = 1 byte

Hexadecimal

Hex uses base-16 numbers.

0 1 2 3 4 5 6 7 8 9 A B C D E F

Example:

0x41

Binary equivalent:

01000001

Units

1 bit   = binary digit
1 nibble = 4 bits
1 byte   = 8 bits

Example:

0x41 = 0100 0001 → high nibble 0x4 (0100), low nibble 0x1 (0001)
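These conversions are easy to check in a Python shell. The snippet below is a small sketch using only built-in functions; the variable names are arbitrary.

```python
value = 0x41

print(value)                 # 65 (decimal)
print(format(value, "08b"))  # 01000001 (binary, padded to 8 bits)
print(hex(value))            # 0x41
print(chr(value))            # A (the ASCII character with code 0x41)

# Split the byte into its two nibbles.
high_nibble = (value >> 4) & 0xF
low_nibble = value & 0xF
print(high_nibble, low_nibble)  # 4 1

# Parse the binary example from the text.
parsed = int("10101010", 2)
print(parsed, hex(parsed))   # 170 0xaa
```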

Endianness

Endianness describes how multi-byte values are stored in memory.

Big Endian

Most significant byte stored first.

Example:

0x12345678

Memory:

12 34 56 78

Little Endian

Least significant byte stored first.

Example:

0x12345678

Memory:

78 56 34 12

Modern x86 systems use little endian.
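The two byte orders can be reproduced with Python's built-in integer and struct helpers. This is a short sketch for checking the layouts above; the variable names are arbitrary.

```python
import struct

value = 0x12345678

big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")
print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# Reading a little-endian 32-bit value back from raw memory bytes.
raw = bytes.fromhex("78 56 34 12")
(decoded,) = struct.unpack("<I", raw)
print(hex(decoded))     # 0x12345678
```

This matters in practice when reading memory dumps in a debugger or writing addresses into an exploit payload: the bytes appear in reverse order compared with the number as written.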

CPU Registers

Registers are small storage locations inside the CPU.

Common registers:

RAX → return values / arithmetic
RBX → base register / preserved (callee-saved) register
RCX → loop counter / fourth function argument
RDX → data register / division remainder / third function argument
RSI → source pointer / second function argument
RDI → destination pointer / first function argument
RBP → stack frame base pointer
RSP → stack pointer
RIP → instruction pointer

Sub-Registers

Registers can be accessed partially.

Example register:

RAX

Sub-registers:

EAX → lower 32 bits
AX  → lower 16 bits
AH  → bits 8-15 (high byte of AX)
AL  → bits 0-7 (low byte of AX)

Example value:

RAX = 0x1122334455667788

Breakdown:

EAX = 0x55667788
AX  = 0x7788
AH  = 0x77
AL  = 0x88
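The breakdown above is plain masking and shifting. The sketch below reproduces it with Python integers; the lowercase variable names simply mirror the register names and are not tied to any library.

```python
rax = 0x1122334455667788

eax = rax & 0xFFFFFFFF  # lower 32 bits
ax = rax & 0xFFFF       # lower 16 bits
ah = (rax >> 8) & 0xFF  # bits 8-15, the high byte of AX
al = rax & 0xFF         # bits 0-7, the low byte of AX

print(hex(eax))  # 0x55667788
print(hex(ax))   # 0x7788
print(hex(ah))   # 0x77
print(hex(al))   # 0x88
```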

Key idea

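By width, a 64-bit register holds:

  • 1 × 64-bit value
  • 2 × 32-bit halves
  • 4 × 16-bit parts
  • 8 × 8-bit bytes

Only the low parts have their own names (EAX, AX, AH, AL). The upper bits can be reached only through shifts and masks. Writing to EAX also clears the upper 32 bits of RAX, while writing to AX, AH, or AL leaves the other bits unchanged.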

Let's expand on what each register does:

RAX — Accumulator Register

RAX is primarily used for:

  • storing return values from functions
  • arithmetic operations
  • system call numbers

Example:

mov rax, 5
add rax, 3

Result:

rax = 8

For system calls:

rax → syscall number

Example:

rax = 60 → exit syscall

RBX — Base Register

RBX is a general-purpose register often used for:

  • storing base addresses
  • holding values across function calls

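RBX is callee-saved in the System V AMD64 ABI used on Linux: a function that modifies RBX must restore its original value before returning, so callers can rely on it surviving a call.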

Example:

mov rbx, 0x400000

RCX — Counter Register

RCX is commonly used as a loop counter.

Example:

mov rcx, 10
loop_start:
dec rcx
jnz loop_start

RCX also holds the repeat count for rep-prefixed string instructions such as rep movsb.

RDX — Data Register

RDX is commonly used for:

  • multiplication and division operations
  • passing the third function argument
  • storing remainder values in division

Example division:

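mov rax, 10     ; dividend (low 64 bits)
mov rdx, 0      ; high 64 bits of the dividend must be cleared
mov rcx, 3      ; divisor
div rcx         ; unsigned divide of rdx:rax by rcx

Result:

rax → quotient (3)
rdx → remainder (1)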
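The div instruction treats RDX:RAX as one 128-bit dividend, which is why RDX is cleared first. The following sketch models that behavior with Python integers. The helper name div64 is hypothetical, and the divisor value 3 is an arbitrary choice for illustration.

```python
MASK64 = (1 << 64) - 1


def div64(rax: int, rdx: int, divisor: int) -> tuple[int, int]:
    """Model unsigned 64-bit div: divide RDX:RAX by divisor, return (rax, rdx)."""
    if divisor == 0:
        raise ZeroDivisionError("real hardware raises a divide error here")
    dividend = (rdx << 64) | rax  # RDX holds the high 64 bits
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK64:
        # The quotient must fit in RAX, otherwise the CPU also raises a divide error.
        raise OverflowError("quotient does not fit in 64 bits")
    return quotient, remainder


rax, rdx = div64(rax=10, rdx=0, divisor=3)
print(rax, rdx)  # 3 1
```

Leaving stale data in RDX changes the dividend entirely, which is a common source of unexpected results or crashes in hand-written assembly.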

RSI — Source Index

RSI is used as a source pointer for memory operations. In the System V AMD64 calling convention it also holds the second function argument.

Common usage:

  • pointer to input data
  • source buffer in copy operations

Example:

mov rsi, buffer

RDI — Destination Index

RDI is commonly used as a destination pointer.

In the System V AMD64 calling convention used on 64-bit Linux, RDI holds the first function argument.

Example:

mov rdi, message
call puts

Here rdi stores the pointer to the string.

RBP — Base Pointer

RBP is used to reference stack frame locations.

It helps access:

  • local variables
  • function arguments passed on the stack

Typical function prologue:

push rbp
mov rbp, rsp

Example stack access:

[rbp+8]  → return address
[rbp-4]  → local variable

RSP — Stack Pointer

RSP always points to the top of the stack.

Stack operations modify RSP automatically.

Example:

push rax

Effect:

rsp decreases by 8
value stored at the new top of the stack

Example:

pop rax

Effect:

value copied from the top of the stack into rax
rsp increases by 8
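The push and pop effects can be modeled with a small toy stack. This is a simplified sketch, not how a debugger or emulator represents memory. The class name Stack64 and the starting rsp value are assumptions chosen for illustration.

```python
class Stack64:
    """Toy model of the x86-64 stack: memory maps an address to a 64-bit value."""

    def __init__(self, rsp: int = 0x7FFFFFFFE000):  # arbitrary starting address
        self.rsp = rsp
        self.memory: dict[int, int] = {}

    def push(self, value: int) -> None:
        self.rsp -= 8  # the stack grows toward lower addresses
        self.memory[self.rsp] = value & 0xFFFFFFFFFFFFFFFF

    def pop(self) -> int:
        value = self.memory[self.rsp]  # read the value at the top of the stack
        self.rsp += 8                  # the old slot is abandoned, not erased
        return value


stack = Stack64()
start = stack.rsp
stack.push(0x1111)
stack.push(0x2222)
print(hex(start - stack.rsp))  # 0x10 (two pushes moved rsp down by 16 bytes)
print(hex(stack.pop()))        # 0x2222 (last in, first out)
print(hex(stack.pop()))        # 0x1111
print(stack.rsp == start)      # True
```

Note that pop does not erase the old value from memory. Stale data left below rsp is one reason stack contents can leak in real programs.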

RIP — Instruction Pointer

RIP stores the address of the next instruction to execute.

Example:

rip = 0x400540

The CPU fetches the instruction located at that address.

Control flow instructions modify RIP.

Examples:

jmp address
call function
ret

User Mode vs Kernel Mode

Operating systems use two execution modes.

User Mode

Normal programs run in user mode.

Restrictions:

  • cannot access hardware directly
  • cannot access kernel memory

Kernel Mode

Kernel mode has full system privileges.

Only the operating system kernel and its drivers (kernel modules) run here.

System Calls

Programs request kernel services using system calls.

Examples:

read
write
open
execve

Example syscall instruction:

syscall

Registers used (Linux x86-64):

RAX → syscall number (and return value after the call)
RDI → arg1
RSI → arg2
RDX → arg3
R10 → arg4
R8  → arg5
R9  → arg6
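The register setup before a syscall instruction can be written down as a simple mapping. The sketch below only builds that mapping and does not execute a system call. The helper name syscall_registers and the buffer address 0x402000 are made up for illustration. The syscall numbers are the Linux x86-64 values, and the argument order beyond RDX (R10, R8, R9) comes from the Linux syscall ABI.

```python
# Linux x86-64 syscall numbers for a few common calls.
SYSCALL_NUMBERS = {"read": 0, "write": 1, "open": 2, "execve": 59, "exit": 60}
ARG_REGISTERS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]


def syscall_registers(name: str, *args: int) -> dict[str, int]:
    """Return the register values to load before executing `syscall`."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("a syscall takes at most six arguments")
    registers = {"rax": SYSCALL_NUMBERS[name]}
    registers.update(zip(ARG_REGISTERS, args))
    return registers


# write(1, buf, 14): file descriptor 1 is stdout, buf is a placeholder address.
for register, value in syscall_registers("write", 1, 0x402000, 14).items():
    print(f"{register} = {value:#x}")

# exit(0)
print(syscall_registers("exit", 0))  # {'rax': 60, 'rdi': 0}
```

Spotting this pattern in a disassembly (a small constant moved into rax right before syscall) is often the quickest way to tell which kernel service a piece of code requests.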

Parser Differential Attacks (Concept)

A parser differential happens when two different components interpret the same input differently.

Example:

Web Application Firewall → parses input one way
Web Server → parses it differently

Examples:

  • frontend parser vs backend parser
  • WAF parser vs application parser
  • URL normalizer vs server
  • JSON/XML/form-data parsing differences

This can create security issues because one layer may think input is safe while another layer interprets it dangerously.

Simple idea

If:

  • Filter A sees the payload as harmless
  • Target B parses the same payload differently

Then the attacker may bypass validation or trigger unexpected behavior.

This idea shows up in:

  • request smuggling
  • URL confusion
  • encoding tricks
  • deserialization edge cases
  • content-type inconsistencies
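A duplicated query parameter is one of the simplest ways to see a parser differential. In the sketch below, both functions are hypothetical stand-ins: filter_view plays a filter that trusts the first occurrence of a parameter, and backend_view plays an application that uses the last one. Real components differ in their exact rules, so treat this only as an illustration of the concept.

```python
from urllib.parse import parse_qs


def filter_view(query: str) -> str:
    """Hypothetical filter: trusts the FIRST occurrence of a duplicated parameter."""
    return parse_qs(query)["role"][0]


def backend_view(query: str) -> str:
    """Hypothetical backend: uses the LAST occurrence of the same parameter."""
    return parse_qs(query)["role"][-1]


query = "role=user&role=admin"
print(filter_view(query))   # user  (the filter considers the request harmless)
print(backend_view(query))  # admin (the backend acts on a different value)
```

Both components received identical bytes, yet they disagree about what the request means. That disagreement is the whole vulnerability.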

Binary Exploitation Workflow

Typical exploitation workflow:

Analyze Binary
 ↓
Find Vulnerability
 ↓
Control Instruction Pointer
 ↓
Leak Memory
 ↓
Bypass Protections
 ↓
Build Exploit
 ↓
Gain Shell

Key Takeaways

In this article we introduced:

  • binary structure
  • assembly basics
  • registers
  • machine instructions
  • number systems
  • endianness
  • system calls
  • analysis techniques

These concepts form the foundation of reverse engineering and binary exploitation.

This series contains five parts. Continue with Part 2 after reading this article.

In Part 2, we will cover:

  • memory addressing (rbp, rsp, offsets)
  • stack behavior (push / pop)
  • instruction tracing
  • function prologue and epilogue
  • 32-bit vs 64-bit calling conventions
  • finding main() in stripped binaries