The Debugger: A Behind-the-Scenes Look at How It Works

Let's try to understand how debuggers work at a technical level.

Ankit Dwivedi

~8 min read · September 2, 2023

The Crucial Role of Debuggers

Debugging is the unsung hero of software development, silently working behind the scenes to ensure that your code functions as intended. At its core, debugging is the process of finding and fixing issues in your code. It's an essential skill for any programmer, and understanding how debuggers work at a technical level can significantly enhance your proficiency in this craft.

Debuggers are the tools that developers use to inspect, control, and understand the behaviour of their code during execution. But how do they do this? It all begins with the technical intricacies of the compilation process and the generation of debug symbols.

In the sections that follow, we'll delve deeper into debugging information, explore how breakpoints and trapping work, and see how debuggers rely on operating system mechanisms such as ptrace to control a running process, step through code, inspect and modify memory, and evaluate expressions. By the end, you'll have a better understanding of the technical underpinnings of these powerful tools and how they bridge the gap between source code and machine code.

Debugger workflow

Compiling and Debugging

Let's dive into the nuts and bolts of how debugging teams up with the compilation process.

"Debug Symbols"

Debug symbols connect the machine code back to the programming language constructs you wrote: things like if statements, while loops, and assignment statements.

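Debug information is not generated by default: the compiler must be told to create a "debug" version of the program (the -g option for the GCC compiler). Without it, an unstripped binary still has a basic symbol table with function and global variable names, but no source line numbers, types, or local variables.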

Let's take a look at what debug symbol information contains and how it is represented on disk.

Debug symbols are data, usually stored in special sections of the executable itself or in a separate debug file, that map addresses of machine code to the source file and line number they were generated from. They also describe functions, variables, and their types; on Linux, GCC emits all of this in the DWARF format. This is what allows you to put a breakpoint on an if statement and have the program stop when execution reaches the corresponding machine code.
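To make the address-to-line idea concrete, here is a minimal C++ sketch that assumes the debug information has already been decoded into a list of rows sorted by address. The `LineRow` struct, the `find_line` and `find_address` helpers, and the example addresses are hypothetical; real DWARF `.debug_line` data is a compact state-machine program that a debugger must first decode into rows like these.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical decoded line-table row: the first address of a run of
// machine code and the source position that code was generated from.
struct LineRow {
    uint64_t address;
    std::string file;
    int line;
};

// Address -> source line (used when the program stops): return the last
// row whose address is <= pc. `rows` must be sorted by address.
// Simplification: the last row is treated as covering everything after it.
const LineRow* find_line(const std::vector<LineRow>& rows, uint64_t pc) {
    auto it = std::upper_bound(
        rows.begin(), rows.end(), pc,
        [](uint64_t value, const LineRow& row) { return value < row.address; });
    if (it == rows.begin()) return nullptr;  // pc is before any known code
    return &*std::prev(it);
}

// Source line -> address (used when you set a breakpoint): return the
// lowest address generated for file:line, or 0 if the line has no code.
uint64_t find_address(const std::vector<LineRow>& rows,
                      const std::string& file, int line) {
    uint64_t best = 0;
    for (const LineRow& row : rows) {
        if (row.file == file && row.line == line &&
            (best == 0 || row.address < best)) {
            best = row.address;
        }
    }
    return best;
}
```

Picking the lowest address is only one simple policy; a single source line can map to several address ranges (a loop condition, for example), and real debuggers handle that case more carefully.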

Normally we compile our programs as:

g++ hello.cc -o hello

To include debugging information, we compile with the -g flag instead:

g++ -g hello.cc -o hello

However, including these debugging symbols enlarges a program or library significantly. To get an idea of how much space they occupy, have a look at the following figures from the Linux From Scratch book:

Glibc and GCC files (/lib and /usr/lib) with debugging symbols: 87 MB
Glibc and GCC files without debugging symbols: 16 MB
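If you want to see where that space goes in your own binaries, the following C++ sketch sums the sizes of an ELF file's `.debug*` sections, which is where GCC places DWARF data on Linux. It assumes a 64-bit ELF file and uses the structures from the system `<elf.h>` header; the function name `debug_section_bytes` is our own.

```cpp
#include <elf.h>

#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Sum the on-disk sizes of all sections whose names start with ".debug"
// in a 64-bit ELF file. Returns -1 if the file is unreadable or not ELF64.
long long debug_section_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    Elf64_Ehdr eh{};
    if (!in.read(reinterpret_cast<char*>(&eh), sizeof eh)) return -1;
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr)) {
        return -1;
    }

    // Load the section header table.
    std::vector<Elf64_Shdr> sections(eh.e_shnum);
    in.seekg(static_cast<std::streamoff>(eh.e_shoff));
    if (!in.read(reinterpret_cast<char*>(sections.data()),
                 static_cast<std::streamsize>(sections.size() * sizeof(Elf64_Shdr)))) {
        return -1;
    }
    if (eh.e_shstrndx >= sections.size()) return -1;

    // Section names live in a dedicated string-table section.
    const Elf64_Shdr& names_hdr = sections[eh.e_shstrndx];
    std::string names(names_hdr.sh_size, '\0');
    in.seekg(static_cast<std::streamoff>(names_hdr.sh_offset));
    if (!in.read(&names[0], static_cast<std::streamsize>(names.size()))) return -1;

    long long total = 0;
    for (const Elf64_Shdr& s : sections) {
        // compare() returns 0 only if the name really begins with ".debug".
        if (s.sh_name < names.size() && names.compare(s.sh_name, 6, ".debug") == 0) {
            total += static_cast<long long>(s.sh_size);
        }
    }
    return total;
}
```

Running it on a binary built with `-g` and on one built without it is a quick way to see the difference. Note that compressed debug sections are counted at their on-disk (compressed) size.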

An Example

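Let's say your source code contains the line int x = 42;. When you compile with debug symbols, the compiler also records which machine instructions implement that line and where x lives at run time (for example, at a fixed offset in the current function's stack frame). These symbols act like a translator between the two worlds.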

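Without debug symbols, the debugger sees only addresses and raw machine instructions: it can disassemble them, and may know function names from the basic symbol table, but it cannot tell you which source line is running or what value x holds. With debug symbols, tracing back to the original source code becomes a breeze. It's like having a map that guides you through the maze of ones and zeros.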

Setting Breakpoints

Setting breakpoints is a fundamental aspect of debugging. It's like placing markers in your code, allowing you to pause program execution at specific points and inspect the state of your program. But how do breakpoints work under the hood?

Imagine you're reading a book, and you want to take a closer look at a particular paragraph. You place a bookmark at that spot to return to it easily. In the world of debugging, breakpoints serve a similar purpose.

Behind the Scenes of Breakpoints

Setting Breakpoints: In your Integrated Development Environment (IDE), you indicate where you want to pause your program's execution. Typically, this is done by clicking in the margin next to a line of code or using a specific command in your debugger.
Recording Breakpoint Locations: When you set a breakpoint, the debugger notes the file and line number, then uses the debug symbols to translate that location into the address of the first machine instruction generated for that line.
Inserting Breakpoint Traps: When you run your program under the debugger, it writes a special instruction, often called a breakpoint trap, into the program's code in memory at that address, after saving the original bytes it overwrites (the executable on disk is not modified). On x86 processors (Intel and AMD), the usual breakpoint trap is the one-byte int3 instruction (interrupt 3, opcode 0xCC).
Interrupts and Exceptions: The breakpoint trap, like int3, is an instruction that is not part of your program's normal logic. When the processor encounters it, it generates a software interrupt or exception.
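Control Passed to Debugger: When the exception is triggered, the operating system stops your program and notifies the debugger. On Linux, for example, the kernel delivers a SIGTRAP signal to the traced process, which stops it, and the debugger, waiting in waitpid(), learns about the stop. At this point, the debugger knows that your program has hit a breakpoint. Before resuming, it restores the original byte and moves the instruction pointer back by one byte (int3 has already executed), so the original instruction runs as if the breakpoint had never been there.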
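Here is a minimal C++ sketch of the insertion and hit-handling steps for Linux on x86-64, using the ptrace system call. It assumes the debugger is already tracing the process `pid` and that `long` is 64 bits; the `Breakpoint` struct and the helper names are hypothetical. Because `PTRACE_PEEKTEXT` and `PTRACE_POKETEXT` work on whole words, the sketch reads a word, swaps in `0xCC` for its lowest byte (the byte at `addr` on little-endian x86), and writes the word back.

```cpp
#include <cerrno>
#include <cstdint>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

constexpr uint64_t kInt3 = 0xCC;  // x86 int3 opcode

// On little-endian x86, the byte at the lowest address of a word returned
// by PTRACE_PEEKTEXT is the word's least-significant byte.
uint64_t replace_low_byte(uint64_t word, uint64_t byte) {
    return (word & ~uint64_t{0xFF}) | (byte & 0xFF);
}

struct Breakpoint {
    uint64_t addr = 0;       // first instruction of the source line
    uint8_t saved_byte = 0;  // original byte that int3 overwrote
    bool enabled = false;
};

// Write int3 over the first byte of the instruction at bp.addr.
bool enable_breakpoint(pid_t pid, Breakpoint& bp) {
    void* where = reinterpret_cast<void*>(bp.addr);
    errno = 0;
    long word = ptrace(PTRACE_PEEKTEXT, pid, where, nullptr);
    if (word == -1 && errno != 0) return false;  // -1 can also be valid data
    bp.saved_byte = static_cast<uint8_t>(word & 0xFF);
    uint64_t patched = replace_low_byte(static_cast<uint64_t>(word), kInt3);
    if (ptrace(PTRACE_POKETEXT, pid, where, reinterpret_cast<void*>(patched)) == -1)
        return false;
    bp.enabled = true;
    return true;
}

// Put the original byte back.
bool disable_breakpoint(pid_t pid, Breakpoint& bp) {
    void* where = reinterpret_cast<void*>(bp.addr);
    errno = 0;
    long word = ptrace(PTRACE_PEEKTEXT, pid, where, nullptr);
    if (word == -1 && errno != 0) return false;
    uint64_t restored = replace_low_byte(static_cast<uint64_t>(word), bp.saved_byte);
    if (ptrace(PTRACE_POKETEXT, pid, where, reinterpret_cast<void*>(restored)) == -1)
        return false;
    bp.enabled = false;
    return true;
}

#if defined(__x86_64__)
// After the tracee stops with SIGTRAP on our int3, the one-byte int3 has
// already executed, so rip is bp.addr + 1. Move it back so the original
// instruction runs when the program resumes.
bool rewind_after_hit(pid_t pid, const Breakpoint& bp) {
    user_regs_struct regs{};
    if (ptrace(PTRACE_GETREGS, pid, nullptr, &regs) == -1) return false;
    if (regs.rip != bp.addr + 1) return false;  // stopped somewhere else
    regs.rip = bp.addr;
    return ptrace(PTRACE_SETREGS, pid, nullptr, &regs) != -1;
}
#endif
```

A debugger built on these helpers would typically call `enable_breakpoint` before resuming the program with `PTRACE_CONT`; when `waitpid` reports a `SIGTRAP` stop, it calls `rewind_after_hit` and `disable_breakpoint`, single-steps the original instruction, and then re-enables the breakpoint.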

Breakpoints are an indispensable tool for understanding the behaviour of your code. They allow you to inspect variables, step through your program line by line, and identify issues efficiently. Now, let's move on to the next section, where we'll explore the concept of stepping through code.

Stepping Through Code

Stepping through code is a fundamental feature of debuggers, enabling developers to execute their programs one step at a time.

Continue, Step Over, Step Into and Step Out actions

Types of Stepping Actions

Step Over: When the current line calls a function, "Step Over" runs that entire call without pausing inside it and stops at the next line of the current function.
Step Into: "Step Into" takes you deeper into the code: the debugger enters the function called on the current line and pauses at its first line. On Linux, this relies on mechanisms like ptrace.
Step Forward: The debugger runs your program for just one more line, then pauses it again.
Step Out: "Step Out" runs the rest of the current function and pauses again in its caller.

Whether you're stepping forward, diving into functions, or stepping over them, your debugger is a trusty guide, thanks to mechanisms like ptrace, the Linux system call that lets one process observe and control another.
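Every stepping command is ultimately built from a few ptrace requests. The C++ sketch below, for Linux, shows how a debugger can start a program under its control (fork, `PTRACE_TRACEME`, then `execv`) and run it one machine instruction at a time with `PTRACE_SINGLESTEP`. The function names are hypothetical and error handling is kept minimal.

```cpp
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Start `path` as a traced child. The kernel stops the child with SIGTRAP
// right after execv(), before the new program's first instruction runs.
// Returns the stopped child's pid, or -1 on failure.
pid_t launch_traced(const char* path) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        // Child: let the parent trace us, then become the target program.
        if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) == -1) _exit(127);
        char* const argv[] = {const_cast<char*>(path), nullptr};
        execv(path, argv);
        _exit(127);  // reached only if execv() failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status)) {
        return -1;  // child exited instead of stopping (e.g. ptrace refused)
    }
    return pid;
}

// Execute up to `n` machine instructions, one at a time. Returns how many
// single steps completed before the target exited or an error occurred.
long step_instructions(pid_t pid, long n) {
    long done = 0;
    while (done < n) {
        if (ptrace(PTRACE_SINGLESTEP, pid, nullptr, nullptr) == -1) break;
        int status = 0;
        if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status)) break;
        ++done;
    }
    return done;
}

// Stop debugging by killing the target and reaping it.
void kill_traced(pid_t pid) {
    kill(pid, SIGKILL);
    waitpid(pid, nullptr, 0);
}
```

Between any two steps the target is stopped, which is exactly when a debugger reads registers and memory before deciding whether to step again.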

The Technical Process of Stepping

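Instruction Pointer: The debugger tracks the program's execution through a special CPU register called the instruction pointer (IP; rip on x86-64, often called the program counter on other architectures). This register holds the address of the next instruction to be executed.
Executing Instructions: When you perform a stepping action, the debugger asks the operating system to execute exactly one machine instruction and then stop the program again (on Linux, ptrace's PTRACE_SINGLESTEP request, which uses the processor's single-step trap flag on x86). It repeats this as often as the stepping action requires.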
Examining State: After executing an instruction, the debugger examines the program's state, including the values of variables and registers. It also updates the source code view to highlight the line that will be executed next.
User Interaction: Throughout this process, you, the developer, can inspect variable values, check the call stack, and evaluate expressions to gain insights into the program's behaviour.
Control Flow: Depending on the stepping action you choose, the debugger will continue to execute instructions until it reaches the next line in the same function (step over), enters a new function (step into), or exits the current function (step out).
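Combining single-stepping with the address-to-line mapping from the debug symbols gives a naive "step one line": keep executing single instructions until the instruction pointer lands on a different source line. In this C++ sketch the three callbacks are assumptions standing in for a real debugger's ptrace calls and line-table lookup. Because it follows calls into other functions, it behaves like step into; step over is usually implemented differently, for example by letting the callee run at full speed up to a temporary breakpoint on its return address.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical callbacks (not a real debugger API):
//   single_step(): execute exactly one machine instruction; false on error
//   read_pc():     return the target's current instruction pointer
//   pc_to_line():  map an address to a source line, 0 if unknown
// Returns the number of instructions executed, or -1 if stepping failed
// or max_steps was reached without leaving the starting line.
long step_source_line(const std::function<bool()>& single_step,
                      const std::function<uint64_t()>& read_pc,
                      const std::function<int(uint64_t)>& pc_to_line,
                      long max_steps = 1000000) {
    const int start_line = pc_to_line(read_pc());
    for (long steps = 1; steps <= max_steps; ++steps) {
        if (!single_step()) return -1;  // target exited or ptrace failed
        const int line = pc_to_line(read_pc());
        // Stop at the first instruction that belongs to a different,
        // known source line.
        if (line != 0 && line != start_line) return steps;
    }
    return -1;
}
```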

Variable Inspection

Variable inspection is a critical aspect of debugging, allowing you to peer into your program's memory and understand the state of your variables. But how does it work? Let's try to understand.

Source: https://www.jetbrains.com/help/rider/Inspecting_Variables.html#search-through-values-of-complex-objects

Reading Memory

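Debuggers are like detectives when it comes to memory inspection. They use system calls or operating system APIs to read data from the address space of the target process: on Linux, for example, ptrace(PTRACE_PEEKDATA), process_vm_readv(), or the /proc/<pid>/mem file; on Windows, ReadProcessMemory().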

Specifying the Address: The debugger works out the memory address it needs from the debug symbols, which describe where each variable lives: a fixed address for a global variable, or a location relative to the current stack frame or a register for a local one.
The OS's Role: Here's where the operating system enters the scene. It retrieves the requested data from the process's memory and hands it over to the debugger.
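On Linux, one of these APIs is the `/proc/<pid>/mem` file, in which the file offset is the virtual address inside the target process. The C++ sketch below (the function name is hypothetical) reads a block of memory that way; it assumes the caller has ptrace-level permission over the target, which is always the case for its own process.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <string>

// Read `len` bytes starting at virtual address `addr` in process `pid`.
// In /proc/<pid>/mem the file offset is the address in the target.
// Returns the number of bytes read, or -1 on error (for example, missing
// permission or an unmapped address).
ssize_t read_memory(pid_t pid, uint64_t addr, void* buf, std::size_t len) {
    const std::string path = "/proc/" + std::to_string(pid) + "/mem";
    int fd = open(path.c_str(), O_RDONLY | O_CLOEXEC);
    if (fd == -1) return -1;
    ssize_t n = pread(fd, buf, len, static_cast<off_t>(addr));
    close(fd);
    return n;
}
```

Pointing it at the current process (`getpid()`) and the address of a local variable is an easy way to try it out.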

Writing Memory

Debuggers are not just observers; they can also modify the memory of the target process. This means you can tweak the values of variables or data structures while the program is paused.

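Making Changes: Similar to reading, the debugger uses system calls or APIs to write data to a specific memory address within the process's address space (on Linux, ptrace(PTRACE_POKEDATA), process_vm_writev(), or /proc/<pid>/mem; on Windows, WriteProcessMemory()).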
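On Linux, one way to write is to open `/proc/<pid>/mem` for writing and call `pwrite()` at the target address, since the file offset equals the virtual address in the target process. This C++ sketch uses a hypothetical function name and assumes the caller has permission to trace the target.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <string>

// Write `len` bytes from `buf` to virtual address `addr` in process `pid`.
// Returns the number of bytes written, or -1 on error.
ssize_t write_memory(pid_t pid, uint64_t addr, const void* buf, std::size_t len) {
    const std::string path = "/proc/" + std::to_string(pid) + "/mem";
    int fd = open(path.c_str(), O_WRONLY | O_CLOEXEC);
    if (fd == -1) return -1;
    ssize_t n = pwrite(fd, buf, len, static_cast<off_t>(addr));
    close(fd);
    return n;
}
```

Changing a variable from the debugger's UI boils down to a call like this, with the address coming from the debug symbols and the bytes encoded according to the variable's type.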

The Role of Symbol Information and Debugging Symbols

Remember those debug symbols generated during compilation?

Mapping Variables: When you're inspecting a variable's value, the debugger relies on these symbols to map the variable's name to its memory location.
Function Insights: Debug symbols also shed light on function names and their memory addresses, helping the debugger navigate the program's execution.
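Here is a hypothetical, much simplified version of the name-to-address lookup in C++. Real DWARF location descriptions are small stack-machine programs; this sketch models only two common cases, a fixed address (typical for globals, `DW_OP_addr`) and an offset from the function's frame base (typical for locals, `DW_OP_fbreg`). The `VarLocation` struct, the addresses, and the offsets are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical decoded location of one variable.
struct VarLocation {
    enum class Kind { Global, FrameOffset };
    Kind kind;
    int64_t value;     // absolute address (Global) or signed offset (FrameOffset)
    std::size_t size;  // number of bytes to read, taken from the variable's type
};

// Resolve `name` to a run-time address. `frame_base` is the current
// function's frame base, which a debugger computes from the registers of
// the stopped thread. Returns false if the name is unknown.
bool address_of(const std::map<std::string, VarLocation>& symbols,
                const std::string& name, uint64_t frame_base, uint64_t& out) {
    auto it = symbols.find(name);
    if (it == symbols.end()) return false;
    const VarLocation& loc = it->second;
    if (loc.kind == VarLocation::Kind::Global) {
        out = static_cast<uint64_t>(loc.value);
    } else {
        // Unsigned wrap-around makes a negative offset subtract correctly.
        out = frame_base + static_cast<uint64_t>(loc.value);
    }
    return true;
}
```

With the address and size in hand, the debugger reads the bytes through the OS, as described in the Reading Memory section, and formats them according to the variable's type.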

Access Control And Permission Assurance

But can any process start reading the memory of your running program? To prevent unauthorized access, operating systems implement access control: before a debugger may read or modify another process's memory, the OS checks that it has permission to do so. On Linux, for example, a debugger can normally trace only processes running under the same user ID, unless it has the CAP_SYS_PTRACE capability, and security modules such as Yama can restrict this further.
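On Linux kernels built with the Yama security module, one of these restrictions is visible in `/proc/sys/kernel/yama/ptrace_scope`. This small C++ helper (its name is our own) reads the setting; the value meanings in the comment follow the kernel's Yama documentation.

```cpp
#include <fstream>

// Yama ptrace_scope values:
//   0 - classic: any process may trace others running under the same uid
//   1 - restricted: only descendants, or processes that opted in with
//       prctl(PR_SET_PTRACER), may be attached to
//   2 - admin-only: attaching requires CAP_SYS_PTRACE
//   3 - no attaching at all
// Returns the value, or -1 if the file is missing (Yama not enabled).
int yama_ptrace_scope() {
    std::ifstream in("/proc/sys/kernel/yama/ptrace_scope");
    int scope = -1;
    if (!(in >> scope)) return -1;
    return scope;
}
```

This is why, on many desktop distributions that default to scope 1, a debugger can run a program it launched itself but gets `EPERM` when attaching to an unrelated process that is already running, unless it has extra privileges.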

So, the next time you inspect memory or make a tweak while debugging, remember that you're not just navigating data; you're unlocking the secrets of your program's innermost workings, one memory address at a time.

Real-time Expression Evaluation

Real-time expression evaluation is a powerful feature of debuggers that lets you analyze and diagnose problems in your code by evaluating expressions based on the current program state.

Source: https://blog.jetbrains.com/idea/2023/04/debugger-upskill-variables-evaluate-expression-watches/

This capability empowers you to understand how specific calculations or function calls are affecting your code's behaviour at runtime.

How Debuggers Perform Real-time Expression Evaluation

Expression Parsing: When you input an expression (e.g., a mathematical calculation or a function call) into your debugger's interface, the debugger parses this expression to understand its structure and components.
Context Awareness: The debugger is aware of the current state of your program, including variable values, memory content, and register values. It uses this context to evaluate the expression. For example, if your expression involves variables, the debugger substitutes their current values into the expression.
Expression Evaluation: The debugger employs an expression evaluator module that understands the syntax and semantics of the programming language you're using. This module processes the expression, performs the required calculations, and produces a result.
Displaying Results: The evaluated result is then displayed in the debugger's user interface. This might include numerical values, strings, or any other data type relevant to your expression.
Debugging Insights: Real-time expression evaluation provides you with invaluable insights into your code's behaviour. You can assess whether your calculations are correct, check the outcomes of functions, and make informed decisions based on the results.
Conditional Breakpoints: Real-time expression evaluation also plays a crucial role in conditional breakpoints. These are breakpoints that trigger only when a specified expression evaluates to true or meets a specific condition. This feature enhances your debugging efficiency by allowing you to halt execution precisely when and where it matters.
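A toy C++ version of the parsing and evaluation steps might look like the sketch below: a recursive-descent evaluator for integer expressions with `+ - * /`, parentheses, and `== < >`, plus a `should_stop` helper for a conditional breakpoint. The `vars` map is an assumption standing in for program state; a real debugger resolves each name through the debug symbols, reads its value from the target's memory, and supports the full language (types, pointers, function calls).

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy evaluator. Grammar (lowest to highest precedence):
//   comparison := sum [("==" | "<" | ">") sum]
//   sum        := product {("+" | "-") product}
//   product    := primary {("*" | "/") primary}
//   primary    := number | identifier | "(" comparison ")"
class ExprEval {
public:
    ExprEval(std::string src, std::map<std::string, long> vars)
        : src_(std::move(src)), vars_(std::move(vars)) {}

    long run() {
        long value = comparison();
        skip_ws();
        if (pos_ != src_.size()) throw std::runtime_error("unexpected input");
        return value;
    }

private:
    std::string src_;
    std::map<std::string, long> vars_;  // stand-in for the paused program's state
    std::size_t pos_ = 0;

    static bool is_ident(char c, bool first) {
        unsigned char u = static_cast<unsigned char>(c);
        return c == '_' || (first ? std::isalpha(u) : std::isalnum(u));
    }
    void skip_ws() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    // Consume `token` if it comes next (after whitespace).
    bool eat(const char* token) {
        skip_ws();
        std::size_t n = std::strlen(token);
        if (src_.compare(pos_, n, token) != 0) return false;
        pos_ += n;
        return true;
    }
    long comparison() {
        long lhs = sum();
        if (eat("==")) return lhs == sum();
        if (eat("<")) return lhs < sum();
        if (eat(">")) return lhs > sum();
        return lhs;
    }
    long sum() {
        long value = product();
        while (true) {
            if (eat("+")) value += product();
            else if (eat("-")) value -= product();
            else return value;
        }
    }
    long product() {
        long value = primary();
        while (true) {
            if (eat("*")) {
                value *= primary();
            } else if (eat("/")) {
                long divisor = primary();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else {
                return value;
            }
        }
    }
    long primary() {
        if (eat("(")) {
            long value = comparison();
            if (!eat(")")) throw std::runtime_error("expected ')'");
            return value;
        }
        skip_ws();
        std::size_t start = pos_;
        if (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
            return std::stol(src_.substr(start, pos_ - start));
        }
        if (pos_ < src_.size() && is_ident(src_[pos_], true)) {
            while (pos_ < src_.size() && is_ident(src_[pos_], false)) ++pos_;
            auto it = vars_.find(src_.substr(start, pos_ - start));
            if (it == vars_.end()) throw std::runtime_error("unknown variable");
            return it->second;  // "read" the variable's current value
        }
        throw std::runtime_error("expected a number or variable");
    }
};

// A conditional breakpoint only stops when its condition is non-zero.
bool should_stop(const std::string& condition, const std::map<std::string, long>& vars) {
    return ExprEval(condition, vars).run() != 0;
}
```

For example, with `x` equal to 20 the expression `x * 2 + 1` evaluates to 41, and a breakpoint with condition `i == 5` stops only on the iteration where `i` is 5.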

Real-time expression evaluation is a potent tool for dissecting your code's behaviour, especially when dealing with complex logic or mathematical computations. It empowers you to diagnose issues, fine-tune your algorithms, and make data-driven decisions during debugging.

I hope this gave you some idea of the inner workings of a debugger and how it interacts with the OS to perform various actions.

If you have any feedback or suggestions, leave them in the comments :) You can also connect with me on LinkedIn.

References:

http://www.iitk.ac.in/LDP/LDP/lfs/5.0/html/chapter06/aboutdebug.html
https://eli.thegreenplace.net/tag/debuggers
https://opensource.com/article/18/1/how-debuggers-work
CppCon 2018: Simon Brand "How C++ Debuggers Work" (https://youtu.be/0DDrseUomfU?si=7Vl0LFXupS1IMidG)

#debugging #software #design #software-development #software-engineering
