This is actually a rather large and complicated topic, and it is also architecture-specific, so in this answer I'll only aim to provide a summary of the common approaches on the Intel (and compatible) x86 architecture.
The good news is, it is language-independent, so the debugger is going to work the same way whether it's debugging VB.NET, C#, or C++ code. The reason why this is true is that all code is ultimately going to be compiled (whether statically [i.e., ahead of time, as with C++] or dynamically [e.g., by a JIT compiler, as with .NET, or via a run-time interpreter]) to object code that can be natively executed by the processor. It is this native code that the debugger ultimately works on.
Furthermore, this isn't limited to Visual Studio. Its debugger certainly works in the way that I'll describe, but so does any other Windows debugger, like the Debugging Tools for Windows debuggers (WinDbg, KD, CDB, NTSD, etc.), GNU's GDB, IDA's debugger, the open-source x64dbg, and so on.
Let's start with a simple definition—what is a breakpoint? It's just a mechanism that allows execution to be paused so that you can conduct further analysis, whether that's examining the call stack, printing the values of variables, modifying the contents of memory or registers, or even modifying the code itself.
On the x86 architecture, there are several fundamental ways that breakpoints can be implemented. They can be divided into the two general categories of software breakpoints and hardware breakpoints.
Although a software breakpoint uses features of the processor itself, it is primarily implemented within software, hence the name. Specifically, interrupt #3 (the assembly language instruction INT 3) provides a breakpoint interrupt. This can be placed anywhere in the executable code, and when the CPU hits this instruction during execution, it will trap. The debugger can then catch this trap and do whatever it wants to do. If the program is not running under a debugger, then the operating system will handle the trap; the OS's default handler will simply terminate the program.
There are two possible encodings for the INT 3 instruction. Perhaps the most logical encoding is 0xCD 0x03, where 0xCD means INT and 0x03 specifies the "argument", i.e., the number of the interrupt that is to be triggered. However, because breakpoints are so important, the designers at Intel also added a special-case representation for INT 3: the single-byte opcode 0xCC.
The nice thing about this being a one-byte instruction is that it can be inserted pretty much anywhere in a program without much difficulty. Conceptually, this is simple, but the way it actually works is somewhat tricky. Basically, there are two options:
If it's a fixed breakpoint, then the compiler can insert this INT 3 instruction into the code when it is compiled. Then, every time you hit that point, it will execute that instruction and break.
In C/C++, a fixed breakpoint might be inserted via a call to the DebugBreak API function, with the __debugbreak intrinsic, or by using inline assembly to insert an INT 3 instruction. In .NET code, you would use System.Diagnostics.Debugger.Break to emit a fixed breakpoint.
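For example, here is a minimal C++ sketch of a hard-coded breakpoint; the function and intrinsic are the documented Windows/MSVC mechanisms mentioned above, and the surrounding program is purely illustrative:

#include <windows.h>   // DebugBreak
#include <intrin.h>    // __debugbreak (MSVC intrinsic)

int main()
{
    __debugbreak();   // emits a single-byte INT 3 (0xCC) inline at this point
    DebugBreak();     // Win32 API call whose body executes an INT 3
    return 0;
}

Run outside a debugger, either line terminates the process with a breakpoint exception; run under a debugger, it simply breaks in at that spot.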
At runtime, a fixed breakpoint can easily be removed by replacing the one-byte INT 3 instruction (0xCC) with a one-byte NOP instruction (0x90). NOP is the mnemonic for no-op: it just causes the processor to waste a cycle without doing anything.
But if it's a dynamic breakpoint, then things get more complicated. The debugger must modify the binary in memory and insert the INT 3 instruction. But where is it going to insert it? Even in a debugging build, a compiler cannot reasonably insert a NOP between every single instruction, and it doesn't know in advance where you might want to insert a breakpoint, so there won't be space to insert even a one-byte INT 3 instruction at an arbitrary location in the code.
So what it does instead is insert the INT 3 instruction (0xCC) at the requested location, writing over whatever instruction is currently there. If this is a one-byte instruction (such as an INC), then it is simply replaced by the INT 3. If this is a multi-byte instruction (most of them are), then only the first byte of that instruction is replaced by 0xCC. The original instruction then becomes invalid because it has been partially overwritten. But that's okay, because once the processor hits the INT 3 instruction, it will trap and stop executing at precisely that point, so the partial, corrupted original instruction will never be reached. Once the debugger catches the trap triggered by the INT 3 instruction and "breaks" in, it undoes the in-memory modification, replacing the inserted 0xCC byte with the correct byte representation of the original instruction. That way, when you resume execution from that point, the code is correct and you don't hit the same breakpoint over and over. Note that all of this modification happens to the current image of the binary executable stored in memory; it is patched directly in memory, without ever modifying the file on disk. (This is done using the ReadProcessMemory and WriteProcessMemory API functions, which are specifically designed for debuggers.)
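Here is a rough sketch of that patching, assuming the debugger already has a handle to the debuggee process and an address for the breakpoint; the helper names are made up for illustration, and error handling is omitted:

#include <windows.h>

// Hypothetical helper: plant an INT 3 at `address` in the debuggee and
// return the original byte so it can be restored when the breakpoint hits.
BYTE SetSoftwareBreakpoint(HANDLE hProcess, LPVOID address)
{
    BYTE original = 0;
    SIZE_T count = 0;
    ReadProcessMemory(hProcess, address, &original, 1, &count);   // remember the first byte

    BYTE int3 = 0xCC;
    WriteProcessMemory(hProcess, address, &int3, 1, &count);      // overwrite it with INT 3
    FlushInstructionCache(hProcess, address, 1);                  // make sure stale code isn't executed
    return original;
}

// Hypothetical helper: undo the patch once the breakpoint has been hit.
void RemoveSoftwareBreakpoint(HANDLE hProcess, LPVOID address, BYTE original)
{
    SIZE_T count = 0;
    WriteProcessMemory(hProcess, address, &original, 1, &count);  // put the real byte back
    FlushInstructionCache(hProcess, address, 1);
}

A real debugger does more bookkeeping than this; in particular, after the trap, the thread's instruction pointer has to be moved back one byte so that the restored instruction actually gets executed when the thread resumes.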
Here is a simple example (a trivial function that adds two values and returns the result) in machine code, showing both the raw bytes as well as the assembly-language mnemonics:
31 C0 xor eax, eax ; clear EAX register to 0
BA 02 00 00 00 mov edx, 2 ; set EDX register to 2
01 D0 add eax, edx ; add EDX to EAX
C3 ret ; return, with result in EAX
If we were to set a breakpoint on the line of source code that added the values (the ADD instruction in the disassembly), the first byte of the ADD instruction (0x01) would be replaced with 0xCC, leaving the remaining bytes as meaningless garbage:
31 C0 xor eax, eax ; clear EAX register to 0
BA 02 00 00 00 mov edx, 2 ; set EDX register to 2
CC int 3 ; BREAKPOINT!
D0 ??? ; meaningless garbage, never executed
C3 ret ; also meaningless garbage from CPU's perspective
Hopefully you were able to follow all of that, because that is actually the simplest case. Software breakpoints are what you use most of the time. Many of the most commonly used features of a debugger are implemented using software breakpoints, including stepping over a call, executing all code up to a particular point, and running to the end of a function. Behind the scenes, all of these use a temporary software breakpoint that is automatically removed the first time that it is hit.
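As a hedged sketch of how "run to a particular point" can be built from such a temporary breakpoint, here is one way it might look, reusing the hypothetical helpers from the earlier snippet and the standard Win32 debug-event loop (the surrounding debugger state is assumed, and the event handling is simplified):

// Sketch: run the debuggee until `target` is reached, using a one-shot breakpoint.
// `pi` is the PROCESS_INFORMATION from CreateProcess with DEBUG_ONLY_THIS_PROCESS.
void RunToAddress(const PROCESS_INFORMATION &pi, LPVOID target)
{
    BYTE saved = SetSoftwareBreakpoint(pi.hProcess, target);

    DEBUG_EVENT evt = {};
    ContinueDebugEvent(pi.dwProcessId, pi.dwThreadId, DBG_CONTINUE);  // resume the debuggee
    while (WaitForDebugEvent(&evt, INFINITE))
    {
        if (evt.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
            evt.u.Exception.ExceptionRecord.ExceptionCode == EXCEPTION_BREAKPOINT &&
            evt.u.Exception.ExceptionRecord.ExceptionAddress == target)
        {
            // One-shot: remove the breakpoint the first time it is hit
            // (and, in a real debugger, rewind EIP/RIP by one byte).
            RemoveSoftwareBreakpoint(pi.hProcess, target, saved);
            break;
        }
        ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE);
    }
}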
However, there is a more complicated and more powerful way to set a breakpoint, with the direct assistance of the processor. These are known as hardware breakpoints. The x86 instruction set provides 6 special debug registers. (They are referred to as DR0 through DR7, suggesting a total of 8, but DR4 and DR5 are just aliases for DR6 and DR7, so there are actually only 6.) The first 4 debug registers (DR0 through DR3) store either a memory address or an I/O location, whose values can be set using a special form of the MOV instruction. DR6 (equivalent to DR4) is a status register that contains flags, and DR7 (equivalent to DR5) is a control register. When the control register is set accordingly, an attempt by the processor to access one of these four locations will cause a hardware breakpoint (specifically, an INT 1 interrupt will be raised), which can then be caught by a debugger. Again, the details are complicated and can be found in various places online or in Intel's technical manuals, but they are not necessary for a high-level understanding.
The nice thing about these special debug registers is that they provide a way to implement data breakpoints without needing to modify the code! However, there are two serious limitations. First, there are only four possible locations, so without a lot of cleverness, you are limited to four breakpoints. Second, the debug registers are privileged resources, and instructions that access and manipulate them can be executed only at ring 0 (essentially, kernel mode). Attempts to read or write these registers at any other privilege level (such as ring 3, which is effectively user mode) will cause a general protection fault. Therefore, the Visual Studio debugger has to jump through some hoops to use these. I believe that it first suspends the thread, then calls the SetThreadContext API function (which causes a switch to kernel mode internally) to manipulate the contents of the registers, and finally resumes the thread. These debug registers are very powerful for setting read/write breakpoints on memory locations that contain data, as well as for setting execute breakpoints on memory locations that contain code.
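As an illustration of that hoop-jumping, here is a hedged sketch using the documented Win32 thread-context APIs; the DR7 bit values come from Intel's manuals (local enable for slot 0, break on data writes, 4-byte length), while the overall flow is an assumption about how a debugger might do it rather than a description of Visual Studio's actual code:

#include <windows.h>

// Sketch: arm hardware breakpoint slot 0 (DR0) to trap writes to a 4-byte location.
void SetHardwareWriteBreakpoint(HANDLE hThread, void *address)
{
    SuspendThread(hThread);

    CONTEXT ctx = {};
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;   // only read/write DR0-DR7
    GetThreadContext(hThread, &ctx);

    ctx.Dr0 = (DWORD_PTR)address;   // address to watch
    ctx.Dr7 |= 1;                   // L0: locally enable slot 0
    ctx.Dr7 |= 1 << 16;             // R/W0 = 01b: break on data writes
    ctx.Dr7 |= 3 << 18;             // LEN0 = 11b: watch 4 bytes

    SetThreadContext(hThread, &ctx);   // the kernel updates the real debug registers
    ResumeThread(hThread);
}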
However, if you need more than 4, or run up against some other limitation, then these hardware-provided debug registers won't work, and the Visual Studio debugger has to have some other, more general way of implementing data breakpoints. This is, in fact, why having a large number of breakpoints can really slow down the execution of your program when running under the debugger.
There are various tricks here, and I know a lot less about exactly which ones are used by the different closed-source debuggers. You could almost certainly find out by reverse engineering or even closer observation, and perhaps there is someone who knows more about this than I do. But I'll briefly summarize a couple of the tricks I know about: