How Computers Execute Code — CPU, Memory, and I/O
You’ve written a program, compiled it, and run it.
But what actually happens after you press Enter?
How does your source code turn into electric signals, memory reads, and CPU instructions?
Understanding this process, from code to execution, bridges the gap between software and hardware, and it’s one of the most empowering things you can learn as a developer.
Tips
C++ is one of the few languages that lets you see all the way down, from your high-level logic to the individual instructions your CPU executes.
Info
This article breaks down what happens between writing a .cpp file and seeing your program run on the screen. We’ll follow step by step, from compilation to linking, memory layout, and instruction execution.
1. From Source Code to Binary
When you build a C++ program, several distinct stages transform your human-readable source into a CPU-executable binary.
Each stage serves a purpose:
| Stage | Tool | Description |
|---|---|---|
| Preprocessing | cpp | Handles #include, #define, conditional macros, and expands headers into the code. |
| Compilation | g++, clang++ | Translates preprocessed C++ code into assembly instructions for a specific architecture. |
| Assembly | as | Converts human-readable assembly into binary machine code stored in object files. |
| Linking | ld | Combines all object files and libraries, resolving symbols into a complete executable binary. |
Each translation unit (a .cpp file plus everything it includes) is compiled separately into an object file. The linker later merges them all into one executable.
2. Preprocessing — Expanding the Code
The C++ preprocessor runs before the compiler sees any C++ syntax. It handles directives that start with #.
CPP(1) GNU CPP(1)
NAME
cpp - The C Preprocessor
SYNOPSIS
cpp [-Dmacro[=defn]...] [-Umacro]
[-Idir...] [-iquotedir...]
[-M|-MM] [-MG] [-MF filename]
[-MP] [-MQ target...]
[-MT target...]
infile [[-o] outfile]
Only the most useful options are given above; see below for a more complete list of
preprocessor-specific options. In addition, cpp accepts most gcc driver options,
which are not listed here. Refer to the GCC documentation for details.
DESCRIPTION
The C preprocessor, often known as cpp, is a macro processor that is used
automatically by the C compiler to transform your program before compilation. It is
called a macro processor because it allows you to define macros, which are brief
abbreviations for longer constructs.
Common Preprocessor Tasks
- Include header files: #include <iostream>
- Replace macros: #define PI 3.14159
- Conditional compilation: #ifdef DEBUG
Example:
#include <iostream>
#define SQUARE(x) ((x) * (x))
int main()
{
#ifdef DEBUG
std::cout << "Debug mode!" << std::endl;
#endif
std::cout << SQUARE(3) << "\n";
}
After preprocessing, this becomes a single expanded source file (the thousands of lines pulled in by <iostream> are omitted here):
- Without DEBUG defined, the message is omitted.
$ cpp -xc++ -P main.cpp > main.pp.cpp
int main()
{
std::cout << ((3) * (3)) << "\n";
}
- With DEBUG defined, the debug message is included.
$ cpp -xc++ -P -DDEBUG main.cpp > main.pp.cpp
int main()
{
std::cout << "Debug mode!" << std::endl;
std::cout << ((3) * (3)) << "\n";
}
Important
All macros are resolved.
Warning
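Header inclusion is one reason large projects that lean on template-heavy headers (like the standard library) take longer to compile: every #include textually copies the header into the translation unit, so the compiler has to parse thousands of extra lines for each source file.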
cpp Command Options
cpp supports many options. Here are a few used above:
| Option | Meaning |
|---|---|
| -P | Suppress #line directives (cleaner output). |
| -DNAME[=VALUE] | Define a macro NAME with optional VALUE. |
| -xc++ | Treat input files as C++ source files. |
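Because macro expansion is plain text substitution, the parentheses in SQUARE matter. The sketch below contrasts it with a hypothetical unparenthesized variant, SQUARE_UNSAFE, which is not part of the original example and exists only to show what goes wrong.

```cpp
// Macros are expanded by text substitution before the compiler runs.
#define SQUARE(x) ((x) * (x))     // from the example above: fully parenthesized
#define SQUARE_UNSAFE(x) x * x    // hypothetical variant without parentheses

// SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)).
constexpr int safe_result() { return SQUARE(1 + 2); }

// SQUARE_UNSAFE(1 + 2) expands to 1 + 2 * 1 + 2, and * binds tighter than +.
constexpr int unsafe_result() { return SQUARE_UNSAFE(1 + 2); }

static_assert(safe_result() == 9, "parenthesized macro squares the whole argument");
static_assert(unsafe_result() == 5, "unparenthesized macro changes the meaning");
```

Running the file through cpp -P shows both expansions directly, which is often the quickest way to debug a misbehaving macro.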
3. Compilation — Translating C++ to Assembly
The compiler turns C++ syntax into assembly code for your target CPU architecture.
For example:
int add(int a, int b) { return a + b; }
might compile to (x86-64, Intel syntax):
_add:
mov eax, edi ; copy 'a' to eax
add eax, esi ; add 'b' to eax
ret ; return result in eax
Compilation Stages Internally
The compilation phase itself has several sub-steps:
- Lexical Analysis
Description
The compiler’s lexer (scanner) reads the raw source code character by character and groups sequences into tokens — atomic units like keywords, identifiers, literals, and operators.
Purpose
Convert a stream of characters into a stream of meaningful symbols.
Example
Expression:
int sum = a + b * 2;
is broken into tokens:
[int] [sum] [=] [a] [+] [b] [*] [2] [;]
Details
- Removes comments and redundant whitespace.
- Detects invalid tokens (e.g., @ in C++).
- Classifies tokens as identifiers, keywords, literals, operators, punctuators, etc.
- Each token is usually represented by a structure like { type: TOKEN_IDENTIFIER, value: "sum", line: 1, column: 5 }.
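To make the lexing step concrete, here is a minimal sketch of a scanner for a tiny subset of C++. The TokenType, Token, and tokenize names are hypothetical and chosen for illustration. A production lexer also tracks line and column, handles multi-character operators, and reports invalid characters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch; real compilers use many more.
enum class TokenType { Keyword, Identifier, Number, Punctuator };

struct Token {
    TokenType type;
    std::string text;
};

// Splits a tiny subset of C++ into tokens: names, integer literals,
// and single-character operators/punctuators. Whitespace is skipped.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            const std::string word = src.substr(start, i - start);
            // Only "int" is treated as a keyword, to keep the sketch short.
            const TokenType type =
                (word == "int") ? TokenType::Keyword : TokenType::Identifier;
            tokens.push_back({type, word});
        } else if (std::isdigit(c)) {
            const std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back({TokenType::Number, src.substr(start, i - start)});
        } else {
            // Anything else becomes a one-character punctuator token.
            tokens.push_back({TokenType::Punctuator, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenize("int sum = a + b * 2;") yields the nine tokens listed in the example, with int classified as a keyword and 2 as a number.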
- Parsing — Syntax Analysis & AST Construction
Description
The parser consumes the tokens and checks whether they form a valid structure according to the language grammar (usually defined in BNF or EBNF form).
Purpose
Transform the linear token stream into a hierarchical Abstract Syntax Tree (AST) that represents the syntactic structure of the program.
Example
Tokens:
[int] [sum] [=] [a] [+] [b] [*] [2] [;]
are represented in an AST structure (simplified) as:
Assignment
├── Type: int
├── Variable: sum
└── Expression (+)
    ├── Left: a
    └── Right: (*)
        ├── Left: b
        └── Right: 2
Details
- Enforces grammar rules (e.g., expressions must be inside statements, statements inside blocks).
- Detects syntax errors like missing semicolons or mismatched parentheses.
- May perform error recovery to continue parsing after minor issues.
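The way grammar rules produce the tree shape can be sketched with a small recursive-descent parser. The grammar, the Node and Parser types, and to_prefix are all hypothetical simplifications: only + and * are supported, and there is no error reporting. Real C++ parsers are far more involved.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// AST node: a leaf (name or number) or a binary operator with two children.
struct Node {
    std::string text;              // operator symbol or leaf text
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

// Hypothetical minimal grammar:
//   expr   := term   ('+' term)*
//   term   := factor ('*' factor)*
//   factor := identifier | number
class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    std::unique_ptr<Node> parse_expr() { return parse_binary('+'); }

private:
    // Parses a left-associative chain of one operator. Operands of '+' are
    // terms and operands of '*' are factors, so '*' binds tighter.
    std::unique_ptr<Node> parse_binary(char op) {
        auto lhs = (op == '+') ? parse_binary('*') : parse_factor();
        while (peek() == op) {
            ++pos_;  // consume the operator
            auto node = std::make_unique<Node>();
            node->text = std::string(1, op);
            node->left = std::move(lhs);
            node->right = (op == '+') ? parse_binary('*') : parse_factor();
            lhs = std::move(node);
        }
        return lhs;
    }

    std::unique_ptr<Node> parse_factor() {
        skip_spaces();
        const std::size_t start = pos_;
        while (pos_ < src_.size() &&
               std::isalnum(static_cast<unsigned char>(src_[pos_]))) {
            ++pos_;
        }
        auto node = std::make_unique<Node>();
        node->text = src_.substr(start, pos_ - start);
        return node;
    }

    char peek() {
        skip_spaces();
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    void skip_spaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) {
            ++pos_;
        }
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// Renders the tree in prefix form so the nesting is visible.
std::string to_prefix(const Node& n) {
    if (!n.left) return n.text;
    return "(" + n.text + " " + to_prefix(*n.left) + " " + to_prefix(*n.right) + ")";
}
```

Parsing "a + b * 2" and rendering it with to_prefix gives (+ a (* b 2)), the same nesting as the tree above: the multiplication sits below the addition because the grammar makes term the operand of +.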
- Semantic Analysis — Meaning & Validation
Description
Once syntax is correct, the compiler verifies semantic correctness — the program “makes sense” according to language rules.
Purpose
Ensure types, declarations, and scopes are valid and consistent.
Checks performed:
| Check | What it does |
|---|---|
| Type checking | Validate expressions like int + string → invalid. |
| Scope resolution | Identify which variable/function a name refers to. |
| Declaration checks | Ensure symbols are declared before use. |
| Const correctness & access control | Detect const violations and private member access. |
| Template instantiation | Generate specialized template code. |
| Overload resolution | Pick the correct function overload. |
Example
int x = "hello"; // ❌ invalid: assigning string to int
foo(3.14); // ✅ finds foo(double) overload
- Optimization — Code Simplification & Transformation
Description
The compiler improves performance and/or reduces code size without changing behavior.
Optimizations happen at multiple stages — AST-level, Intermediate Representation (IR), and machine code.
Common optimizations:
| Optimization | What it does |
|---|---|
| Constant folding | Evaluate constant expressions at compile time (e.g., 3 * 4 → 12). |
| Constant propagation | Replace variables with known constant values. |
| Dead code elimination | Remove unreachable or unused code. |
| Loop unrolling | Duplicate loop body to reduce iteration overhead. |
| Inlining | Replace small function calls with the function body. |
| Strength reduction | Replace expensive ops with cheaper ones (e.g., x * 2 → x << 1). |
Example
int sum = 0;
for (int i = 0; i < 4; ++i) { sum += i; }
becomes, after loop unrolling:
int sum = 0;
sum += 0;
sum += 1;
sum += 2;
sum += 3;
and then, because the initial value of sum is known at compile time, it can be folded into the single constant 6.
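One way to observe this kind of folding from within the language is constexpr evaluation. The sketch below wraps the same loop in a hypothetical function, sum_below, and asks the compiler to compute the result during compilation. This is related to, but not the same mechanism as, the optimizer's constant folding: static_assert forces compile-time evaluation, while the optimizer folds opportunistically.

```cpp
// The loop from the example above, wrapped in a function (C++14 or later).
constexpr int sum_below(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}

// 0 + 1 + 2 + 3 == 6, checked while compiling.
static_assert(sum_below(4) == 6, "evaluated at compile time");

int folded_sum() {
    // With optimization enabled, compilers typically emit a constant here
    // instead of a loop; inspect the assembly to confirm on your toolchain.
    return sum_below(4);
}
```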
- Code Generation — Intermediate Representation & Machine Mapping
Description
The code generator transforms the optimized AST or IR into target architecture instructions.
Purpose
Translate platform-independent logic into low-level operations.
Steps
- Lower high-level constructs into Intermediate Representation (IR), e.g., LLVM IR.
- Perform register allocation — decide which variables go into CPU registers vs memory.
- Select machine instructions matching the target architecture.
- Apply target-specific optimizations (instruction scheduling, vectorization).
Example (LLVM IR for sum = (a + b) * 2):
%1 = add i32 %a, %b
%2 = mul i32 %1, 2
store i32 %2, i32* %sum
Resulting x86 assembly (simplified; the multiplication by 2 has been strength-reduced to a shift):
mov eax, [a]
add eax, [b]
shl eax, 1
mov [sum], eax
- Assembly Output — Final Translation
Description
The compiler writes out the final assembly file (.s) or directly produces object code (.o).
Purpose
Generate human-readable assembly for inspection or debugging.
Example:
.section .text
.globl _main
_main:
movl a(%rip), %eax
addl b(%rip), %eax
sall $1, %eax
movl %eax, sum(%rip)
ret
Then
- The assembler converts .s → .o (binary machine code).
- The linker combines .o files into a single executable, resolving external symbols.
4. Linking — Combining Object Files
Large projects use multiple .cpp files that the linker must connect.
// math.cpp
int add(int a, int b) { return a + b; }
// main.cpp
#include <iostream>
extern int add(int, int);
int main() { std::cout << add(2, 3); }
Linking Steps
- Symbol Resolution: matches function declarations (extern) with their definitions.
- Section Merging: merges .text (code), .data (initialized globals), and .bss (zeroed data) sections.
- Relocation: adjusts addresses so cross-file references point to correct memory locations.
- Static and Dynamic Linking:
  - Static: libraries compiled into the binary (libm.a).
  - Dynamic: shared at runtime (libstdc++.so).
5. Program Memory Layout
When the OS loads your program, it maps sections into the process's virtual memory space.
| Segment | Purpose | Example |
|---|---|---|
| .text | Compiled machine instructions (read-only). | Compiled functions. |
| .data | Global/static variables with initial values. | int count = 5; |
| .bss | Global/static variables without an explicit initializer (zero-filled at load time). | static int counter; |
| Heap | Dynamic allocations using new or malloc. | int* ptr = new int(10); |
| Stack | Local variables and return addresses. | int local = 42; |
int global = 42; // .data
static int counter; // .bss
int main()
{
int local = 7; // stack
int* ptr = new int; // heap
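}
On most mainstream platforms the stack grows downward (toward lower addresses), while the heap grows upward. On modern systems the two regions rarely run into each other directly: running off the end of the stack usually hits a guard page and crashes the process (a stack overflow).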
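A quick way to see these regions is to sample one address from each. The helper below is a sketch with hypothetical names (Addresses, sample_addresses, g_initialized, g_zeroed). Exact section placement depends on the compiler and linker, and the numeric values change between runs when ASLR is enabled, so no particular ordering is assumed.

```cpp
#include <cstdint>
#include <memory>

int g_initialized = 42;   // typically placed in .data
int g_zeroed;             // typically placed in .bss (zero-filled)

struct Addresses {
    std::uintptr_t code;
    std::uintptr_t data;
    std::uintptr_t bss;
    std::uintptr_t heap;
    std::uintptr_t stack;
};

// Collects one address from each region so they can be printed or compared.
Addresses sample_addresses() {
    int local = 7;                                 // automatic storage (stack)
    auto heap_value = std::make_unique<int>(10);   // dynamic storage (heap)

    Addresses a{};
    // Casting a function pointer to an integer is conditionally supported,
    // but works on mainstream desktop and server platforms.
    a.code  = reinterpret_cast<std::uintptr_t>(&sample_addresses);
    a.data  = reinterpret_cast<std::uintptr_t>(&g_initialized);
    a.bss   = reinterpret_cast<std::uintptr_t>(&g_zeroed);
    a.heap  = reinterpret_cast<std::uintptr_t>(heap_value.get());
    a.stack = reinterpret_cast<std::uintptr_t>(&local);
    return a;   // heap_value is released here; only the numeric address remains
}
```

Printing the fields in hexadecimal and comparing them with the output of readelf -S or /proc/self/maps on Linux shows which mapping each address falls into.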
6. The C++ Memory Model
C++ defines a precise, formalized memory model — the rules that govern when reads and writes become visible to other threads and what behaviors are allowed or forbidden. Understanding it helps you write correct concurrent code and reason about performance.
Core Concepts
- Object lifetime: when storage for an object is obtained and released.
- Storage duration: static, thread, automatic, or dynamic.
- Happens-before and sequenced-before: ordering guarantees within and across threads.
- Data race: two or more threads access the same memory location concurrently, at least one access is a write, and the accesses are neither atomic nor ordered by synchronization. The result is undefined behavior.
In a single-threaded program, operations are sequenced-before one another:
int x = 1; // (1)
x = x + 2; // (2) sequenced after (1)
In multithreaded contexts, synchronization is required:
#include <atomic>
#include <iostream>
std::atomic<int> x{0};
void thread1() { x.store(5); }
void thread2() { std::cout << x.load() << '\n'; }
Without std::atomic (or another synchronization mechanism), concurrent reads/writes to x would form a data race, which is undefined behavior that compilers may “optimize” into surprising results.
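To see what atomicity buys in practice, the sketch below has two threads increment one shared counter. The function name count_with_two_threads is hypothetical. With std::atomic<int> no increment is lost; replacing it with a plain int would introduce a data race and typically a smaller, unpredictable total.

```cpp
#include <atomic>
#include <thread>

// Two threads increment the same counter. Because the counter is atomic,
// every increment is applied exactly once and there is no data race.
int count_with_two_threads(int increments_per_thread) {
    std::atomic<int> counter{0};

    auto work = [&counter, increments_per_thread] {
        for (int i = 0; i < increments_per_thread; ++i) {
            counter.fetch_add(1);   // default ordering: memory_order_seq_cst
        }
    };

    std::thread t1(work);
    std::thread t2(work);
    t1.join();   // join() also synchronizes: the final load sees all increments
    t2.join();
    return counter.load();
}
```

On older Linux toolchains this needs -pthread at compile and link time.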
Storage Duration and Lifetime
- Static storage duration: exists for the entire program (e.g., globals, static variables).
- Thread storage duration: exists for the lifetime of a thread (e.g., thread_local).
- Automatic storage duration: local variables; begin at block entry, end at block exit.
- Dynamic storage duration: obtained via new/delete or malloc/free.
thread_local int tls_counter = 0; // each thread has its own instance
static int global_count = 0; // shared across threads (synchronize accesses!)
Atomics and Memory Ordering
Atomics provide both atomicity and ordering constraints. Common memory orders:
- memory_order_relaxed: atomicity only; no ordering guarantees.
- memory_order_acquire: a load that prevents subsequent operations from moving before it.
- memory_order_release: a store that prevents prior operations from moving after it.
- memory_order_acq_rel: combines acquire and release on a read-modify-write.
- memory_order_seq_cst: the strongest; forms a single global total order of atomic ops.
Acquire–release pairs create a cross-thread happens-before relation:
#include <atomic>
#include <string>
std::string data;
std::atomic<bool> ready{false};
void producer() {
data = "payload"; // 1: write data
ready.store(true, std::memory_order_release); // 2: publish
}
void consumer() {
while (!ready.load(std::memory_order_acquire)) { /* spin */ }
// Happens-before ensures "data" is visible here
use(data); // use() stands in for whatever consumes the payload (not defined here)
}
Fences and Advanced Patterns
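std::atomic_thread_fence(order) provides ordering without touching a specific atomic object. It’s useful in low-level lock-free algorithms, for example to order a group of relaxed atomic operations with a single fence. Note that the C++ memory model only covers ordinary memory shared between threads; ordering accesses to device memory generally requires platform-specific barriers as well.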
Performance Notes: Caches and False Sharing
Modern CPUs maintain cache coherence. Two threads updating adjacent fields that reside in the same cache line can cause false sharing, leading to cache ping-pong and slowdowns. Use padding or alignas to separate frequently written counters:
struct alignas(64) PaddedCounter { std::atomic<std::uint64_t> value{0}; }; // needs <atomic> and <cstdint>; 64 bytes is a typical cache-line size
Tips
Keep shared writes rare. Prefer thread-local accumulation plus periodic aggregation.
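The effect of alignas on layout can be checked at compile time. In this sketch the 64-byte line size is an assumption (common on x86-64, but not universal), and kAssumedCacheLine, per_thread_counters, and counter_stride are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; verify it for your target CPU.
constexpr std::size_t kAssumedCacheLine = 64;

struct alignas(kAssumedCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// alignas also rounds the struct size up to a multiple of the alignment,
// so adjacent array elements never share a cache line.
static_assert(alignof(PaddedCounter) == kAssumedCacheLine, "aligned to one line");
static_assert(sizeof(PaddedCounter) == kAssumedCacheLine, "padded to one line");

PaddedCounter per_thread_counters[4];   // one slot per worker thread

// Distance in bytes between two neighboring counters.
std::size_t counter_stride() {
    const auto* first  = reinterpret_cast<const char*>(&per_thread_counters[0]);
    const auto* second = reinterpret_cast<const char*>(&per_thread_counters[1]);
    return static_cast<std::size_t>(second - first);
}
```

The cost is memory: each 8-byte counter now occupies a full line, which is usually acceptable for a handful of hot counters but not for large arrays.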
7. How Instructions Execute
Every compiled C++ program runs as a stream of machine instructions executed by the CPU in the fetch–decode–execute cycle.
Example: int x = a + b;
- Fetch instruction ADD from .text memory.
- Decode it: operands are registers holding a and b.
- Execute: ALU performs addition.
- Store result in x.
- Move to next instruction via Program Counter.
In reality, the CPU uses pipelines, out-of-order execution, and branch prediction to keep multiple instructions flowing simultaneously.
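The basic cycle can be sketched as a toy interpreter. The three-instruction machine below (Op, Instruction, Cpu, run) is invented for this illustration and does not correspond to any real instruction set. It executes strictly in order, one instruction at a time, with none of the pipelining or reordering mentioned above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction set for this sketch.
enum class Op : std::uint8_t { LoadImm, Add, Halt };

struct Instruction {
    Op op;
    std::uint8_t dst;   // destination register index (0..3)
    std::uint8_t src;   // source register index (used by Add)
    std::int32_t imm;   // immediate value (used by LoadImm)
};

struct Cpu {
    std::array<std::int32_t, 4> regs{};   // general-purpose registers r0..r3
    std::size_t pc = 0;                   // program counter
};

// Runs until Halt or the end of the program and returns the final state.
Cpu run(const std::vector<Instruction>& program) {
    Cpu cpu;
    while (cpu.pc < program.size()) {
        const Instruction& inst = program[cpu.pc];   // fetch
        ++cpu.pc;                                    // point at the next instruction
        switch (inst.op) {                           // decode
            case Op::LoadImm:                        // execute and write back
                cpu.regs[inst.dst] = inst.imm;
                break;
            case Op::Add:
                cpu.regs[inst.dst] += cpu.regs[inst.src];
                break;
            case Op::Halt:
                return cpu;
        }
    }
    return cpu;
}

// int x = a + b; with a = 2 and b = 3, keeping x and a in r0 and b in r1.
Cpu run_add_example() {
    return run({
        {Op::LoadImm, 0, 0, 2},   // r0 = a
        {Op::LoadImm, 1, 0, 3},   // r1 = b
        {Op::Add,     0, 1, 0},   // r0 = r0 + r1
        {Op::Halt,    0, 0, 0},
    });
}
```

After run_add_example(), register r0 holds 5 and the program counter has advanced past the Halt instruction.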
Microarchitecture Essentials
- Pipelines and superscalar width: multiple instructions can be in flight and issued per cycle (instructions per cycle, IPC).
- Out-of-order (OoO) execution: instructions are reordered internally to hide latencies while preserving architectural correctness.
- Branch prediction: predicts control flow. A misprediction flushes the pipeline and costs cycles (the “mispredict penalty”).
- Caches and memory hierarchy: L1/L2/L3 caches and the TLB reduce average memory latency. Misses trigger longer access paths.
Latency vs Throughput
- Latency: time to complete one instruction (e.g., a load that hits L1 may be ~4 cycles; an L3 miss can be hundreds of cycles).
- Throughput: steady-state rate (e.g., one add per cycle per port). Compilers schedule instructions to maximize throughput.
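The difference shows up in reduction loops. In the sketch below, sum_single forms one long dependency chain, so each addition waits for the previous one. sum_four_chains keeps four independent accumulators so that several additions can be in flight at once. Both function names are hypothetical, and whether the second version is actually faster depends on the compiler and CPU: optimizers often perform this transformation themselves for integer code, so measure before relying on it.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: every addition depends on the previous result,
// so the loop is limited by the latency of the add.
long long sum_single(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}

// Four independent accumulators: the chains do not depend on each other,
// which lets the CPU overlap them and approach its add throughput.
long long sum_four_chains(const std::vector<int>& v) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < v.size(); ++i) {   // leftover elements
        s0 += v[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

For integers both functions return the same value. For floating-point data the reordering can change rounding, which is why compilers do not apply it automatically without flags such as -ffast-math.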
SIMD and Vectorization
Modern x86 CPUs expose SIMD instruction sets (SSE/AVX/AVX-512). Compilers auto-vectorize simple loops, and the C++17 parallel algorithms (execution policies from <execution>) explicitly permit vectorized execution:
#include <algorithm>
#include <execution>
#include <vector>
void saxpy(std::vector<float>& y, const std::vector<float>& x, float a)
{
std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(), y.begin(),
[a](float xi, float yi) { return a * xi + yi; });
}
With optimization enabled, this often lowers to packed vector instructions.
Memory Access Patterns
Tips
- Prefer contiguous, sequential access (cache line–friendly).
- Avoid random access in hot loops; consider structure-of-arrays (SoA) layouts for better vectorization.
- Align frequently accessed data when beneficial.
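The array-of-structures versus structure-of-arrays trade-off is easiest to see side by side. The particle types below are hypothetical and exist only for illustration. Both functions compute the same total, but the SoA version reads one dense float array, while the AoS version pulls x, y, and z through the cache without using them.

```cpp
#include <vector>

// Array-of-structures: the fields of each particle are interleaved in memory.
struct ParticleAoS {
    float x, y, z, mass;
};

// Structure-of-arrays: each field is stored contiguously on its own.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Uses only `mass`, yet strides over 16 bytes per element.
float total_mass_aos(const std::vector<ParticleAoS>& particles) {
    float total = 0.0f;
    for (const ParticleAoS& p : particles) {
        total += p.mass;
    }
    return total;
}

// Reads a single dense array: every byte brought into cache is used,
// and the loop is a straightforward candidate for vectorization.
float total_mass_soa(const ParticlesSoA& particles) {
    float total = 0.0f;
    for (float m : particles.mass) {
        total += m;
    }
    return total;
}
```

SoA is not always better: when a loop touches every field of each element, AoS keeps those fields together on one cache line.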
8. From Binary to Running Process
When you run a program, a complex chain of events occurs:
- Process Creation: the OS creates a new process and sets up its virtual address space.
- Loader Phase: maps the executable and shared libraries into memory.
- Runtime Setup: initializes the heap, thread-local storage, and global constructors.
- Program Entry: execution starts at the _start symbol, which leads to main().
C++ Runtime Initialization
The runtime system:
- Runs all global and static constructors before main().
- Initializes I/O subsystems.
- Prepares std::thread, std::mutex, and other runtime components.
After main() finishes, destructors for global/static objects are executed in reverse order of construction.
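Startup ordering can be observed with a small tracing type. Tracer, startup_log, and globals_ready_before_main are hypothetical names for this sketch. The log lives in a function-local static so that it is guaranteed to exist before any global constructor writes to it.

```cpp
#include <string>
#include <vector>

// Function-local static: constructed on first use, so it is safe to call
// from the constructors of global objects during startup.
std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical helper type that records when it is constructed.
struct Tracer {
    explicit Tracer(const char* name) { startup_log().push_back(name); }
};

// Within one translation unit, globals are initialized in definition order.
Tracer first{"first"};
Tracer second{"second"};

// Intended to be called from main(): both constructors have already run.
bool globals_ready_before_main() {
    return startup_log().size() == 2 &&
           startup_log()[0] == "first" &&
           startup_log()[1] == "second";
}
```

Across different translation units the initialization order is unspecified, which is the usual motivation for the function-local static pattern used by startup_log().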
ELF, the Dynamic Linker, and Relocations (Linux)
- ELF format: executables and shared libraries contain sections (e.g., .text, .data) and segments the loader maps with mmap.
- Dynamic linker (ld-linux) resolves imports at load time. The PLT/GOT indirection supports dynamic symbol binding and lazy resolution.
- PIE and ASLR: Position-Independent Executables enable Address Space Layout Randomization for security.
- Relocations: addresses in code/data are fixed up so references point to the correct runtime locations.
CRT Startup and main
On Linux, _start (from the C runtime objects like crt1.o) sets up the process, initializes the runtime, and calls __libc_start_main, which eventually calls your main(int argc, char** argv, char** envp).
Thread-Local Storage (TLS)
TLS variables live in special segments (e.g., .tdata, .tbss) and are managed per thread by the runtime/loader.
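From the program's point of view, per-thread management means each thread sees its own copy of the variable. In this sketch, tls_hits, bump_tls, and hits_seen_by_worker are hypothetical names. The worker thread starts from a fresh zero-initialized copy regardless of what the calling thread has done with its own.

```cpp
#include <thread>

thread_local int tls_hits = 0;   // each thread gets its own copy

// Increments the calling thread's copy and returns the new value.
int bump_tls() { return ++tls_hits; }

// Calls bump_tls() three times on a new thread and reports what it saw.
int hits_seen_by_worker() {
    int seen = 0;
    std::thread worker([&seen] {
        bump_tls();
        bump_tls();
        seen = bump_tls();   // third call on this thread's private copy
    });
    worker.join();           // join() makes the write to `seen` visible here
    return seen;
}
```

The worker always observes 3, and the caller's own tls_hits is left untouched, because the two threads never share the variable.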
Security Hardening
- NX/DEP: non-executable stacks/heaps prevent code execution in data segments.
- Stack canaries: detect stack smashing.
- RELRO: read-only relocation sections after startup to prevent tampering.
Signals, Exit, and Cleanup
The OS can deliver signals (e.g., SIGSEGV, SIGINT). Exit paths run atexit handlers, flush I/O, and invoke static destructors.
9. C++ Execution Pipeline Summary
From writing code to CPU cycles: source (.cpp) → preprocessing → compilation → assembly → linking → loading → execution.
Errors at each stage have distinct characteristics:
- Compilation errors: syntax, type mismatches.
- Linker errors: missing symbols or duplicate definitions.
- Runtime errors: segmentation faults, memory corruption.
Diagnose by Stage
- Compilation: enable warnings (-Wall -Wextra -Wpedantic) and treat as errors (-Werror) during CI.
- Linking: list symbols with nm and inspect ELF with readelf -a.
- Loading: check dependencies with ldd and run with LD_DEBUG=libs to trace library resolution.
- Runtime: attach gdb, record syscalls with strace, profile with perf, and analyze memory with Valgrind or sanitizers.
10. Why It Matters for a C++ Developer
C++ gives you unmatched control over how your program interacts with hardware. Understanding this pipeline lets you:
- Write high-performance, cache-aware code.
- Debug build and runtime issues methodically.
- Reduce binary size and startup time.
- Predict and fix memory and threading problems.
- Communicate effectively with compiler and system engineers.
Important
When you know what happens under the hood, you gain both power and precision — turning abstract syntax into predictable machine behavior.
Practical Habits
- Prefer clear ownership: RAII, unique_ptr by default, shared_ptr only when needed.
- Make concurrency explicit: favor message passing or well-defined atomic protocols, document memory orders.
- Measure, don’t assume: use perf, time, and compiler reports (e.g., -fopt-info on GCC) to verify optimization.
- Keep hot data compact and contiguous, minimize sharing and contention.
- Enable hardening and diagnostics in dev builds: sanitizers, -D_GLIBCXX_ASSERTIONS.
Handy Toolbelt
- Inspect codegen: objdump -d, llvm-objdump -d, Compiler Explorer (offline or online) to compare flags.
- Binary and symbol introspection: readelf, nm.
- Dependency and loading: ldd, LD_DEBUG.
- Behavior at runtime: strace, ltrace, perf, Valgrind.
These practices bridge the gap between intent and execution, making your C++ both faster and more reliable.