CPU Execution Fundamentals
The CPU's core function is straightforward: fetch instructions and execute them. The operating system places the address of the next instruction into the instruction pointer register, and the CPU fetches, decodes, and executes it. Notably, the CPU itself has no notion of processes, threads, or coroutines.
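The fetch-and-execute cycle can be illustrated with a toy interpreter. This is only a sketch: the four-instruction set, the accumulator, and the names Opcode, Instruction, and Run are invented for the example and do not correspond to any real CPU.

```go
package main

import "fmt"

// Opcode is a hypothetical instruction set invented for this sketch.
type Opcode int

const (
	OpLoad Opcode = iota // acc = operand
	OpAdd                // acc += operand
	OpJump               // ip = operand
	OpHalt               // stop and return acc
)

// Instruction pairs an opcode with a single integer operand.
type Instruction struct {
	Op      Opcode
	Operand int
}

// Run executes program starting at address ip and returns the accumulator.
// It knows nothing about who owns the program, mirroring the CPU's view.
func Run(program []Instruction, ip int) int {
	acc := 0
	for {
		inst := program[ip] // fetch the instruction the pointer refers to
		ip++                // by default, continue with the next instruction
		switch inst.Op {    // decode, then execute
		case OpLoad:
			acc = inst.Operand
		case OpAdd:
			acc += inst.Operand
		case OpJump:
			ip = inst.Operand
		case OpHalt:
			return acc
		}
	}
}

func main() {
	program := []Instruction{
		{OpLoad, 2},
		{OpJump, 3},
		{OpAdd, 100}, // skipped by the jump
		{OpAdd, 40},
		{OpHalt, 0},
	}
	fmt.Println(Run(program, 0)) // prints 42
}
```

Changing the starting value of ip is all it takes to make the loop run a different instruction stream, which is essentially what a scheduler does when it switches tasks.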
Process vs Thread: Shared Foundations
Both processes and threads rely on the same underlying data structure for management: the PCB (Process Control Block). In Linux this role is played by struct task_struct; the listing below is a simplified sketch with renamed fields:
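    struct task_descriptor {
        volatile long execution_state;     // scheduling state (thread-specific)
        void *stack_pointer;               // this task's own stack (thread-specific)
        struct thread_context *ctx;        // saved CPU state, see below
        atomic_t ref_counter;              // reference count
        unsigned int attributes;           // per-task flags
        struct pid_namespace *pid_entry;   // PID bookkeeping
        struct task_descriptor *ancestor;  // parent task
        struct list_head descendants;      // child tasks
        struct memory_descriptor *vma;     // virtual address space (process-level)
        struct file_table *fd_table;       // open file descriptors (process-level)
        // additional fields...
    };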
Some fields within the PCB are thread-specific, such as stack pointers and execution state. Others are process-level resources, including virtual address spaces and open file descriptors. The thread_context structure stores scheduling information:
    struct thread_context {
        struct cpu_registers *reg_state;   // saved registers, including SP and CS+IP
        unsigned long state_flags;
        unsigned long current_status;
        segment_limit_t mem_boundary;
    };
The critical component here is reg_state, which contains the stack pointer (SP) and the code segment plus instruction pointer (CS+IP) pointing to the next instruction. To the operating system, both processes and threads are simply tasks with states, stack pointers, and register values. Resources like virtual memory tables and file descriptors remain process-owned while being shared among sibling threads.
Key difference #1: Processes and threads differ in resource ownership. Memory and file descriptors belong to the process; a thread owns only its execution state (stack and registers) and shares everything else with its sibling threads.
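The ownership split can be modeled in a few lines of Go. This is a toy model, not kernel code: Resources, Task, NewProcess, SpawnThread, and Fork are hypothetical names, and Fork copies resources eagerly, whereas a real kernel would typically use copy-on-write.

```go
package main

import "fmt"

// Resources stands in for process-level state (address space, open files).
type Resources struct {
	Heap      map[string]int
	OpenFiles []string
}

// Task mirrors the idea of a PCB: per-task state plus a pointer to resources.
type Task struct {
	ID           int
	StackPointer uintptr    // thread-specific
	Res          *Resources // process-level, shared among sibling threads
}

// NewProcess creates a task that owns fresh resources.
func NewProcess(id int) *Task {
	return &Task{ID: id, StackPointer: 0x1000, Res: &Resources{Heap: map[string]int{}}}
}

// SpawnThread creates a sibling task with its own stack but the same resources.
func (t *Task) SpawnThread(id int) *Task {
	return &Task{ID: id, StackPointer: t.StackPointer + 0x1000, Res: t.Res}
}

// Fork creates a separate process: resources are copied, not shared.
func (t *Task) Fork(id int) *Task {
	heap := make(map[string]int, len(t.Res.Heap))
	for k, v := range t.Res.Heap {
		heap[k] = v
	}
	files := append([]string(nil), t.Res.OpenFiles...)
	return &Task{ID: id, StackPointer: t.StackPointer, Res: &Resources{Heap: heap, OpenFiles: files}}
}

func main() {
	parent := NewProcess(1)
	thread := parent.SpawnThread(2)
	child := parent.Fork(3)

	thread.Res.Heap["counter"] = 7 // written through the shared resources

	fmt.Println(parent.Res.Heap["counter"]) // prints 7: the sibling's write is visible
	fmt.Println(child.Res.Heap["counter"])  // prints 0: the forked process has its own copy
}
```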
Scheduling Mechanism
The scheduler drives both process and thread context switches by allocating CPU time slices to individual PCBs. Because both are represented as PCBs, the scheduler treats them identically. Modern kernels rely on timer interrupts to trigger scheduling decisions.
When a timer interrupt fires, the scheduler saves the interrupted task's register and stack state, selects the next runnable task from the PCB queue, and restores that task's saved state. When the CPU resumes execution, it continues with the newly selected task.
Key difference #2: When the outgoing and incoming tasks belong to the same process, only registers and stacks need switching. When the switch crosses a process boundary, the virtual memory mapping must also be changed.
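A small simulation makes the cost difference concrete. It is a sketch under simplifying assumptions: the Task and SwitchStats types and the RoundRobin function are hypothetical, every tick is treated as a timer interrupt, and the run queue is served in strict round-robin order.

```go
package main

import "fmt"

// Task is a simplified PCB; PID identifies the owning process.
type Task struct {
	TID int
	PID int
}

// SwitchStats counts what the simulated context switches had to restore.
type SwitchStats struct {
	RegisterSwitches     int // every switch restores registers and stack
	AddressSpaceSwitches int // only switches that cross a process boundary
}

// RoundRobin simulates the given number of timer ticks over the run queue.
func RoundRobin(queue []Task, ticks int) SwitchStats {
	var stats SwitchStats
	if len(queue) == 0 {
		return stats
	}
	current := queue[0]
	for i := 1; i <= ticks; i++ {
		next := queue[i%len(queue)]
		if next.TID == current.TID {
			continue // same task keeps running, nothing to switch
		}
		stats.RegisterSwitches++
		if next.PID != current.PID {
			stats.AddressSpaceSwitches++
		}
		current = next
	}
	return stats
}

func main() {
	// Tasks 1 and 2 are threads of process 100; task 3 belongs to process 200.
	queue := []Task{{1, 100}, {2, 100}, {3, 200}}
	stats := RoundRobin(queue, 6)
	fmt.Println(stats.RegisterSwitches, stats.AddressSpaceSwitches) // prints 6 4
}
```

Of the six switches, the two between tasks 1 and 2 stay inside process 100 and skip the address-space change.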
Coroutine Architecture
In conventional operating systems, threads represent the fundamental scheduling unit. The context switches described above are initiated by interrupts and require entering kernel mode with full context save/restore operations, which is why thread switching is comparatively expensive.
Coroutines take a different approach: switching occurs entirely within the application, independent of the operating system. The mechanism involves a controller simulating timer interrupts to periodically yield control:
┌─────────────────────────────────────────┐
│ User Space │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Routine A│ │Routine B│ │Routine C│ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ └────────────┼────────────┘ │
│ │ │
│ ┌───────┴───────┐ │
│ │ Scheduler │ │
│ │ (in-process) │ │
│ └───────────────┘ │
└─────────────────────────────────────────┘
On Linux systems, one way to implement this is with signals. When the process starts, a control thread sends it SIGALRM at a fixed interval (e.g., 1 ms). On receiving the signal, the process runs its internal coroutine scheduler.
Coroutine Trade-offs
Advantages:
- Fast context switching: Coroutine transitions occur in user space without kernel entry, dramatically reducing switch overhead. This enables high-frequency task switching with minimal latency.
- Simpler concurrency model: Unlike the hierarchical process-thread relationship, coroutines and threads lack rigid parent-child bindings. A thread pool can schedule arbitrary coroutines, reducing architectural complexity.
Disadvantages:
- Blocking sensitivity: The kernel remains unaware of coroutines. If a coroutine performs blocking operations (such as waiting for I/O), the kernel may suspend the entire hosting thread, blocking all other coroutines sharing that thread. Consequently, coroutines typically rely on non-blocking I/O patterns.
- Debugging complexity: Since scheduling occurs in user space, traditional debugging and monitoring tools struggle to track coroutine behavior accurately, complicating troubleshooting.
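The non-blocking pattern mentioned under blocking sensitivity can be sketched with Go's select statement and a default case. A channel stands in for an I/O source here, and TryReceive is a hypothetical helper: instead of suspending when nothing is ready, it returns immediately so the caller can run other work.

```go
package main

import "fmt"

// TryReceive polls ch without blocking; ok is false when nothing is ready.
func TryReceive(ch <-chan int) (value int, ok bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false // nothing available: the caller is free to do other work
	}
}

func main() {
	ch := make(chan int, 1)

	_, ok := TryReceive(ch)
	fmt.Println(ok) // prints false: no data yet, and the call did not block

	ch <- 9
	v, ok := TryReceive(ch)
	fmt.Println(v, ok) // prints 9 true
}
```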
Go's Concurrency Model
Go provides goroutines and channels as its core concurrency primitives. Instead of guarding shared data with explicit locks, Go encourages passing data (or references to it) between goroutines through channels. By convention, a sender stops using a value once it has sent it, so only one goroutine accesses a given piece of data at any moment. The language does not enforce this; it is a discipline the program must follow.
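A minimal sketch of this hand-off style follows. Order, Produce, and Price are hypothetical names chosen for the example. The producer sends a pointer and never touches it again, so the consumer can mutate the value without a lock.

```go
package main

import "fmt"

// Order is a hypothetical payload type.
type Order struct {
	ID    int
	Total int
}

// Produce creates n orders and hands each pointer to the channel.
// After the send, the producer no longer reads or writes that order.
func Produce(n int, out chan<- *Order) {
	for i := 1; i <= n; i++ {
		out <- &Order{ID: i}
	}
	close(out) // signals the consumer that no more orders are coming
}

// Price receives each order, mutates it without a lock, and returns the sum.
func Price(in <-chan *Order) int {
	sum := 0
	for o := range in {
		o.Total = o.ID * 10 // safe: this goroutine is now the only user of o
		sum += o.Total
	}
	return sum
}

func main() {
	ch := make(chan *Order)
	go Produce(3, ch)
	fmt.Println(Price(ch)) // prints 60
}
```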
Channel Buffering Semantics
Unbuffered channels:
    syncCh := make(chan struct{})
Unbuffered channels have zero capacity, so a sender and a receiver must both be ready before a value is exchanged. When a goroutine sends on an unbuffered channel and no receiver is waiting, the sender blocks until one arrives. Similarly, a receive on an unbuffered channel blocks until another goroutine sends.
Unbuffered channels enforce synchronous handoffs.
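The handoff behavior can be observed directly. In this sketch, CanSendNow and Handoff are hypothetical helpers: the first probes a channel with select and default to see whether a send could complete immediately, and the second performs an actual rendezvous with a receiving goroutine.

```go
package main

import "fmt"

// CanSendNow reports whether a send on ch would complete without waiting.
func CanSendNow(ch chan string) bool {
	select {
	case ch <- "probe":
		return true
	default:
		return false
	}
}

// Handoff sends msg over an unbuffered channel and returns what the
// receiving goroutine saw.
func Handoff(msg string) string {
	ch := make(chan string)        // zero capacity
	result := make(chan string, 1) // carries the receiver's report back
	go func() {
		result <- "got " + <-ch // the receive unblocks the pending send
	}()
	ch <- msg // blocks until the goroutine above is ready to receive
	return <-result
}

func main() {
	fmt.Println(CanSendNow(make(chan string))) // prints false: no receiver is waiting
	fmt.Println(Handoff("token"))              // prints got token
}
```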
Buffered channels:
    bufferedCh := make(chan struct{}, 1024)
Buffered channels accept a capacity parameter. When sending to a full buffer, the goroutine blocks until space becomes available. If capacity exists, the send completes immediately. When receiving from an empty buffer, the goroutine blocks until data arrives.
Buffered channels introduce asynchrony: senders and receivers can operate at different rates up to the buffer limit.
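To see the buffer limit in action, the sketch below counts how many sends succeed with no receiver present. FillWithoutReceiver is a hypothetical helper that uses select with a default case to stop at the first send that would block.

```go
package main

import "fmt"

// FillWithoutReceiver counts how many sends complete on a channel of the
// given capacity before the next send would have to block.
func FillWithoutReceiver(capacity int) int {
	ch := make(chan int, capacity)
	sent := 0
	for {
		select {
		case ch <- sent:
			sent++ // there was room in the buffer, so the send returned at once
		default:
			return sent // buffer full: a plain send would block here
		}
	}
}

func main() {
	fmt.Println(FillWithoutReceiver(3))    // prints 3
	fmt.Println(FillWithoutReceiver(1024)) // prints 1024
}
```

A sender can therefore run ahead of the receiver by exactly the channel's capacity before it is forced to wait.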