Python Threading: Concurrency, GIL, and Synchronization

Background Knowledge

Processes

In operating systems, a program cannot run independently. Only when a program is loaded into memory and the system allocates resources to it can it execute. This executing program is called a process. The difference between a program and a process is that a program is a static collection of instructions, while a process is the execution of that program. In multiprogramming systems, multiple programs can be loaded into memory simultaneously, and under the operating system's scheduling, they can execute concurrently. This design significantly improves CPU utilization.

Why Threads When We Have Processes?

Processes have advantages, providing multiprogramming that makes each user feel they have their own CPU and other resources, improving computer utilization. However, processes have drawbacks:

A process can only do one thing at a time. If it needs to do multiple things simultaneously, processes are insufficient.
If a process blocks during execution (e.g., waiting for input), the entire process hangs, even if some work doesn't depend on the input data.

For example, if we consider attending a class as a process, we need to listen to the teacher, take notes, and think—all at the same time for efficiency. With only processes, these three tasks couldn't run simultaneously. If the teacher gets stuck solving a problem on the board, we can't do anything else, even if we want to think about a previous question we didn't understand.

The Emergence of Threads

In the 1960s, processes were the only entities that could own resources and run independently in operating systems. As computer technology evolved, processes had drawbacks: creating, destroying, and switching processes had significant overhead. This led to the introduction of "lightweight processes." In the 1980s, threads emerged as entities that could run independently.

Key points:

A process is the smallest unit of resource allocation.
A thread is the smallest unit of CPU scheduling.
Each process has at least one thread.

Think of a process as a factory that integrates resources, while threads are the assembly lines within that factory. The factory (process) manages resources, while the assembly lines (threads) perform the actual work using the CPU (power).

Relationship Between Processes and Threads

The differences between threads and processes can be summarized as follows:

Address space and resources: Processes are independent of each other, while threads within the same process share resources. Threads in one process are not visible to other processes.
Communication: Processes communicate through IPC (Inter-Process Communication), while threads within a process can directly read and write process data segments (like global variables) for communication—requiring synchronization and mutual exclusion mechanisms to ensure data consistency.
Scheduling and switching: Thread context switching is much faster than process context switching.
In multi-threaded operating systems: A process is not an executable entity; threads are what actually execute programs. You can think of a process as a container for threads.

Characteristics of Threads

In multi-threaded operating systems, a process typically contains multiple threads. Each thread is the basic unit for utilizing the CPU with minimal overhead. Threads have the following attributes:

Lightweight entity: Threads basically don't own system resources, only some essential ones to run independently. A thread's entity includes the program, data, and TCB (Thread Control Block).
Basic unit for independent scheduling and dispatch: In multi-threaded OS, threads are the basic units that can run independently, and thus are also independent units for scheduling and dispatch. Thread switching is fast and has low overhead.
Sharing process resources: Threads within the same process can share all resources of that process, including memory, open files, timers, etc.
Concurrent execution: Multiple threads within a process can execute concurrently, as can threads in different processes, fully utilizing the parallel capabilities of the processor and peripheral devices.

Practical Application Scenarios for Threads

Consider a word processing application process. This process needs to handle multiple tasks: listening to keyboard input, processing text, and periodically auto-saving text to disk. These operations all work on the same data, so we can't use multiple processes. Instead, we must concurrently start three threads within one process.

In single-threaded scenarios, when keyboard input is being processed, text processing and auto-saving cannot occur simultaneously. Similarly, when auto-saving is happening, input and text processing are blocked.

Another example is handling multiple client connections in a server. If 500 clients connect simultaneously, creating 500 processes is impractical. Instead, we can create a few processes, each with multiple threads to handle multiple requests and communications. Similarly, when using an application like QQ, the main process handles multiple concurrent message streams from different contacts.

Threads in Memory

Multiple threads within the same process share the resources in the process's address space, simulating multiple processes on a single computer—hence the term "lightweight processes" for threads.

Similar to processes, each thread has its own stack. Unlike processes, thread libraries cannot use clock interrupts to force threads to yield the CPU. Threads can voluntarily yield the CPU using thread_yield.

Threads are generally beneficial but increase programming complexity. Thread-related issues include:

If a parent process has multiple threads, does a child process need the same number of threads?
Within the same process, if one thread closes a file while another is writing to it, what happens?

Therefore, multi-threaded code requires more careful design of program logic and data protection.

User-Level Threads vs. Kernel-Level Threads

Thread implementations can be categorized into two types: user-level threads and kernel-level threads. The latter is also called kernel-supported threads or lightweight processes.

User-Level Threads

In user-level threads, kernel switching is controlled by the user-space program itself without kernel intervention, reducing the overhead of entering and exiting kernel mode. However, they cannot effectively utilize multi-core CPUs.

User-level threads simulate OS process scheduling in user space. Each process has a runtime system to schedule threads. When a process gets CPU time, it schedules a thread to execute. Only one thread executes at a time.

Kernel-Level Threads

In kernel-level threads, switching is controlled by the kernel. When a thread switches, it transitions from user mode to kernel mode and back. They can effectively utilize SMP (Symmetric MultiProcessing) and multi-core CPUs. Windows threads are an example of this.

Comparison

Feature	User-Level Threads	Kernel-Level Threads
Kernel awareness	OS kernel is unaware	OS kernel is aware
Creation/destruction/scheduling	Handled at language level (no OS kernel support)	Requires OS kernel support
System call impact	Entire process interrupted	Only the calling thread interrupted
CPU scheduling unit	Process	Thread
Execution state	User mode only	Any state

Hybrid Implementation

This approach multiplexes user-level threads on top of kernel-level threads. The kernel schedules kernel threads, each of which corresponds to multiple user threads. Both user and kernel are aware of these threads. When a user creates a thread, the OS kernel creates a corresponding thread to execute it.

Linux NPTL (Native POSIX Threads Library)

Before kernel 2.6, Linux's scheduling entity was the process, with no true thread support. LinuxThreads used the clone() system call to create process copies that shared the calling process's address space. However, this approach had many non-POSIX compliant aspects.

To improve this, NPTL was developed. Like LinuxThreads, NPTL uses the clone() system call but requires special kernel support, such as futex for thread synchronization primitives. NPTL follows a 1:1 model where each user thread corresponds to a kernel thread (process), simplifying thread implementation.

Python and Threads

Global Interpreter Lock (GIL)

Python code execution is controlled by the Python virtual machine (also called the interpreter main loop). Python was designed so that only one thread can execute in the main loop at a time. Although multiple threads can "run" in the Python interpreter, only one thread can execute in the interpreter at any given moment.

Access to the Python virtual machine is controlled by the GIL, which ensures only one thread runs at a time. In a multi-threaded environment, the Python virtual machine executes as follows:

Set the GIL.
Switch to a thread to run.
Run a specified number of bytecode instructions or until the thread voluntarily yields control (e.g., by calling time.sleep(0)).
Put the thread to sleep.
Release the GIL.
Repeat all the above steps.

When calling external code (like C/C++ extension functions), the GIL is locked until the function completes. Extension programmers can explicitly unlock the GIL.

Python Thread Module Selection

Python provides several modules for multi-threaded programming: thread, threading, and Queue. The thread module provides basic thread and lock support, while threading offers more advanced and comprehensive thread management functionality. The Queue module allows creating a queue data structure for sharing data between multiple threads.

Avoid using the thread module because:

The higher-level threading module is more advanced and has better thread support.
Thread attributes may conflict with threading.
The thread module has limited synchronization primitives (only one).
When the main thread ends, all threads are forcibly terminated without warning or proper cleanup. The threading module ensures important child threads exit before the process terminates.

Similar to how datetime is built on top of time for easier date handling, threading is built on thread for more comprehensive thread management.

Threading Module

The threading module is similar in interface to the multiprocess module, which mimics it. We'll focus on threading:

Thread Creation

There are two ways to create threads in Python:

Method 1: Using Thread with target function

from threading import Thread
import time

def process_task(name):
    time.sleep(2)
    print(f'{name} says hello')

if __name__ == '__main__':
    t = Thread(target=process_task, args=('Processor',))
    t.start()
    print('Main thread')

Method 2: Subclassing Thread

import time
from threading import Thread

class WorkerThread(Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name
    
    def run(self):
        time.sleep(2)
        print(f'{self.name} says hello')

if __name__ == '__main__':
    t = WorkerThread('Worker')
    t.start()
    print('Main thread')

Multi-threading vs Multi-processing

from threading import Thread
from multiprocessing import Process
import os

def task():
    print('Hello', os.getpid())

if __name__ == '__main__':
    # Part 1: Starting multiple threads in the main process
    t1 = Thread(target=task)
    t2 = Thread(target=task)
    t1.start()
    t2.start()
    print('Main thread/process PID:', os.getpid())

    # Part 2: Starting multiple processes
    p1 = Process(target=task)
    p2 = Process(target=task)
    p1.start()
    p2.start()
    print('Main thread/process PID:', os.getpid())

What exists in processes vs. threads?

Processes: imported modules, location of executed Python files, built-in functions, code in files, global variables, etc.
Threads: their own stack (similar to a list, LIFO) and registers containing thread-specific variables and operations, occupying minimal space.

Performance Comparison

from threading import Thread
from multiprocessing import Process
import os
import time

def task():
    print('Hello')

if __name__ == '__main__':
    # Thread performance
    start_time = time.time()
    t = Thread(target=task)
    t.start()
    t.join()
    thread_time = time.time() - start_time
    print('Thread execution time:', thread_time)
    print('Main thread/process')

    # Process performance
    start_time = time.time()
    p = Process(target=task)
    p.start()
    p.join()
    process_time = time.time() - start_time
    print('Process execution time:', process_time)
    print('Main thread/process')

Memory Data Sharing

from threading import Thread
from multiprocessing import Process
import os
import time

def task():
    global n
    n = 0

if __name__ == '__main__':
    # Process version
    n = 100
    p = Process(target=task)
    p.start()
    p.join()
    print('Parent process n:', n)  # Still 100, child process has its own copy

    # Thread version
    n = 1
    t = Thread(target=task)
    t.start()
    t.join()
    print('Parent thread n:', n)  # 0, threads share memory within the same process

Multi-threading Socket Implementation

Socket programming often involves significant I/O operations (recv, accept, etc.), making multi-threading more efficient due to lower overhead:

Server

import socket
import threading

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(('127.0.0.1', 8080))
server_socket.listen(5)

def handle_client(client_socket):
    while True:
        data = client_socket.recv(1024)
        print(data)
        msg = input('Server input: ')
        client_socket.send(bytes(msg, encoding='utf-8'))

if __name__ == '__main__':
    while True:
        client_socket, address = server_socket.accept()
        thread = threading.Thread(target=handle_client, args=(client_socket,))
        thread.start()

Client

import socket

client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(('127.0.0.1', 8080))

while True:
    msg = input('>>: ').strip()
    if not msg:
        continue
    client_socket.send(msg.encode('utf-8'))
    data = client_socket.recv(1024)
    print(data)

Thread Class Methods

isAlive(): Returns whether the thread is active.
getName(): Returns the thread name.
setName(): Sets the thread name.

Methods provided by the threading module:

threading.currentThread(): Returns the current thread variable.
threading.enumerate(): Returns a list of currently running threads.
threading.activeCount(): Returns the number of currently running threads.

Join Method

from threading import Thread
import time

def worker(name):
    time.sleep(2)
    print(f'{name} says hello')

if __name__ == '__main__':
    t1 = Thread(target=worker, args=('Worker1',))
    t2 = Thread(target=worker, args=('Worker2',))
    t1.start()
    t2.start()
    t1.join()  # Main thread waits for t1 to complete
    
    print('Main thread')
    print(t1.is_alive())  # False, as t1 has completed
    print(t2.is_alive())  # Could be True or False depending on timing

Daemon Threads

Both processes and threads follow: daemon threads are destroyed after the main thread/process completes. It's important to note that "completes" means finishing execution, not terminating.

For a process, completion means the process code has finished executing.
For a thread, completion means all non-daemon threads in the process have finished executing.

from threading import Thread
import time

def worker(name):
    time.sleep(2)
    print(f'{name} says hello')

if __name__ == '__main__':
    t = Thread(target=worker, args=('Daemon',))
    t.setDaemon(True)  # Must be set before t.start()
    t.start()
    
    print('Main thread')
    print(t.is_alive())

Locks

Global Interpreter Lock (GIL)

Some languages (Java, C++, C) support multiple threads within a single process utilizing multiple CPU cores. In multi-process scenarios with shared data, we use locks to prevent data safety issues. Similarly, multi-threading faces the same problem.

Python historically added a GIL (Global Interpreter Lock) at the interpreter level, which locks the entire thread rather than specific data operations. Only one thread can use the CPU at a time, meaning multi-threading cannot utilize multiple cores. This is a characteristic of the CPython interpreter, not the Python language itself. Other implementations like Jython don't have this issue.

However, with the GIL, can we still achieve concurrency? For CPU-intensive programs, it's problematic. But for I/O-bound programs (which most applications are), multi-threading can still be effective. The CPU rapidly schedules threads, and when threads perform I/O operations (network requests, file operations), other threads can continue execution.

Synchronization Locks

Key points about synchronization locks:

Threads compete for the GIL (execution right). After acquiring the GIL, a thread can acquire a mutual exclusion lock (Lock). Other threads can also acquire the GIL, but if they find the Lock hasn't been released, they block, even if they have execution rights.
join() waits for all threads to complete (overall serialization), while a lock only serializes the part modifying shared data (partial serialization). Both can ensure data safety, but partial serialization with locks is more efficient.
The GIL protects interpreter-level data (like garbage collection), while user-defined locks protect application data. They protect different resources.

Why We Need Locks Despite the GIL

The GIL protects interpreter-level data, but user data requires separate locks. Consider this example:

from threading import Thread, Lock
import os, time

def worker():
    global n
    temp = n
    time.sleep(0.1)  # Simulate I/O operation
    n = temp - 1

if __name__ == '__main__':
    n = 100
    threads = []
    for i in range(100):
        t = Thread(target=worker)
        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()
    
    print(n)  # Result may not be 0 due to race conditions

With the GIL, only one thread executes Python bytecode at a time. However, during I/O operations (like time.sleep()), the GIL is released, allowing other threads to run. If multiple threads read the shared variable n before any write it back, we get incorrect results.

Using Locks for Data Safety

from threading import Thread, Lock
import os, time

def worker():
    global n
    lock.acquire()  # Acquire lock before accessing shared data
    temp = n
    time.sleep(0.1)
    n = temp - 1
    lock.release()  # Release lock after modification

if __name__ == '__main__':
    lock = Lock()
    n = 100
    threads = []
    for i in range(100):
        t = Thread(target=worker)
        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()
    
    print(n)  # Result will be 0

Lock vs Join

Both locks and join() can serialize execution to ensure data safety, but they have different performance characteristics:

join(): Serializes all code within a thread.
Locks: Only serialize the critical section (code accessing shared data).

Locks are more efficient as they only serialize the necessary parts:

# Without lock: concurrent execution, fast, but data unsafe
from threading import Thread, Lock
import os, time

def task():
    print(f'{current_thread().name} is running')
    global n
    temp = n
    time.sleep(0.5)
    n = temp - 1

if __name__ == '__main__':
    n = 100
    threads = []
    start_time = time.time()
    
    for i in range(100):
        t = Thread(target=task)
        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()
    
    print(f'Main: {time.time() - start_time} n:{n}')
    # Result: n is likely not 0

# With lock: non-critical section concurrent, critical section serial, slower but data safe
def task():
    time.sleep(3)
    print(f'{current_thread().name} start to run')
    global n
    lock.acquire()
    temp = n
    time.sleep(0.5)
    n = temp - 1
    lock.release()

if __name__ == '__main__':
    n = 100
    lock = Lock()
    threads = []
    start_time = time.time()
    
    for i in range(100):
        t = Thread(target=task)
        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()
    
    print(f'Main: {time.time() - start_time} n:{n}')
    # Result: n is 0, but execution time is much longer

# Using join immediately after start: serializes all code, very inefficient
def task():
    time.sleep(3)
    print(f'{current_thread().name} start to run')
    global n
    temp = n
    time.sleep(0.5)
    n = temp - 1

if __name__ == '__main__':
    n = 100
    start_time = time.time()
    
    for i in range(100):
        t = Thread(target=task)
        t.start()
        t.join()  # Wait for each thread to complete before starting next
    
    print(f'Main: {time.time() - start_time} n:{n}')
    # Result: n is 0, but execution time is extremely long

Deadlocks and Recursive Locks

A deadlock occurs when two or more processes/threads wait for each other to release resources, causing all to block indefinitely. Here's an example:

from threading import Lock
import time

mutexA = Lock()
mutexB = Lock()

class DeadlockThread(Thread):
    def run(self):
        self.func1()
        self.func2()
    
    def func1(self):
        mutexA.acquire()
        print(f'{self.name} acquired lock A')
        mutexB.acquire()
        print(f'{self.name} acquired lock B')
        mutexB.release()
        mutexA.release()
    
    def func2(self):
        mutexB.acquire()
        print(f'{self.name} acquired lock B')
        time.sleep(2)
        mutexA.acquire()
        print(f'{self.name} acquired lock A')
        mutexA.release()
        mutexB.release()

if __name__ == '__main__':
    for i in range(2):
        t = DeadlockThread()
        t.start()

This code creates a deadlock because:

Thread 1 acquires lock A in func1.
Thread 2 acquires lock B in func1.
Thread 1 tries to acquire lock B in func2 but blocks (held by Thread 2).
Thread 2 tries to acquire lock A in func2 but blocks (held by Thread 1).

Solving Deadlocks with Recursive Locks

In Python, RLock (Reentrant Lock) allows a thread to acquire the same lock multiple times. This solves deadlocks where a thread needs to reacquire a lock it already holds:

from threading import RLock
import time

mutexA = mutexB = RLock()  # Use the same RLock for both

class SafeThread(Thread):
    def run(self):
        self.func1()
        self.func2()
    
    def func1(self):
        mutexA.acquire()
        print(f'{self.name} acquired lock A')
        mutexB.acquire()
        print(f'{self.name} acquired lock B')
        mutexB.release()
        mutexA.release()
    
    def func2(self):
        mutexB.acquire()
        print(f'{self.name} acquired lock B')
        time.sleep(2)
        mutexA.acquire()
        print(f'{self.name} acquired lock A')
        mutexA.release()
        mutexB.release()

if __name__ == '__main__':
    for i in range(2):
        t = SafeThread()
        t.start()

The Dining Philosophers Problem

This classic problem demonstrates deadlock and its solution with recursive locks:

import time
from threading import Thread, RLock

# Using separate locks causes deadlock
fork_lock = Lock()
noodle_lock = Lock()

def eat_with_fist(name):
    noodle_lock.acquire()
    print(f'{name} acquired noodles')
    fork_lock.acquire()
    print(f'{name} acquired fork')
    print(f'{name} is eating')
    fork_lock.release()
    noodle_lock.release()

def eat_with_hand(name):
    fork_lock.acquire()
    print(f'{name} acquired fork')
    time.sleep(1)
    noodle_lock.acquire()
    print(f'{name} acquired noodles')
    print(f'{name} is eating')
    noodle_lock.release()
    fork_lock.release()

# Solution with recursive lock
fork_lock = noodle_lock = RLock()

def eat_safe(name):
    noodle_lock.acquire()
    print(f'{name} acquired noodles')
    fork_lock.acquire()
    print(f'{name} acquired fork')
    print(f'{name} is eating')
    fork_lock.release()
    noodle_lock.release()

if __name__ == '__main__':
    # This causes deadlock
    names = ['Alice', 'Bob', 'Charlie']
    for name in names:
        t1 = Thread(target=eat_with_fist, args=(name,))
        t2 = Thread(target=eat_with_hand, args=(name,))
        t1.start()
        t2.start()
    
    # This works with recursive locks
    names = ['Dave', 'Eve', 'Frank']
    for name in names:
        t = Thread(target=eat_safe, args=(name,))
        t.start()

Tags: python threading Concurrency GIL Synchronization

Posted on Mon, 18 May 2026 00:26:51 +0000 by ThEMakeR

Freaks City

Python Threading: Concurrency, GIL, and Synchronization

Background Knowledge

Processes

Why Threads When We Have Processes?

The Emergence of Threads

Relationship Between Processes and Threads

Characteristics of Threads

Practical Application Scenarios for Threads

Threads in Memory

User-Level Threads vs. Kernel-Level Threads

User-Level Threads

Kernel-Level Threads

Comparison

Hybrid Implementation

Linux NPTL (Native POSIX Threads Library)

Python and Threads

Global Interpreter Lock (GIL)

Python Thread Module Selection

Threading Module

Thread Creation

Method 1: Using Thread with target function

Method 2: Subclassing Thread

Multi-threading vs Multi-processing

Performance Comparison

Memory Data Sharing

Multi-threading Socket Implementation

Server

Client

Thread Class Methods

Join Method

Daemon Threads

Locks

Global Interpreter Lock (GIL)

Synchronization Locks

Why We Need Locks Despite the GIL

Using Locks for Data Safety

Lock vs Join

Deadlocks and Recursive Locks

Solving Deadlocks with Recursive Locks

The Dining Philosophers Problem

Hot Tags