Multithreading in Python: threading Module Complete Guide
Python multithreading with the threading module: create threads, use Lock, RLock, and Semaphore, understand the GIL, and pick the right concurrency tool.
Python’s threading module runs multiple threads inside one process, making it the right tool for I/O-bound tasks like reading files, calling APIs, or waiting on database queries.
What Is a Thread?
A thread is the smallest unit of execution within a process. One Python process can hold many threads, all sharing the same memory space. That shared space is both the benefit (passing data between threads needs no serialisation) and the risk (two threads modifying the same variable simultaneously produces unpredictable results).
The standard analogy: a chef working alone is a single-threaded process. Two chefs in the same kitchen sharing the same knives and cutting boards are two threads. Communication is cheap, but coordination matters.
If you are still building up your Python basics, Python basic programs and examples covers the single-threaded foundations this article builds on.
Creating Threads with the threading Module
The threading module ships in Python’s standard library. Two patterns exist: passing a target function, or subclassing Thread.
Target function
```python
import threading

def fetch_data(url):
    print(f"Fetching {url}")

t = threading.Thread(target=fetch_data, args=("https://example.com",))
t.start()  # schedule the thread for execution
t.join()   # block the caller until this thread finishes
```
Subclassing Thread
```python
import threading

class FetchThread(threading.Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url

    def run(self):  # override run(), not start()
        print(f"Fetching {self.url}")

t = FetchThread("https://example.com")
t.start()
t.join()
```
Override run(), not start(). Overriding start() bypasses the internal thread-launch machinery. Both patterns call start() to launch and join() to wait for completion. The target function approach is concise for simple tasks; subclassing is cleaner when the thread carries internal state.
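Neither pattern hands back a value, because Thread.run() discards whatever the target returns. A minimal sketch of one common workaround, assuming each thread writes into its own slot of a pre-sized list (square and results are illustrative names, not part of the threading API):

```python
import threading

def square(n, results, index):
    # Thread ignores the target's return value, so store the result instead
    results[index] = n * n

numbers = [1, 2, 3, 4]
results = [None] * len(numbers)
threads = [
    threading.Thread(target=square, args=(n, results, i))
    for i, n in enumerate(numbers)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread before reading results
print(results)  # [1, 4, 9, 16]
```

Because every thread writes to a different index, no lock is needed here. If you want return values directly, concurrent.futures.ThreadPoolExecutor runs the same kind of threads and returns Future objects whose result() method yields each value.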
Thread Lifecycle: start(), join(), and Daemon Threads
Every thread passes through these states:
| State | What triggers it |
|---|---|
| New | Thread object created |
| Runnable | start() called |
| Running | OS scheduler picks the thread |
| Blocked | Waiting for a lock, I/O, or time.sleep() |
| Dead | run() returns or raises an unhandled exception |
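The threading API does not expose these states by name; is_alive() only separates started-and-unfinished threads from the rest. A small sketch that observes the transitions, using a hypothetical worker that blocks on an Event so the thread stays alive long enough to check:

```python
import threading

release = threading.Event()

def worker():
    release.wait()  # stays blocked until the main thread sets the event

t = threading.Thread(target=worker)
before_start = t.is_alive()   # New: object created, not started
t.start()
while_blocked = t.is_alive()  # started, currently waiting on the Event
release.set()                 # unblock the worker so run() can return
t.join()
after_join = t.is_alive()     # Dead: run() has returned
print(before_start, while_blocked, after_join)  # False True False
```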
By default, Python waits for every non-daemon thread to finish before the process exits. Daemon threads are the exception to this rule. Mark a thread as daemon before calling start():
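```python
import threading
def background_logger():  # hypothetical placeholder for a housekeeping loop
    ...
t = threading.Thread(target=background_logger, daemon=True)
t.start()
# Once no non-daemon threads remain, Python exits and stops this thread abruptly
```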
Non-daemon threads (the default) keep the process alive until they finish. Daemon threads are stopped abruptly the moment the last non-daemon thread exits, whether or not they are done, so resources they hold (open files, database transactions) may not be released properly. Use daemon threads for background housekeeping tasks (log flushers, cache warmers, heartbeat pings) that should not block shutdown and whose unfinished work can safely be lost.
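join(timeout=N) adds a maximum wait time. If the thread has not finished within N seconds, join() returns anyway, but the thread keeps running. join() always returns None, so call is_alive() afterwards to find out whether the thread actually finished.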
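A minimal sketch of join(timeout=...), using a hypothetical slow_task that outlives the timeout. The thread is marked daemon so the script can exit without waiting for it:

```python
import threading
import time

def slow_task():
    time.sleep(2)  # hypothetical work that outlasts the timeout

t = threading.Thread(target=slow_task, daemon=True)
t.start()
result = t.join(timeout=0.1)  # waits at most 0.1 s
timed_out = t.is_alive()      # True: join() gave up, but the thread is still running
print(result, timed_out)      # None True
```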
Synchronisation Primitives: Lock, RLock, Semaphore, and Condition
Without coordination, two threads incrementing the same counter race each other: the read-increment-write sequence is not atomic, so one thread's update can overwrite the other's and the final count becomes unpredictable. The threading module ships several primitives to handle this; the four most common are covered below.
Lock
The fundamental primitive. One thread holds the lock at a time.
```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    with lock:        # acquire on entry, release on exit
        counter += 1  # only one thread executes this line at a time
```
The with statement is the idiomatic form and releases the lock even if an exception is raised. Calling lock.acquire() and lock.release() directly also works, but unless release() sits in a finally block, an exception can skip it and leave the lock held, deadlocking every thread that later tries to acquire it.
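A runnable sketch of the lock-protected counter with four threads (add_many is an illustrative name). With the lock in place, the total is deterministic:

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:        # serialise the read-increment-write
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```

Without the with lock: line the total may come up short, and whether it does on a given run depends on the interpreter version and thread scheduling, which is exactly why the lock is needed.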
RLock (Re-entrant Lock)
A regular Lock deadlocks if the same thread tries to acquire it twice before releasing. RLock tracks ownership and allows the same thread to re-acquire without blocking:
```python
import threading

rlock = threading.RLock()

def recursive_task(n):
    with rlock:
        if n > 0:
            recursive_task(n - 1)  # same thread re-acquires cleanly
```
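Use RLock when a function holding a lock calls another function that also needs the same lock. The owning thread must release an RLock once for every time it acquired it before any other thread can take it; with statements handle this pairing automatically.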
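To see the difference without risking a real deadlock, a sketch can use acquire(blocking=False), which returns False instead of waiting when the lock is unavailable:

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

with plain:
    # A blocking second acquire here would deadlock; the non-blocking try just fails
    plain_reacquired = plain.acquire(blocking=False)

with reentrant:
    rlock_reacquired = reentrant.acquire(blocking=False)  # same thread: succeeds
    if rlock_reacquired:
        reentrant.release()  # every acquire needs a matching release

print(plain_reacquired, rlock_reacquired)  # False True
```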
Semaphore
A Semaphore holds an internal counter. Acquiring it decrements the counter; releasing increments it. When the counter reaches zero, acquisition blocks until another thread releases. This caps concurrent access to a resource:
```python
import threading

sem = threading.Semaphore(3)  # at most 3 threads inside at once

def limited_task():
    with sem:
        connect_to_database()  # hypothetical placeholder for the capped resource
```
Useful for connection pools, download rate limiters, or any resource with a fixed capacity.
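A sketch that checks the cap, assuming time.sleep stands in for the database call and a separate Lock guards the bookkeeping counters (active and peak are illustrative names):

```python
import threading
import time

sem = threading.Semaphore(3)
state_lock = threading.Lock()  # protects the two counters below
active = 0
peak = 0

def limited_task():
    global active, peak
    with sem:                  # at most 3 threads get past here at once
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)       # stand-in for the capped resource, e.g. a DB call
        with state_lock:
            active -= 1

threads = [threading.Thread(target=limited_task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never above 3
```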
Condition
A Condition pairs a lock with a wait/notify signal. The standard pattern is producer-consumer:
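```python
import threading

condition = threading.Condition()
items = []

def producer():
    with condition:
        items.append("data")
        condition.notify()  # wake one waiting consumer

def consumer():
    with condition:
        while not items:      # re-check: notify() may fire first, and wakeups can be spurious
            condition.wait()  # releases the lock while blocked, reacquires on wakeup
        item = items.pop()
```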
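Always wait inside a loop that re-checks the shared state: if notify() fires before the consumer starts waiting, a bare wait() blocks forever. condition.notify_all() wakes every waiting thread. condition.wait(timeout=5) adds a time limit and returns False if the timeout expired. condition.wait_for(predicate) wraps the usual "while not predicate(): wait()" loop in a single call.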
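A runnable producer-consumer sketch built on wait_for. The None sentinel that tells the consumer to stop is an assumption of this example, not part of the Condition API:

```python
import threading

condition = threading.Condition()
items = []
consumed = []
DONE = None  # sentinel value chosen for this sketch

def producer(n):
    for i in range(n):
        with condition:
            items.append(i)
            condition.notify()  # wake one waiting consumer
    with condition:
        items.append(DONE)
        condition.notify()

def consumer():
    while True:
        with condition:
            condition.wait_for(lambda: items)  # re-checks the predicate on every wakeup
            item = items.pop(0)                # FIFO order
        if item is DONE:
            break
        consumed.append(item)

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer, args=(5,))
c.start()
p.start()
p.join()
c.join()
print(consumed)  # [0, 1, 2, 3, 4]
```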
The GIL: CPU-bound vs I/O-bound Code
The Python threading module documentation states the constraint every Python developer needs to understand: CPython’s Global Interpreter Lock (GIL) ensures only one thread executes Python bytecode at a time.
The GIL exists because CPython uses reference counting for memory management. Without a global lock, two threads incrementing or decrementing the same object’s reference count simultaneously would corrupt it. The GIL is the straightforward solution: one lock for the whole interpreter.
In practice this creates two very different experiences:
For I/O-bound tasks, threads work well. A thread waiting on a network response releases the GIL while blocked on the OS call, allowing another thread to run Python bytecode. Ten threads each waiting on different API calls run effectively concurrently because the actual waiting happens outside the Python interpreter.
For CPU-bound tasks (sorting large lists, computing primes, processing images in pure Python), threads provide no speedup. The GIL serialises them. Two CPU-heavy threads often run slower than one because of constant GIL acquisition and release overhead.
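A quick way to see the I/O-bound case, assuming time.sleep stands in for a blocking network call (it releases the GIL the same way). Five 0.2-second waits overlap instead of adding up to a full second:

```python
import threading
import time

def fake_io(delay):
    time.sleep(delay)  # releases the GIL while waiting, like a blocking socket read

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"5 x 0.2 s waits took {elapsed:.2f} s")  # roughly 0.2 s, not 1.0 s
```

Swapping the sleep for a pure-Python arithmetic loop would show no such overlap on a standard build, though exact timings vary by machine.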
Python 3.13 introduced an optional free-threaded build per PEP 703. In this variant, experimental in 3.13, the GIL can be disabled, allowing true CPU parallelism across threads. The standard CPython binary still has the GIL; the free-threaded variant ships as a separate build (the python3.13t binary on supported platforms, or compiled from source with --disable-gil). Python 3.14 moved the free-threaded build from experimental to officially supported, but it remains optional, and broad ecosystem support is still maturing as of 2026.
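To check which build you are running, a sketch can read sysconfig's Py_GIL_DISABLED config variable and call sys._is_gil_enabled(), an underscore-prefixed (semi-private) function added in Python 3.13. On a free-threaded build the GIL can still be switched back on at runtime, so the two values can differ:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds; 0 or None on standard builds
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() exists from 3.13 on; older interpreters always have the GIL
gil_enabled = sys._is_gil_enabled() if hasattr(sys, "_is_gil_enabled") else True

print(f"free-threaded build: {free_threaded_build}, GIL enabled now: {gil_enabled}")
```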
When to Choose multiprocessing or asyncio Instead
| Task type | Recommended module | Reason |
|---|---|---|
| I/O-bound (network, disk, database) | threading or asyncio | GIL released during I/O waits; threads overlap the waits |
| CPU-bound (computation, data processing) | multiprocessing | Each process has its own GIL; multiple cores run in parallel |
| High-concurrency I/O (hundreds of connections) | asyncio | Single-threaded cooperative yielding; minimal per-connection overhead |
The multiprocessing module creates separate processes. Each gets its own memory space and its own GIL, so they run truly in parallel across CPU cores. The tradeoff: spawning processes takes more time than spawning threads, and sharing data between processes requires explicit communication via Queue, Pipe, or shared memory objects.
asyncio runs a single-threaded event loop that switches between coroutines only at await points. Because only one coroutine runs at a time and switches happen only where you write await, most shared-state locking disappears; state that must stay consistent across an await still needs asyncio.Lock. It handles a high volume of concurrent I/O connections at low overhead, but requires async/await syntax throughout the call stack.
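A minimal asyncio sketch, assuming asyncio.sleep stands in for a real network call (fake_request is a hypothetical name). asyncio.gather runs the five coroutines concurrently on one thread and returns their results in call order:

```python
import asyncio
import time

async def fake_request(i):
    await asyncio.sleep(0.2)  # each await hands control back to the event loop
    return i * 10

async def main():
    return await asyncio.gather(*(fake_request(i) for i in range(5)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f} s")  # [0, 10, 20, 30, 40] in roughly 0.2 s
```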
For a single-threaded program such as a calculator program in Python, there is no concurrency overhead and no need for locks. Threading becomes worthwhile when the program genuinely waits on external resources.
Placement Interview: What Threading Questions Test
Technical interviews at Indian product and service companies often include threading questions to probe understanding of concurrent code. The key talking points:
- The GIL: explain that standard CPython has a GIL, why it exists (reference counting safety without per-object locking), and its consequence (no CPU parallelism from threads alone). If the role involves Python concurrency, interviewers expect this answer.
- Race conditions: when two threads read, modify, and write a shared variable without synchronisation, the result depends on scheduling order. Using Lock makes the read-modify-write sequence atomic.
- Deadlocks: thread A holds lock X waiting for lock Y; thread B holds lock Y waiting for lock X. Both block forever. The standard prevention: always acquire multiple locks in a consistent order across all threads.
- Daemon vs non-daemon: daemon threads are stopped when the last non-daemon thread (usually the main thread) exits; non-daemon threads keep the process running until they complete.
- threading vs multiprocessing: threads for I/O-bound tasks; processes for CPU-bound tasks.
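The lock-ordering rule from the deadlock point can be sketched as a helper that always acquires locks in one global order. Sorting by id() is an arbitrary choice for this example; any fixed ranking works as long as every thread uses it. acquire_in_order and release_all are hypothetical helpers, not threading APIs:

```python
import threading

def acquire_in_order(*locks):
    ordered = sorted(locks, key=id)  # one global order, whatever order callers pass
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(ordered):
    for lock in reversed(ordered):
        lock.release()

lock_x = threading.Lock()
lock_y = threading.Lock()
transfers = 0

def worker(first, second, n):
    global transfers
    for _ in range(n):
        held = acquire_in_order(first, second)
        try:
            transfers += 1  # protected: both locks are held here
        finally:
            release_all(held)

a = threading.Thread(target=worker, args=(lock_x, lock_y, 10_000))
b = threading.Thread(target=worker, args=(lock_y, lock_x, 10_000))  # opposite argument order
a.start()
b.start()
a.join(timeout=10)
b.join(timeout=10)
print(a.is_alive() or b.is_alive(), transfers)  # False 20000
```

Without the sort, thread a could hold lock_x while b holds lock_y, which is exactly the circular wait described above.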
The threading and asyncio patterns above apply directly when building AI application backends: batching parallel LLM API calls, streaming responses to multiple clients, and building concurrent data pipelines all depend on managing concurrent execution. TinkerLLM at ₹299 is a hands-on sandbox where you can test these concurrency patterns against real LLM APIs, seeing the difference between sequential and threaded request handling before you build at scale.
Frequently asked questions
What is the GIL in Python and why does it exist?
The Global Interpreter Lock (GIL) is a mutex in CPython that allows only one thread to execute Python bytecode at a time. It simplifies memory management by making reference counting thread-safe without fine-grained locking on every object.
Can Python threads run in parallel on multiple CPU cores?
In standard CPython, no. The GIL prevents true parallelism for CPU-bound threads. Use multiprocessing for CPU parallelism. Python 3.13 introduced an optional free-threaded build (PEP 703), but it ships as a separate build; the default CPython binary still has the GIL.
What is a daemon thread in Python?
A daemon thread is marked with daemon=True. When all non-daemon threads finish, Python exits and kills any remaining daemon threads automatically, without waiting for them to complete.
What is the difference between Lock and RLock in Python's threading module?
Lock can only be acquired once before it must be released. RLock (re-entrant lock) can be acquired multiple times by the same thread without deadlocking, useful for recursive code that needs to re-enter a locked section.
When should I use multiprocessing instead of threading in Python?
Use multiprocessing when the task is CPU-bound (number crunching, image processing, sorting large arrays). Each process gets its own GIL, so multiple cores run truly in parallel.
What is a race condition, and how does Lock prevent it?
A race condition happens when two threads read and modify shared data at the same time, producing inconsistent results. A Lock ensures only one thread enters the critical section at a time, making the update atomic.
A self-paced playground for building with LLMs.
TinkerLLM is FACE Prep's sister property. A guided environment for shipping real LLM applications, the kind of project that earns a paragraph on your resume, not a line.