2025-09-10
Threading vs Multiprocessing in Python - GIL Implications and Choosing the Right Tool
This post is part 2 of the "Python async" series.
Core Principle
Python has two built-in ways to run code concurrently: threading and multiprocessing. The critical difference comes down to the Global Interpreter Lock (GIL) - a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at once. This means:
- Threading - multiple threads in one process, but only one thread executes Python code at a time due to the GIL. Good for I/O-bound tasks.
- Multiprocessing - separate processes with separate Python interpreters, each with its own GIL. True parallelism for CPU-bound tasks.
import threading
import multiprocessing
import time

def cpu_bound_task(n):
    # Heavy computation: sum of squares
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # required on platforms that spawn processes (Windows, macOS)
    # Threading - doesn't help with CPU work
    start = time.time()
    threads = [threading.Thread(target=cpu_bound_task, args=(10_000_000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(f"Threading: {time.time() - start:.2f}s")  # ~4 seconds on a 4-core machine

    # Multiprocessing - achieves true parallelism
    start = time.time()
    processes = [multiprocessing.Process(target=cpu_bound_task, args=(10_000_000,)) for _ in range(4)]
    for p in processes: p.start()
    for p in processes: p.join()
    print(f"Multiprocessing: {time.time() - start:.2f}s")  # ~1 second on a 4-core machine
History: The GIL has been in CPython since the beginning. PEP 703 (accepted in 2023) charts a path to making the GIL optional; Python 3.13 ships an experimental free-threaded build, but the GIL remains the default and is still fundamental to understand for current Python versions.
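If you want to check whether your interpreter is running with the GIL, Python 3.13 added a helper for it (note it is underscore-prefixed, i.e. private, and may change):

import sys

# sys._is_gil_enabled() exists only on Python 3.13+; it is a private API
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled:", sys._is_gil_enabled())
else:
    print("Python < 3.13: the GIL is always enabled")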
Useful Extensions
Threading with ThreadPoolExecutor (simpler interface):
from concurrent.futures import ThreadPoolExecutor
import requests
def fetch_url(url):
response = requests.get(url)
return len(response.content)
urls = ["http://example.com"] * 10
# Old way - manual thread management
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for t in threads: t.start()
for t in threads: t.join()
# Better way - thread pool handles everything
with ThreadPoolExecutor(max_workers=5) as executor:
results = executor.map(fetch_url, urls)
print(list(results))
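If you'd rather handle results as they finish (and catch per-task errors), executor.submit plus as_completed is the other standard pattern - a sketch reusing fetch_url and urls from above:

from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(fetch_url, url): url for url in urls}
    for future in as_completed(futures):          # yields futures in completion order
        print(futures[future], future.result())  # result() re-raises worker exceptions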
Multiprocessing with ProcessPoolExecutor:
from concurrent.futures import ProcessPoolExecutor

def expensive_calculation(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    numbers = [10_000_000] * 4
    with ProcessPoolExecutor() as executor:
        results = executor.map(expensive_calculation, numbers)
        print(list(results))
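When you have many small work items, the chunksize parameter of ProcessPoolExecutor.map batches them so each pickle round-trip carries more work - a sketch reusing expensive_calculation from above:

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # 100 items per inter-process round-trip instead of one at a time
        results = list(executor.map(expensive_calculation, range(1_000), chunksize=100))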
Sharing data between processes:
from multiprocessing import Process, Queue, Value, Array

# Queue - safe inter-process communication
def worker(queue):
    queue.put("Result from worker")

# Shared memory with Value and Array; += is a read-then-write, so take the lock
def increment_counter(counter, arr):
    with counter.get_lock():
        counter.value += 1
    with arr.get_lock():
        arr[0] += 10

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()

    counter = Value('i', 0)      # shared integer
    arr = Array('i', [0, 1, 2])  # shared array
    processes = [Process(target=increment_counter, args=(counter, arr)) for _ in range(5)]
    for p in processes: p.start()
    for p in processes: p.join()
    print(f"Counter: {counter.value}")  # 5
    print(f"Array: {list(arr)}")        # [50, 1, 2]
Thread-safe operations with Lock:
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:  # Only one thread can execute this block at a time
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 1,000,000 (correct with the lock; unpredictable without it)
Specific Use Cases
Use threading for I/O-bound tasks:
- Making multiple HTTP requests (web scraping, API calls)
- Reading/writing multiple files
- Database queries where you're waiting for responses
- Network operations (socket communication)
- Any operation where you spend time waiting for external resources
The GIL doesn't matter here because threads release it during I/O operations. While one thread waits for network/disk, others can run.
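You can verify this with a sketch that uses time.sleep as a stand-in for blocking I/O (sleep releases the GIL just like a real network or disk wait):

import threading
import time

def fake_io():
    time.sleep(1)  # releases the GIL while "waiting"

start = time.time()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Four overlapping waits: {time.time() - start:.2f}s")  # ~1s, not ~4s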
Use multiprocessing for CPU-bound tasks:
- Image/video processing
- Data analysis and numerical computations
- Encryption/decryption
- Machine learning model training
- Parsing large files
- Any computation-heavy work where the CPU is the bottleneck
You need separate processes to get around the GIL and use multiple CPU cores effectively.
Real-world example - web scraper:
from concurrent.futures import ThreadPoolExecutor
import requests

def scrape_page(url):
    response = requests.get(url)
    # Process the page (extract_data is your own parsing function, not shown)
    return extract_data(response.text)

urls = ["http://example.com/page1", "http://example.com/page2", ...]

# Threading is perfect here - lots of waiting for network responses
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(scrape_page, urls))
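In practice, pages time out and servers fail; a hedged variant of scrape_page with a timeout and basic error handling (the 10-second timeout is an arbitrary choice):

def scrape_page_safe(url):
    try:
        response = requests.get(url, timeout=10)  # don't let one slow server stall a worker
        response.raise_for_status()               # treat HTTP 4xx/5xx as failures
        return extract_data(response.text)
    except requests.RequestException:
        return None  # failed URLs come back as None instead of killing the run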
Real-world example - image processing:
from concurrent.futures import ProcessPoolExecutor
from PIL import Image, ImageFilter

def process_image(filename):
    img = Image.open(filename)
    # CPU-intensive operations
    img = img.resize((800, 600))
    img = img.filter(ImageFilter.SHARPEN)
    img.save(f"processed_{filename}")
    return filename

if __name__ == "__main__":
    images = ["img1.jpg", "img2.jpg", "img3.jpg", ...]
    # Multiprocessing is needed - CPU-intensive work
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(process_image, images))
Nuances
The GIL releases during I/O operations - this is why threading works for I/O-bound tasks. When a thread calls a blocking I/O function (like requests.get() or file.read()), it releases the GIL so other threads can run. The GIL only prevents multiple threads from executing Python bytecode simultaneously.
Multiprocessing has overhead - creating processes is expensive (memory and startup time). Each process needs its own Python interpreter and memory space. For small tasks, this overhead can outweigh the benefits of parallelism:
# Bad - per-task overhead dominates this trivial computation
def double(x):
    return x * 2

with ProcessPoolExecutor() as executor:
    results = executor.map(double, range(100))

# Good - each task is substantial enough to justify a process
with ProcessPoolExecutor() as executor:
    results = executor.map(expensive_function, large_dataset)
Data serialization between processes - when you pass data to a process or get results back, Python uses pickle to serialize it. Large objects or objects that can't be pickled cause problems:
# This won't work - lambda functions can't be pickled
with ProcessPoolExecutor() as executor:
    results = executor.map(lambda x: x * 2, numbers)  # fails with PicklingError

# This works - module-level functions can be pickled
def multiply_by_two(x):
    return x * 2

with ProcessPoolExecutor() as executor:
    results = executor.map(multiply_by_two, numbers)  # Works
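If you need to bind extra arguments without a lambda, functools.partial over a module-level function pickles fine:

from functools import partial

def multiply(x, factor):
    return x * factor

with ProcessPoolExecutor() as executor:
    results = executor.map(partial(multiply, factor=3), numbers)  # picklable, unlike a lambda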
Threading has less isolation - all threads share the same memory space, which means bugs in one thread (like accessing shared data without locks) can corrupt data across the entire program. Processes are isolated - a crash in one process doesn't affect others.
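A quick sketch of that isolation - the child dies, the parent keeps running and can inspect the exit code:

from multiprocessing import Process
import os

def crashing_worker():
    os._exit(1)  # simulate a hard crash in the child process

if __name__ == "__main__":
    p = Process(target=crashing_worker)
    p.start()
    p.join()
    print(f"Worker exit code: {p.exitcode}, parent unaffected")  # exit code: 1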
When neither helps - if your program is both CPU and I/O bound, you might need a hybrid approach: processes for CPU work, each using threads for I/O. Or consider asyncio for I/O operations instead of threads if you're doing lots of concurrent I/O.
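One common hybrid shape: an asyncio event loop for the concurrent I/O, offloading CPU-heavy steps to a process pool with run_in_executor. A minimal sketch:

import asyncio
from concurrent.futures import ProcessPoolExecutor

def expensive_calculation(n):
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # CPU work runs in worker processes; await keeps the loop free for I/O
        tasks = [loop.run_in_executor(pool, expensive_calculation, 10_000_000) for _ in range(4)]
        print(await asyncio.gather(*tasks))

if __name__ == "__main__":
    asyncio.run(main())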
Process pool size guidelines - for CPU-bound work, use os.cpu_count() workers (one per CPU core). For I/O-bound work with threads, you can use many more (tens or hundreds) since threads spend most time waiting. Experiment to find the sweet spot.
import os
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

if __name__ == "__main__":
    # CPU-bound - match core count
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
        results = executor.map(cpu_intensive_func, data)

    # I/O-bound - can use many more
    with ThreadPoolExecutor(max_workers=50) as executor:
        results = executor.map(io_intensive_func, data)
The simple decision tree:
- Waiting for network/disk/external services? Use threading (or asyncio)
- Doing heavy calculations/data processing? Use multiprocessing
- Doing simple sequential work? Use neither - regular code is simpler