Python Threading Tutorial: Basics, Advanced Usage, and Best Practices

1. Introduction

Python is a programming language loved by many developers for its simplicity and flexibility. Among its powerful features, threading is one of the essential techniques for efficient program design. In this article, we will explain threading in Python clearly, from the basics to advanced usage.

What is a Thread?

A thread is a small unit of execution that runs independently within a program. By running multiple threads inside a single process, tasks can be executed concurrently. This can shorten overall run time when tasks spend much of their time waiting, and it lets a program stay responsive while work continues in the background.

Why Learn Threading in Python?

By leveraging threads, you can effectively solve the following problems:
  1. Efficient Handling of I/O Waits: Tasks with heavy I/O operations such as file handling or network communication can shorten waiting time using threads.
  2. Simultaneous Execution of Multiple Tasks: For example, handling large amounts of data at once or sending multiple API requests concurrently.
  3. Improved User Experience: In GUI applications, running background tasks keeps the interface responsive.

What You Will Learn in This Article

This article covers the following about Python threading:
  • Basic concepts of threads and how to use them
  • How to prevent data races between threads
  • The mechanism and impact of the GIL (Global Interpreter Lock)
  • How to use threads in real programs
  • Best practices and precautions
From beginner-friendly explanations to practical use cases, this guide is ideal for anyone who wants to deeply understand threading in Python.

2. Basic Concepts of Threads

Threads are a fundamental mechanism for implementing concurrency within a program. This section covers the basics of threads and explains how they differ from processes and parallel processing.

What is a Thread?

A thread is a single unit of execution that runs independently within a program. Typically, a program runs as a process, which may contain one or more threads. For example, in a web browser, multiple threads run concurrently such as:
  • Monitoring user input
  • Rendering web pages
  • Streaming video playback
Using threads allows these tasks to run efficiently at the same time.

Differences Between Processes and Threads

To fully understand threads, it is important to first distinguish them from processes.
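Item                       | Process                                        | Thread
---------------------------|------------------------------------------------|------------------------------------------------
Memory Space               | Independent                                    | Shared within a process
Creation Cost              | High (each process allocates its own memory)   | Low (memory is shared, making it more efficient)
Communication Method       | Requires IPC (Inter-Process Communication)     | Can share data directly
Granularity of Concurrency | Large                                          | Small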
In Python, threads allow efficient concurrency by sharing resources within a single process.

Differences Between Concurrency and Parallelism

When learning about “threads,” it is crucial to understand the difference between concurrency and parallelism.
  • Concurrency: Tasks are executed in small alternating steps, giving the appearance of simultaneous execution. Python threads are suitable for concurrency. Example: One cashier serving multiple customers in turn.
  • Parallelism: Multiple tasks are executed physically at the same time. This requires multiple CPU cores and is usually handled by multiprocessing in Python. Example: Several cashiers serving different customers simultaneously.
Python threads excel at concurrency, particularly for I/O-bound tasks (such as file handling or network communication).

Characteristics of Threads in Python

Python provides the threading module as part of its standard library, allowing threads to be created and managed easily. However, Python threads have the following features and limitations:
  1. Global Interpreter Lock (GIL): In CPython, the GIL ensures that only one thread can execute Python bytecode at a time. Because of this, threads are limited in their effectiveness for CPU-bound tasks (computations requiring heavy CPU usage).
  2. Best for I/O-Bound Tasks: Threads are well suited to tasks such as network communication or file I/O, where waiting times dominate execution.

Practical Use Cases for Threads

Here are a few common scenarios where threads are useful:
  • Web Scraping: Retrieve multiple web pages concurrently.
  • Database Access: Handle multiple client requests concurrently.
  • Background Tasks: Perform heavy processing in threads while keeping the main thread responsive to user input.

3. Creating Threads in Python

In Python, the threading module makes it easy to create threads and implement concurrency. This section explains basic thread creation and operations.

Overview of the threading Module

The threading module is the standard library for creating and managing threads in Python. With it, you can:
  • Create and start threads
  • Synchronize between threads
  • Manage thread states
Because threads are treated as objects, the threading module allows simple and flexible operations.

Basic Method for Creating Threads

The most common way to create a thread is by using the Thread class. Below is a basic example of creating and running a thread:
import threading
import time

# Function to run in the thread
def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)

# Create a thread
thread = threading.Thread(target=print_numbers)

# Start the thread
thread.start()

# Main thread processing
print("Main thread is running...")

# Wait for the thread to finish
thread.join()
print("Thread has completed.")

Key Points of the Code

  1. Creating a Thread: Specify the function to run in the thread using the target argument of the threading.Thread class.
  2. Starting the Thread: Call the start() method to begin execution.
  3. Waiting for Completion: Use the join() method to block the main thread until the specified thread finishes execution.
In this example, the print_numbers function runs in a separate thread, while the main thread continues independently.

Passing Arguments to Threads

If you need to pass parameters to a thread, you can use the args argument. Here’s an example:
import threading
import time

def print_numbers_with_delay(delay):
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(delay)

# Create a thread with arguments
thread = threading.Thread(target=print_numbers_with_delay, args=(2,))
thread.start()
thread.join()

Key Points

  • Arguments are passed as a tuple. A single argument needs a trailing comma, as in args=(2,); without the comma, (2) is just the integer 2.
  • In the example above, the value 2 is passed to delay, causing a 2-second pause between each loop iteration.
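
Keyword arguments can be passed in the same way through the kwargs parameter of threading.Thread. The following is a minimal sketch; the greet function and the results list are hypothetical names introduced only for this illustration.

```python
import threading

results = []  # hypothetical list used to collect output from the thread

def greet(greeting, name, punctuation="!"):
    # Only one thread appends here, so no lock is needed in this sketch
    results.append(f"{greeting}, {name}{punctuation}")

# Positional arguments go in args (a tuple); keyword arguments go in kwargs (a dict)
thread = threading.Thread(
    target=greet,
    args=("Hello", "World"),
    kwargs={"punctuation": "?"},
)
thread.start()
thread.join()

print(results)
```

A Thread object does not hand back the return value of its target, which is why this sketch stores the result in a list that the main thread reads after join().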

Creating Threads Using Classes

For more advanced thread operations, you can extend the Thread class and create your own custom thread class.
import threading
import time

class CustomThread(threading.Thread):
    def __init__(self, name):
        # Pass the name to Thread so it appears in logs and debuggers
        super().__init__(name=name)

    def run(self):
        for i in range(5):
            print(f"{self.name} is running: {i}")
            time.sleep(1)

# Create thread instances
thread1 = CustomThread(name="Thread 1")
thread2 = CustomThread(name="Thread 2")

# Start threads
thread1.start()
thread2.start()

# Wait for threads to finish
thread1.join()
thread2.join()
print("All threads have completed.")

Key Points of the Code

  1. The run Method: Override the run method of the Thread class to define the behavior of the thread.
  2. Named Threads: Assigning names to threads makes debugging and log identification easier.

Managing Thread States

The following methods are useful when managing the state of threads:
  • is_alive(): Checks if a thread is currently running.
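  • daemon = True (or the daemon=True constructor argument): Sets the thread as a daemon (background) thread. The older setDaemon(True) method is deprecated since Python 3.10.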

Example of a Daemon Thread

import threading
import time

def background_task():
    while True:
        print("Background task is running...")
        time.sleep(2)

# Create a daemon thread
thread = threading.Thread(target=background_task, daemon=True)
thread.start()

print("Main thread is exiting.")
# The daemon thread is stopped when the program exits
Daemon threads are stopped abruptly when the program exits, which happens once all non-daemon threads (including the main thread) have finished. This property is useful for background tasks that do not need to clean up after themselves.
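
The is_alive() method listed above can be combined with join(timeout=...) to check on a thread without blocking indefinitely. The sketch below is only an illustration; short_task and observe_thread are hypothetical names, and the delays are arbitrary.

```python
import threading
import time

def short_task():
    time.sleep(0.5)  # stands in for some work

def observe_thread():
    """Return the is_alive() values seen before start, while running, and after join."""
    thread = threading.Thread(target=short_task)
    before = thread.is_alive()         # the thread has not been started yet
    thread.start()
    during = thread.is_alive()         # started and still sleeping
    thread.join(timeout=0.05)          # gives up waiting after 0.05 seconds
    still_running = thread.is_alive()  # join() timing out does not stop the thread
    thread.join()                      # wait until the thread really finishes
    after = thread.is_alive()
    return before, during, still_running, after

print(observe_thread())
```

Because join() returns None whether or not the timeout expired, calling is_alive() afterwards is the way to tell the two cases apart.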

4. Synchronizing Data Between Threads

When using threads in Python, conflicts can occur if multiple threads access the same resource simultaneously. This section explains synchronization methods to prevent data races.

What is a Data Race?

A data race occurs when multiple threads access the same resource (such as a variable or file) at the same time and at least one of them modifies it. This can lead to unintended results or program errors.

Example of a Data Race

import threading

counter = 0

def increment():
    global counter
    for _ in range(1000000):
        counter += 1

# Create two threads
thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)

thread1.start()
thread2.start()

thread1.join()
thread2.join()

print(f"Counter value: {counter}")
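In this example, two threads update the counter variable without coordination. Because counter += 1 is a read-modify-write sequence rather than a single atomic step, one thread can overwrite the other's update, so the final value may fall short of the expected 2,000,000. Whether the shortfall actually appears on a given run depends on the interpreter version and on thread scheduling.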
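
Since the lost update depends on timing, it may not show up on every run of the example above. The sketch below makes the problem easier to observe by splitting the increment into an explicit read and write with a short sleep in between. The names unsafe_increment and run_unsafe are hypothetical, and the sleep is an artificial stand-in for an unlucky thread switch.

```python
import threading
import time

counter = 0

def unsafe_increment():
    global counter
    current = counter        # 1. read the shared value
    time.sleep(0.05)         # 2. artificial pause: other threads run and read the same value
    counter = current + 1    # 3. write back a result based on a stale read

def run_unsafe(num_threads=5):
    """Run num_threads unsynchronized increments and return the final counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=unsafe_increment) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(f"Expected 5, got {run_unsafe()}")
```

Each thread reads the counter before the others have written their result, so several increments are lost and the final value ends up below 5.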

Synchronization with Locks

To prevent data races, you can use the Lock object from the threading module to synchronize between threads.

Basic Usage of Locks

import threading

counter = 0
lock = threading.Lock()

def increment_with_lock():
    global counter
    for _ in range(1000000):
        # Acquire the lock before updating
        with lock:
            counter += 1

# Create two threads
thread1 = threading.Thread(target=increment_with_lock)
thread2 = threading.Thread(target=increment_with_lock)

thread1.start()
thread2.start()

thread1.join()
thread2.join()

print(f"Counter value with lock: {counter}")

Key Points

  1. with lock Syntax: Using with simplifies acquiring and releasing locks.
  2. Lock Acquisition and Release: Once a lock is acquired, other threads must wait until it is released.
In this example, using a lock ensures that counter reaches the intended value (2,000,000).

Recursive Lock (RLock)

While Lock is a simple locking mechanism, sometimes a thread needs to acquire the same lock multiple times. In such cases, use RLock (Recursive Lock).

Example of RLock

import threading

lock = threading.RLock()

def nested_function():
    with lock:
        print("First level lock acquired")
        with lock:
            print("Second level lock acquired")

thread = threading.Thread(target=nested_function)
thread.start()
thread.join()

Key Points

  • RLock allows the same thread to acquire the lock multiple times.
  • Useful in cases where nested lock management is required.
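
Nested acquisition usually arises less directly than in the example above, for instance when one locked method calls another locked method on the same object. The following is a minimal sketch of that situation; the Account class and its methods are hypothetical.

```python
import threading

class Account:
    """Hypothetical example class; both methods guard the same balance."""

    def __init__(self, balance=0):
        self._lock = threading.RLock()
        self.balance = balance

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def deposit_twice(self, amount):
        with self._lock:           # first acquisition
            self.deposit(amount)   # deposit() acquires the same lock again
            self.deposit(amount)

account = Account()
threads = [threading.Thread(target=account.deposit_twice, args=(10,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account.balance)
```

If self._lock were a plain Lock, the first call to deposit() inside deposit_twice() would block forever, because a Lock cannot be acquired a second time by the thread that already holds it.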

Synchronization with Semaphores

threading.Semaphore is used to limit the number of threads that can access a resource simultaneously.

Example of a Semaphore

import threading
import time

semaphore = threading.Semaphore(2)

def access_resource(name):
    with semaphore:
        print(f"{name} is accessing the resource")
        time.sleep(2)
        print(f"{name} has released the resource")

threads = []
for i in range(5):
    thread = threading.Thread(target=access_resource, args=(f"Thread-{i}",))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

Key Points

  • A semaphore restricts the number of threads that can access a resource at the same time.
  • In this example, a maximum of two threads can access the resource concurrently.

Synchronization with Events

threading.Event is used for signaling between threads.

Example of an Event

import threading
import time

event = threading.Event()

def wait_for_event():
    print("Thread is waiting for event...")
    event.wait()
    print("Event has been set. Proceeding with task.")

def set_event():
    time.sleep(2)
    print("Setting event")
    event.set()

thread1 = threading.Thread(target=wait_for_event)
thread2 = threading.Thread(target=set_event)

thread1.start()
thread2.start()

thread1.join()
thread2.join()

Key Points

  • wait() blocks a thread until the event is set.
  • set() signals the event, resuming all waiting threads.

Summary

Choosing the right synchronization method is key to preventing data races:
  • Use Lock for simple synchronization
  • Use RLock for nested lock scenarios
  • Use Semaphore to limit concurrent thread access
  • Use Event for signaling between threads

5. GIL and Thread Limitations

When working with threads in Python, one unavoidable topic is the Global Interpreter Lock (GIL). Understanding how the GIL works and its limitations is essential for using threads effectively.

What is the GIL?

The Global Interpreter Lock (GIL) is a locking mechanism used internally by the Python interpreter (specifically CPython). It ensures that only one thread executes Python bytecode at a time.

The Role of the GIL

  • Introduced to guarantee safe memory management.
  • Mainly ensures consistency of Python objects, especially reference counting.
However, because of the GIL, Python threads are restricted when performing CPU-bound tasks.

Example of GIL in Action

The following example runs two CPU-intensive tasks using threads:
import threading
import time

def cpu_bound_task():
    start = time.time()
    count = 0
    for _ in range(10**7):
        count += 1
    print(f"Task completed in: {time.time() - start:.2f} seconds")

# Create two threads
thread1 = threading.Thread(target=cpu_bound_task)
thread2 = threading.Thread(target=cpu_bound_task)

start_time = time.time()

thread1.start()
thread2.start()

thread1.join()
thread2.join()

print(f"Total time: {time.time() - start_time:.2f} seconds")

Result Analysis

Even though two threads are created, the total time is not cut in half; it is typically about the same as running the two tasks one after the other. This is because the GIL prevents threads from running CPU-bound code simultaneously.

Situations Where the GIL Matters

  1. CPU-Bound Tasks: Tasks such as numerical computation or image processing are heavily limited by the GIL, offering little benefit from multithreading.
  2. I/O-Bound Tasks: Tasks such as file handling or network communication involve significant waiting time, so the GIL has little impact. Threads can still provide performance benefits here.
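
For contrast with the CPU-bound example, the sketch below times several simulated I/O waits running in threads. It is an illustration under the assumption that time.sleep() is a fair stand-in for a network or disk wait; io_bound_task and timed_run are hypothetical names.

```python
import threading
import time

def io_bound_task(delay):
    # Sleeping releases the GIL, much like waiting on a socket or a file
    time.sleep(delay)

def timed_run(num_threads=4, delay=0.3):
    """Run num_threads simulated I/O waits concurrently and return the elapsed seconds."""
    threads = [
        threading.Thread(target=io_bound_task, args=(delay,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = timed_run()
print(f"4 waits of 0.3 s each finished in {elapsed:.2f} seconds")
```

Run one after another, the four waits would take about 1.2 seconds. In threads they overlap, so the total stays close to the length of a single wait.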

Ways to Overcome GIL Limitations

1. Use Multiprocessing

A common way to bypass the GIL is by using the multiprocessing module, which creates multiple independent processes that each have their own memory space.
from multiprocessing import Process
import time

def cpu_bound_task():
    start = time.time()
    count = 0
    for _ in range(10**7):
        count += 1
    print(f"Task completed in: {time.time() - start:.2f} seconds")

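# The guard is required on platforms that start child processes by
# re-importing this module ("spawn", the default on Windows and macOS)
if __name__ == "__main__":
    # Create two processes
    process1 = Process(target=cpu_bound_task)
    process2 = Process(target=cpu_bound_task)

    start_time = time.time()

    process1.start()
    process2.start()

    process1.join()
    process2.join()

    print(f"Total time: {time.time() - start_time:.2f} seconds")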

Key Points

  • Each process runs its own Python interpreter with its own GIL, so the processes do not block one another.
  • For CPU-bound tasks, multiprocessing is more efficient than threading.

2. Use C Extensions

Some Python C extensions (such as NumPy and Pandas) release the GIL during many of their internal computations, so that work can proceed in parallel with other threads. This can significantly improve performance in CPU-bound workloads.
  • NumPy can accelerate numerical operations.
  • Cython or Numba can compile Python code for optimization.

3. Use asyncio

For I/O-bound tasks, asyncio can be used instead of threads to achieve efficient concurrency in a single thread.
import asyncio

async def io_bound_task(name, delay):
    print(f"{name} started")
    await asyncio.sleep(delay)
    print(f"{name} completed")

async def main():
    await asyncio.gather(
        io_bound_task("Task 1", 2),
        io_bound_task("Task 2", 3)
    )

asyncio.run(main())

Key Points

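  • asyncio runs coroutines in a single thread using cooperative multitasking, so they never contend for the GIL. It does not provide CPU parallelism.
  • Best suited for network communication with many simultaneous connections. Blocking calls such as ordinary file I/O need to be handed off to a thread.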
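
asyncio and threads can also be combined. The sketch below uses asyncio.to_thread (available since Python 3.9) to run a blocking function in a worker thread without stalling the event loop. The blocking_read function is a hypothetical stand-in for a blocking call such as reading a file.

```python
import asyncio
import time

def blocking_read(name):
    # Hypothetical stand-in for a blocking call such as reading a file
    time.sleep(0.2)
    return f"{name} done"

async def main():
    # Each blocking call runs in a worker thread; the event loop stays free meanwhile
    return await asyncio.gather(
        asyncio.to_thread(blocking_read, "file-a"),
        asyncio.to_thread(blocking_read, "file-b"),
    )

results = asyncio.run(main())
print(results)
```

asyncio.gather returns the results in the order the awaitables were passed in, regardless of which call finishes first.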

Pros and Cons of the GIL

Advantages
  • Simplifies Python memory management.
  • Keeps single-threaded code fast, because reference counts do not need fine-grained locking.
Disadvantages
  • Limits performance improvements from multithreading.
  • Requires multiprocessing for CPU-bound tasks.

Summary

The GIL is a major limitation of threading in Python, but by understanding its impact, you can choose the appropriate approach:
  • For CPU-bound tasks: use multiprocessing or C extensions.
  • For I/O-bound tasks: use threads or asyncio.

6. Practical Examples: Programs Using Threads

Threads, when used appropriately, can handle complex tasks efficiently. This section introduces concrete examples of programs that use Python threads.

1. Concurrent Web Scraping

In web scraping, using threads to fetch multiple pages at once can significantly reduce processing time.
import threading
import requests

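def fetch_url(url):
    # A timeout keeps a stalled server from blocking the thread forever
    try:
        response = requests.get(url, timeout=10)
        print(f"Fetched {url}: {len(response.content)} bytes")
    except requests.RequestException as exc:
        print(f"Failed to fetch {url}: {exc}")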

urls = [
    "https://example.com",
    "https://httpbin.org",
    "https://www.python.org",
]

threads = []

# Create a thread for each URL
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All URLs fetched.")

Key Points

  • Multiple URLs are fetched concurrently using threads.
  • The requests library makes sending HTTP requests simple.

2. Simultaneous File Read/Write

Threads can be used to efficiently read and write large numbers of files at the same time.
import threading

def write_to_file(filename, content):
    with open(filename, 'w') as f:
        f.write(content)
    print(f"Wrote to {filename}")

files = [
    ("file1.txt", "Content for file 1"),
    ("file2.txt", "Content for file 2"),
    ("file3.txt", "Content for file 3"),
]

threads = []

# Create a thread for each file
for filename, content in files:
    thread = threading.Thread(target=write_to_file, args=(filename, content))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All files written.")

Key Points

  • Each thread independently writes to a separate file, speeding up the process.
  • Effective when multiple threads access different resources simultaneously.

3. Background Processing in GUI Applications

In GUI applications, threads are often used to run heavy tasks in the background while the main thread handles user interactions. Here’s a simple example using tkinter:
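import queue
import threading
import time
from tkinter import Tk, Button, Label

messages = queue.Queue()  # carries status text from the worker to the GUI thread

def long_task():
    messages.put("Task started...")
    time.sleep(5)  # Simulating a long task
    messages.put("Task completed!")

def start_task():
    thread = threading.Thread(target=long_task, daemon=True)
    thread.start()

def poll_messages():
    # Runs on the main thread, which is the only thread that touches widgets
    try:
        while True:
            label.config(text=messages.get_nowait())
    except queue.Empty:
        pass
    root.after(100, poll_messages)  # check again in 100 ms

# Setup GUI
root = Tk()
root.title("Threaded GUI Example")

label = Label(root, text="Click the button to start the task.")
label.pack(pady=10)

button = Button(root, text="Start Task", command=start_task)
button.pack(pady=10)

root.after(100, poll_messages)
root.mainloop()

Key Points

  • Threads prevent the UI from freezing during long-running tasks.
  • tkinter widgets should only be touched from the main thread, so the worker reports progress through a queue.Queue and the main thread polls it with root.after().
  • Use threading.Thread for asynchronous task execution.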

4. Real-Time Data Processing

When processing real-time data such as sensor readings or log streams, threads allow concurrent handling.
import threading
import time
import random

def process_data(sensor_name):
    for _ in range(5):
        data = random.randint(0, 100)
        print(f"{sensor_name} read data: {data}")
        time.sleep(1)

sensors = ["Sensor-1", "Sensor-2", "Sensor-3"]

threads = []

# Create a thread for each sensor
for sensor in sensors:
    thread = threading.Thread(target=process_data, args=(sensor,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All sensor data processed.")

Key Points

  • Each thread processes data from an independent sensor.
  • Suitable for real-time data collection and analysis.

Summary

Through these examples, we learned how to design efficient programs with Python threads:
  • Web Scraping: Concurrent data collection
  • File Operations: Faster simultaneous read/write
  • GUI Applications: Responsive background processing
  • Real-Time Data: Concurrent handling of live input
Threads are powerful, but it’s important to design carefully to avoid data races and deadlocks.

7. Best Practices for Using Threads

Threads are a powerful tool for efficient concurrency, but if used incorrectly, they can cause issues such as deadlocks and data races. This section introduces the best practices you should follow when using threads in Python.

1. Avoiding Deadlocks

A deadlock occurs when multiple threads wait indefinitely for each other’s locks. To avoid this, it is important to standardize the order and method of acquiring locks.

Example of a Deadlock

import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()

def thread1_task():
    with lock1:
        print("Thread 1 acquired lock1")
        time.sleep(1)
        with lock2:
            print("Thread 1 acquired lock2")

def thread2_task():
    with lock2:
        print("Thread 2 acquired lock2")
        time.sleep(1)
        with lock1:
            print("Thread 2 acquired lock1")

thread1 = threading.Thread(target=thread1_task)
thread2 = threading.Thread(target=thread2_task)

thread1.start()
thread2.start()

thread1.join()
thread2.join()
This code results in a deadlock: thread 1 holds lock1 and waits for lock2, while thread 2 holds lock2 and waits for lock1, so neither can proceed.

Solutions

  1. Standardize Lock Order: Always acquire locks in the same order across all threads.
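  2. Set Timeouts: Specify a timeout when acquiring a lock so that threads don’t wait indefinitely. acquire() returns False if the timeout expires, so check the return value before touching the protected resource.
acquired = lock1.acquire(timeout=1)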
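
Both solutions can be sketched as follows. This is one possible arrangement rather than the only one; ordered_task, try_with_timeout, and the completed list are hypothetical names introduced for the illustration.

```python
import threading

lock1 = threading.Lock()
lock2 = threading.Lock()
completed = []

def ordered_task(name):
    # Every thread takes lock1 first and lock2 second, so no circular wait can form
    with lock1:
        with lock2:
            completed.append(name)

def try_with_timeout(lock, timeout=1):
    # acquire() returns False if the timeout expires, so the result must be checked
    if not lock.acquire(timeout=timeout):
        return False
    try:
        return True  # the protected work would go here
    finally:
        lock.release()  # release only because the acquire succeeded

threads = [
    threading.Thread(target=ordered_task, args=(f"Thread {i}",))
    for i in (1, 2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(completed))
print(try_with_timeout(lock1))
```

With a consistent lock order both threads finish. The timeout variant lets the caller decide what to do when the lock is unavailable, such as retrying later or reporting an error.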

2. Optimizing the Number of Threads

Creating too many threads can introduce overhead and reduce performance. Choose the optimal number of threads based on the type of task.

General Guidelines

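  • I/O-bound tasks: A higher number of threads can help (e.g., 2× the CPU core count or more), since most of them are waiting at any given moment.
  • CPU-bound tasks: Extra threads bring little benefit because of the GIL. Use processes instead, with a worker count equal to or fewer than the CPU core count.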
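
One way to cap the number of threads is a thread pool from concurrent.futures. The sketch below is a minimal illustration; fetch is a hypothetical stand-in for an I/O-bound call, and the max_workers value of 4 is an arbitrary choice for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(item):
    # Hypothetical stand-in for an I/O-bound call such as an HTTP request
    time.sleep(0.1)
    return item * 2

def run_pool(items, max_workers=4):
    # The pool reuses at most max_workers threads no matter how many items there are
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as the inputs
        return list(executor.map(fetch, items))

print(run_pool(range(8)))
```

Leaving the with block waits for all submitted work to finish, so no explicit join() calls are needed.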

3. Safely Stopping Threads

Safely terminating threads is crucial for program stability. The threading module does not provide a way to forcibly stop threads, so you need to manage termination conditions inside the thread.

Example of Safe Thread Termination

import threading
import time

class SafeThread(threading.Thread):
    def __init__(self):
        super().__init__()
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            print("Thread is running...")
            time.sleep(1)

    def stop(self):
        self._stop_event.set()

thread = SafeThread()
thread.start()

time.sleep(5)
thread.stop()
thread.join()
print("Thread has been safely stopped.")

Key Points

  • Use flags or events inside the thread to monitor termination conditions.
  • Call stop() explicitly to set a stop condition.
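
A variation on the same idea is to wait on the event itself instead of calling time.sleep(), so that a stop request interrupts the pause immediately. The sketch below assumes a hypothetical Worker class with a deliberately long interval to make the effect visible.

```python
import threading
import time

class Worker(threading.Thread):
    """Hypothetical worker that wakes once per interval until asked to stop."""

    def __init__(self, interval=5.0):
        super().__init__()
        self._stop_event = threading.Event()
        self.interval = interval
        self.iterations = 0

    def run(self):
        # wait() returns True as soon as the event is set, or False on timeout
        while not self._stop_event.wait(self.interval):
            self.iterations += 1  # periodic work would go here

    def stop(self):
        self._stop_event.set()

worker = Worker(interval=5.0)
worker.start()
time.sleep(0.1)

start = time.perf_counter()
worker.stop()
worker.join()
print(f"Stopped in {time.perf_counter() - start:.2f} seconds")
```

Even with a 5-second interval, the thread exits almost immediately after stop() is called, because the event wakes it from the wait.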

4. Using Logging for Debugging

To trace thread execution, use the logging module instead of print. This provides more detailed information including thread names and timestamps.

Logging Example

import threading
import logging

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(threadName)s: %(message)s')

def task():
    logging.debug("Task started")
    logging.debug("Task completed")

thread = threading.Thread(target=task, name="MyThread")
thread.start()
thread.join()

Key Points

  • Explicitly name threads to improve log readability.
  • Use log levels (DEBUG, INFO, WARNING, etc.) to distinguish importance.

5. Choosing Between Threads and Async Processing

Threads are effective for I/O-bound tasks, but in some cases asyncio may be more efficient.
  • Threads are suitable when:
    • Running background tasks in GUI applications
    • Manipulating data shared with other threads or processes
  • Async is suitable when:
    • Handling large volumes of I/O tasks efficiently
    • State management is relatively simple

6. Keep Design Simple

Overusing threads can make code unnecessarily complex. Keep these points in mind for a maintainable design:
  • Limit the number of threads to what is strictly necessary.
  • Clearly define each thread’s role.
  • Minimize data sharing between threads; if possible, use queues for communication.
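
The queue-based communication mentioned in the last point can be sketched with queue.Queue from the standard library, which handles the locking internally. The producer, consumer, and run_pipeline functions are hypothetical, and using None as an end-of-work marker is a convention chosen for this example.

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: tells the consumer there is no more work

def consumer(q, results):
    while True:
        item = q.get()  # blocks until an item is available
        if item is None:
            break
        results.append(item * item)

def run_pipeline(items):
    q = queue.Queue()
    results = []  # written only by the consumer thread
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_pipeline([1, 2, 3, 4]))
```

The two threads never touch the same variable directly. All hand-off goes through the queue, so no explicit Lock is needed in this arrangement.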

8. Conclusion

Python threads are a powerful tool for improving program efficiency. In this article, we covered everything from the basics to advanced techniques and best practices. Let’s recap the key points you should keep in mind when using threads.

Main Takeaways

  1. Basic Concepts of Threads
  • A thread is a small unit of execution that runs independently within a process, enabling concurrency.
  • Understand the difference between concurrency and parallelism, and use each appropriately.
  2. Creating Threads in Python
  • You can easily create threads with the threading module.
  • Control execution with the start() and join() methods of the Thread class.
  • Create custom thread classes for more flexible control.
  3. Thread Synchronization
  • Prevent data races using synchronization objects like Lock, RLock, and Semaphore.
  • Use events and timeouts to refine thread coordination.
  4. Impact of the GIL
  • The GIL limits Python threads in CPU-bound tasks.
  • For CPU-bound workloads, use multiprocessing; for I/O-bound workloads, use threads.
  5. Practical Examples
  • We demonstrated threading in real-world scenarios such as web scraping, file operations, GUI applications, and real-time data processing.
  6. Best Practices
  • Avoid deadlocks, optimize thread count, ensure safe termination, and leverage logging for debugging.
  • Careful design improves both efficiency and program stability.

Mindset When Using Threads

  • Threads Are Not a Silver Bullet: Threads are powerful, but using them incorrectly can hurt performance. Always choose them for the right scenarios.
  • Keep Design Simple: Overusing threads increases complexity. Define clear roles, minimize shared data, and simplify synchronization.
  • Consider Alternatives: In some cases, asyncio or multiprocessing may be a better fit depending on the workload.

Next Steps

After mastering the basics of threading, consider exploring these topics further:
  1. Asynchronous Programming
  • Learn Python’s asyncio module to implement efficient asynchronous processing in a single thread.
  2. Multiprocessing
  • Overcome GIL limitations by optimizing parallel execution for CPU-bound tasks.
  3. Advanced Thread Control
  • Use thread pools (concurrent.futures.ThreadPoolExecutor) and debugging tools for more efficient thread management.
  4. Apply to Real-World Scenarios
  • Work on projects like web crawlers or real-time data pipelines to build hands-on skills.

Final Thoughts

When properly designed and managed, Python threads can deliver powerful concurrency. Apply the knowledge you’ve learned here to build more efficient and stable programs.