Python Multithreading Guide: From Beginner to Pro

1. Introduction

Python is a programming language used by a wide range of users, from beginners to advanced developers, thanks to its simple, easy-to-use syntax and extensive libraries. Among its features, multithreading is an important technique that can dramatically improve processing efficiency in certain situations.

Why use multithreading in Python

As computer performance improves, the demands on the amount of data and speed a program must handle at once have increased. Multithreading can be especially effective in the following scenarios:
  • Processing large amounts of data: When retrieving data from databases or handling a large number of files, running the work in parallel can reduce total processing time.
  • Improving I/O efficiency: In programs with heavy I/O, such as file reads/writes or network communication, other threads can keep working while one waits, which reduces idle time.
  • Real-time requirements: In game or user interface programming, where multiple tasks must appear to run simultaneously, multithreading helps keep the program responsive.

Benefits and challenges of multithreading

Advantages

  1. Increased processing speed: Multiple threads running concurrently can distribute tasks more efficiently.
  2. Effective use of resources: Even if some threads are idle, others can utilize CPU resources.

Challenges

  1. Global Interpreter Lock (GIL) limitations: In Python, the presence of the GIL can limit the effectiveness of multithreading.
  2. Debugging complexity: Issues like race conditions and deadlocks are more likely to occur, which can make debugging time-consuming.

Purpose of this article

This article explains the basic concepts and concrete methods for implementing multithreading in Python. It also includes practical examples and points to watch for, so you can learn how to apply these techniques in real-world work. The material is presented step by step to be easy to understand for beginners to intermediate users, so please read through to the end.

2. Comparison of Multithreading and Multiprocessing

In programming, both multithreading and multiprocessing are important techniques for achieving parallel processing, but each has different characteristics and use cases. This section explains their differences and how to choose between them in Python.

Basic differences between threads and processes

What is a thread

A thread is a unit of parallel execution within a single process. Because threads share the same memory space, data exchange is fast.
  • Characteristics:
    • Share the same memory space
    • Lightweight and fast to start
    • Easy data sharing

What is a process

A process is an execution unit with its own independent memory space. Because each process has its own resources, they are less likely to affect each other.
  • Characteristics:
    • Has an independent memory space
    • More heavyweight and slower to start
    • Additional mechanisms are required for data sharing

Impact of the GIL (Global Interpreter Lock) in Python

CPython, the standard Python implementation, has a constraint called the GIL (Global Interpreter Lock). This lock allows only one thread to execute Python bytecode at a time. Because of the GIL, multithreaded Python code cannot spread CPU-bound work across multiple cores.
  • Cases that are more affected by the GIL:
    • CPU-intensive computations (e.g., numerical calculations or image processing)
  • Cases that are less affected by the GIL:
    • I/O-bound workloads (e.g., network communication, file operations)
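
As a rough illustration of why I/O-bound work is less affected, the sketch below uses time.sleep() as a stand-in for an I/O wait. That is an assumption made to keep the example offline; blocking network and file calls behave similarly because they release the GIL while waiting. The name fake_io and the 0.2-second delay are illustrative only.

```python
import threading
import time

def fake_io(delay):
    # time.sleep() releases the GIL, like a blocking network or disk call.
    time.sleep(delay)

delays = [0.2, 0.2, 0.2]

# Run the waits one after another.
start = time.perf_counter()
for d in delays:
    fake_io(d)
sequential = time.perf_counter() - start

# Run the same waits in three threads.
start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(d,)) for d in delays]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The threaded run should take roughly as long as one wait rather than the sum of all three. The same experiment with a pure-Python calculation in place of the sleep would typically show no such gain.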

Choosing between multithreading and multiprocessing

When to choose multithreading

  • Use cases:
    • Programs with a lot of I/O operations
    • When you need to run lightweight tasks in parallel
  • Examples: Web scraping, concurrent file downloads

When to choose multiprocessing

  • Use cases:
    • CPU-intensive computations
    • When you want to avoid the GIL’s limitations
  • Examples: Training machine learning models, image processing

Simple comparison examples in Python

Below are simple Python code examples that use the threading module and the multiprocessing module to demonstrate basic parallel processing.

Multithreading example

import threading
import time

def task(name):
    print(f"{name} started")
    time.sleep(2)  # Simulate a blocking operation
    print(f"{name} finished")

threads = []
for i in range(3):
    thread = threading.Thread(target=task, args=(f"Thread {i+1}",))
    threads.append(thread)
    thread.start()

# Wait for every thread to finish
for thread in threads:
    thread.join()

print("All threads finished")

Multiprocessing example

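from multiprocessing import Process
import time

def task(name):
    print(f"{name} started")
    time.sleep(2)  # Simulate a blocking operation
    print(f"{name} finished")

# The guard keeps child processes from re-running this block when
# they import the module (required with the "spawn" start method).
if __name__ == "__main__":
    processes = []
    for i in range(3):
        process = Process(target=task, args=(f"Process {i+1}",))
        processes.append(process)
        process.start()

    # Wait for every process to finish
    for process in processes:
        process.join()

    print("All processes finished")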

Conclusion

Both multithreading and multiprocessing have their appropriate uses. When implementing parallel processing in Python, it’s important to consider the nature of your program and the effects of the GIL, and choose the best approach accordingly.

3. Basic Concepts of Threads and Processes

To correctly understand and make use of multithreading and multiprocessing, it’s important to know their basic mechanisms and characteristics. This section explains how threads and processes operate and in which situations each is appropriate.

Basic Concepts of Threads

Role of Threads

A thread refers to an independent flow of execution within a process. Multiple threads within the same process share the memory space, which allows smooth data sharing and communication.
  • Characteristics:
    • A lightweight unit that runs within a process.
    • Because they share memory space, data exchange is fast.
    • Synchronization and race-condition control between threads are required.

Advantages and Challenges of Threads

  • Advantages:
    • High memory efficiency.
    • Lightweight, with fast startup and switching.
  • Challenges:
    • There is a risk of data races and deadlocks on shared data.
    • In Python, threads are affected by the GIL, so they are not suitable for CPU-bound tasks.

Basic Concepts of Processes

Role of Processes

A process is an independent execution environment allocated by the operating system. Each process has its own memory space and does not affect others.
  • Characteristics:
    • Uses a completely independent memory space.
    • High security and stability.
    • If inter-process communication (IPC) is needed, it becomes a bit more complex.

Advantages and Challenges of Processes

  • Advantages:
    • Not affected by the GIL, so ideal for CPU-bound workloads.
    • Because processes are independent, they offer higher stability.
  • Challenges:
    • Starting and switching processes incurs overhead.
    • Increases memory usage.

Comparison of Threads and Processes

Feature           | Thread                       | Process
Memory Space      | Share the same memory space  | Independent memory space
Lightweightness   | Lightweight                  | Heavyweight
Startup Speed     | Fast                         | Slower
Data Sharing      | Easy                         | Requires IPC (inter-process communication)
Impact of the GIL | Affected                     | Not affected
Use Cases         | I/O-bound tasks              | CPU-intensive computations

How the Global Interpreter Lock (GIL) Works

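In CPython, the GIL controls thread execution: it ensures that only one thread executes Python bytecode at a time. This protects the interpreter’s internal state (such as reference counts), but it does not make your own code thread-safe. Operations that span several bytecode instructions, such as incrementing a shared counter, can still race and need explicit locks. The GIL also limits efficient utilization of multicore CPUs.
  • Advantages of the GIL:
    • Keeps the interpreter’s internal state consistent without fine-grained locking.
  • Disadvantages of the GIL:
    • For CPU-bound tasks, multithreading performance is limited.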
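
One small way to observe this mechanism is the interpreter's switch interval, which is how long a running thread may hold the GIL before CPython asks it to give other threads a turn. The sketch below only reads the current value with sys.getswitchinterval(); it does not change any setting.

```python
import sys

# Interval, in seconds, after which CPython asks the running thread
# to release the GIL so that another thread can run.
interval = sys.getswitchinterval()
print(f"switch interval: {interval} s")
```

Because a switch can happen between any two bytecode instructions, code that reads and then writes shared data still needs a lock.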

Criteria for Choosing Between Threads and Processes

When doing parallel processing in Python, it’s good to choose between threads and processes based on the following criteria.
  • When to Choose Threads:
    • Most of the work spends time waiting for I/O (e.g., network communication).
    • You want to keep memory usage low.
  • When to Choose Processes:
    • CPU-intensive workloads (e.g., numerical computations).
    • You want to efficiently utilize multiple cores.

4. Implementing Multithreading in Python

When implementing multithreading in Python, use the standard library threading module. This section covers, with concrete code examples, everything from creating basic threads to advanced control.

Basic Usage of the threading Module

Creating and Running Threads

In the threading module, create and run threads using the Thread class. Below is a basic example.

import threading
import time

def print_message(message):
    print(f"Started: {message}")
    time.sleep(2)
    print(f"Finished: {message}")

# Create the threads
thread1 = threading.Thread(target=print_message, args=("Thread 1",))
thread2 = threading.Thread(target=print_message, args=("Thread 2",))

# Start the threads
thread1.start()
thread2.start()

# Wait for the threads to finish
thread1.join()
thread2.join()

print("All threads finished")

Explanation of the Output

In this code, two threads start simultaneously and each runs independently. By using the join() method, the main thread can wait until all threads have finished.
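
A thread's target function cannot return a value to the caller directly, so a common pattern is to store results in a shared container and read them only after join(). The following is a minimal sketch of that idea; the square function and the results dictionary are hypothetical names chosen for illustration.

```python
import threading

results = {}

def square(n):
    # Each thread writes to its own key, so no lock is needed here.
    results[n] = n * n

threads = [threading.Thread(target=square, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # After this loop, every thread has finished.

print(sorted(results.items()))
```

Reading results before the join() loop completes could show a partially filled dictionary, which is why the main thread waits first.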

Implementing Threads Using a Class

You can also implement more complex thread behavior by subclassing the Thread class.

import threading
import time

class MyThread(threading.Thread):
    def __init__(self, name):
        # Pass the name to Thread so it also appears in logs and debuggers
        super().__init__(name=name)

    def run(self):
        print(f"{self.name} started")
        time.sleep(2)
        print(f"{self.name} finished")

# Create the threads
thread1 = MyThread("Thread 1")
thread2 = MyThread("Thread 2")

# Start the threads
thread1.start()
thread2.start()

# Wait for the threads to finish
thread1.join()
thread2.join()

print("All threads finished")

Explanation of the Output

Define the thread’s behavior in the run() method and start the thread with the start() method. This approach is useful when you want to reuse complex thread logic as a class.
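
One reason to prefer a subclass is that the thread object can carry its own inputs and outputs. The sketch below assumes a hypothetical SumThread class that stores its result in an attribute, which the caller reads after join().

```python
import threading

class SumThread(threading.Thread):
    """Hypothetical worker that sums a list and keeps the result."""

    def __init__(self, numbers):
        super().__init__()
        self.numbers = numbers
        self.result = None  # Filled in by run()

    def run(self):
        self.result = sum(self.numbers)

worker = SumThread([1, 2, 3, 4])
worker.start()
worker.join()
print(worker.result)  # 10
```

Note that the code calls start(), not run(). Calling run() directly would execute the method in the current thread instead of a new one.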

Thread Synchronization and Locks

When multiple threads operate on shared data concurrently, race conditions and inconsistencies can occur. To prevent such issues, use a Lock object to synchronize between threads.

Example Using a Lock

import threading

lock = threading.Lock()
shared_resource = 0

def increment():
    global shared_resource
    with lock:  # Acquire the lock; it is released when the block exits
        local_copy = shared_resource
        local_copy += 1
        shared_resource = local_copy

threads = []
for i in range(5):
    thread = threading.Thread(target=increment)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(f"Final value of the shared resource: {shared_resource}")

Explanation of the Output

By using the with lock syntax, you can safely acquire and release locks. In this example, the lock is used to restrict access to the shared resource to one thread at a time.
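
For comparison, the with statement is roughly equivalent to calling acquire() and release() by hand inside try/finally. The sketch below spells that out; the add_many function and the iteration counts are illustrative assumptions.

```python
import threading

lock = threading.Lock()
counter = 0

def add_many(times):
    global counter
    for _ in range(times):
        lock.acquire()
        try:
            counter += 1
        finally:
            lock.release()  # Always release, even if the body raises.

threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000
```

The with form is usually preferable because it is shorter and cannot forget the release.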

Thread Timeouts and Daemon Threads

Thread Timeouts

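By setting a timeout on the join() method, you can wait for a thread to finish for only a specified amount of time. join() always returns None, so call is_alive() afterwards to check whether the thread actually finished.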
thread.join(timeout=5)
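
The following sketch shows the timeout in action with a deliberately slow task. The function name slow_task and the durations are assumptions chosen so that the first join() gives up before the thread ends.

```python
import threading
import time

def slow_task():
    time.sleep(0.5)

worker = threading.Thread(target=slow_task)
worker.start()

worker.join(timeout=0.05)      # Wait at most 0.05 seconds.
timed_out = worker.is_alive()  # True if the thread is still running.
print(f"still running after timeout: {timed_out}")

worker.join()                  # Wait for the thread to really finish.
print(f"still running after full join: {worker.is_alive()}")
```

A timeout does not stop the thread; it only stops the waiting.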

Daemon Threads

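Daemon threads do not keep the program alive: once all non-daemon threads (including the main thread) have finished, the program exits and any remaining daemon threads are stopped abruptly. To make a thread a daemon, set its daemon attribute to True before calling start().

thread = threading.Thread(target=print_message, args=("Daemon thread",))
thread.daemon = True
thread.start()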
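
Because daemon threads are cut off without cleanup at exit, background work that should end gracefully is often paired with a stop signal. The sketch below assumes a hypothetical heartbeat worker and uses threading.Event for that signal.

```python
import threading
import time

stop_event = threading.Event()
ticks = []

def heartbeat():
    # wait() returns False on timeout and True once the event is set,
    # so it doubles as an interruptible sleep.
    while not stop_event.wait(0.05):
        ticks.append(time.monotonic())

worker = threading.Thread(target=heartbeat, daemon=True)
worker.start()

time.sleep(0.2)    # The main thread does its own work here.
stop_event.set()   # Ask the worker to finish cleanly.
worker.join(timeout=1)

print(f"daemon={worker.daemon}, ticks={len(ticks)}")
```

The daemon flag remains a safety net here: if the main thread exits without setting the event, the worker will not block shutdown.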

Practical Examples of Multithreading

The following is an example of parallelizing file downloads.

import threading
import time

def download_file(file_name):
    print(f"Download started: {file_name}")
    time.sleep(2)  # Simulate the download
    print(f"Download finished: {file_name}")

files = ["file1", "file2", "file3"]

threads = []
for file in files:
    thread = threading.Thread(target=download_file, args=(file,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All files downloaded")

Conclusion

This section explained basic multithreading implementations in Python as well as practical application examples. The next section will dive deeper into concrete use cases for multithreading.

5. Multithreading Use Cases

Python’s multithreading is particularly well-suited to I/O-bound tasks. This section presents several concrete examples of applying multithreading. Through these examples, you’ll learn how to use it in real-world projects.

1. Improving Web Scraping Efficiency

When collecting data from websites, sending requests to multiple URLs concurrently can significantly reduce processing time.

Sample code

Below is an example of web scraping using Python’s requests library and the threading module.

import threading
import requests
import time

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]

def fetch_url(url):
    print(f"Fetching {url}")
    # A timeout keeps a stalled server from blocking the thread forever
    response = requests.get(url, timeout=10)
    print(f"Fetched {url}: status code {response.status_code}")

threads = []
start_time = time.time()

for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

end_time = time.time()
print(f"Elapsed time: {end_time - start_time:.2f} seconds")

Explanation of the results

In this code, requests to each URL are executed in parallel, reducing total processing time. However, when making many requests, be careful about server load and potential violations of site policies.
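
To keep the load on a server bounded, one common option is to cap how many requests run at once. The sketch below does that with ThreadPoolExecutor(max_workers=...). To stay runnable offline, it replaces requests.get() with a hypothetical fake_fetch() that only sleeps; the URLs and the limit of 2 are illustrative assumptions, not recommendations.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2   # Illustrative limit on simultaneous requests.
active = 0
peak = 0
state_lock = threading.Lock()

def fake_fetch(url):
    """Stand-in for requests.get(url); sleeps instead of using the network."""
    global active, peak
    with state_lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.1)
    with state_lock:
        active -= 1
    return url, 200  # Pretend every request succeeded.

urls = [f"https://example.com/page{i}" for i in range(1, 7)]

with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as executor:
    results = list(executor.map(fake_fetch, urls))

print(f"fetched {len(results)} pages, peak concurrency {peak}")
```

The peak counter exists only to show that no more than MAX_CONCURRENT calls overlap; real code would not need it.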

2. Concurrent File Downloads

When downloading multiple files from the internet, multithreading can handle the task more efficiently.

Sample code

import threading
import time

def download_file(file_name):
    print(f"Download started: {file_name}")
    time.sleep(2)  # Simulate the download
    print(f"Download finished: {file_name}")

files = ["file1.zip", "file2.zip", "file3.zip"]

threads = []
for file in files:
    thread = threading.Thread(target=download_file, args=(file,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All files downloaded")

Explanation of the results

In this code, each file’s download is executed per thread, reducing processing time. In real applications, you would use libraries like urllib or requests to implement real download functionality.

3. Parallel Execution of Database Queries

When retrieving large amounts of data from a database, using multithreading to execute queries in parallel can improve performance.

Sample code

import threading
import time

def query_database(query):
    print(f"Running query: {query}")
    time.sleep(2)  # Simulate query execution
    print(f"Query finished: {query}")

queries = ["SELECT * FROM users", "SELECT * FROM orders", "SELECT * FROM products"]

threads = []
for query in queries:
    thread = threading.Thread(target=query_database, args=(query,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All queries finished")

Explanation of the results

In this example, running different queries in parallel reduces data retrieval time. In real applications, you would connect using database libraries (e.g., sqlite3, psycopg2).

4. Parallelizing Video Processing

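Tasks that process video files frame by frame can be split across threads. Because this kind of work is usually CPU-bound, threads help mainly when the processing library releases the GIL during its heavy computation; otherwise, multiprocessing is the better fit.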

Sample code

import threading
import time

def process_frame(frame_number):
    print(f"Processing frame {frame_number}")
    time.sleep(1)  # Simulate the processing
    print(f"Finished frame {frame_number}")

frame_numbers = range(1, 6)

threads = []
for frame in frame_numbers:
    thread = threading.Thread(target=process_frame, args=(frame,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All frames processed")

Explanation of the results

By parallelizing per-frame operations like video editing or effects processing, you can improve overall processing speed.

Conclusion

Multithreading is highly effective for systems that perform many I/O operations and applications that require real-time responsiveness. However, for CPU-intensive tasks you should consider the impact of the GIL and evaluate using multiprocessing appropriately.

6. Cautions and Best Practices for Using Multithreading

When using multithreading in Python, you can achieve efficient processing, but there are points to watch out for and common pitfalls. This section introduces multithreading challenges and best practices to avoid them.

Cautions

1. Impact of the Global Interpreter Lock (GIL)

Python’s GIL (Global Interpreter Lock) enforces the constraint that only one thread can execute Python bytecode at a time. Because of this, CPU-bound tasks (e.g., numerical computations) don’t benefit as much from multithreading.
  • Cases affected:
    • Heavy computational workloads
    • Algorithms that require high CPU usage
  • Mitigations:
    • Use the multiprocessing module to parallelize with multiple processes.
    • Use C extension modules or optimized libraries like NumPy, which can release the GIL while their native code runs.

2. Deadlocks

A deadlock, where multiple threads wait on resources held by each other, is a common issue in multithreaded programs. This can cause the entire program to halt.
  • Example: Thread A holds resource X and waits for resource Y, while Thread B holds resource Y and waits for resource X.
  • Mitigations:
    • Always acquire resources in a consistent order.
    • Use the RLock (reentrant lock) from the threading module when the same thread may need to acquire a lock it already holds. An RLock does not prevent deadlocks between different threads.
Sample code (deadlock avoidance)

import threading

lock1 = threading.Lock()
lock2 = threading.Lock()

def task1():
    with lock1:
        print("Task1 acquired lock1")
        with lock2:
            print("Task1 acquired lock2")

def task2():
    # Acquire the locks in the same order as task1 (lock1, then lock2).
    # Taking lock2 first here could deadlock against task1.
    with lock1:
        print("Task2 acquired lock1")
        with lock2:
            print("Task2 acquired lock2")

thread1 = threading.Thread(target=task1)
thread2 = threading.Thread(target=task2)

thread1.start()
thread2.start()

thread1.join()
thread2.join()
print("Both tasks finished")
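
An RLock addresses a narrower problem than lock ordering: it lets the same thread acquire a lock it already holds. The sketch below assumes two hypothetical functions, outer and inner, that both need the same lock; with a plain Lock the nested acquisition would block forever.

```python
import threading

rlock = threading.RLock()
log = []

def inner():
    with rlock:          # The same thread acquires the lock a second time.
        log.append("inner")

def outer():
    with rlock:
        log.append("outer")
        inner()          # Would block forever with a plain Lock.

worker = threading.Thread(target=outer)
worker.start()
worker.join(timeout=2)

print(log)  # ['outer', 'inner']
```

The lock becomes available to other threads only after the owning thread has released it as many times as it acquired it.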

3. Race conditions

When multiple threads operate on the same data simultaneously, unexpected behavior can occur. This is called a “race condition.”
  • Example: If two threads try to increment a counter variable at the same time, the counter may not increase as expected.
  • Mitigations:
    • Use the Lock from the threading module to synchronize access to shared data.
    • Minimize data sharing between threads.
Sample code (avoidance using locks)

import threading

lock = threading.Lock()
counter = 0

def increment():
    global counter
    # Read, modify, and write back as one protected step
    with lock:
        local_copy = counter
        local_copy += 1
        counter = local_copy

threads = [threading.Thread(target=increment) for _ in range(100)]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print(f"Counter value: {counter}")
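
One way to minimize shared data is to pass work and results through queue.Queue, which handles its own locking. The following is a minimal sketch under the assumption of three workers that double numbers; None is used as a hypothetical "no more work" sentinel.

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:       # Sentinel: no more work.
            break
        results.put(item * 2)

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for n in range(10):
    tasks.put(n)
for _ in workers:
    tasks.put(None)            # One sentinel per worker.

for w in workers:
    w.join()

doubled = sorted(results.get() for _ in range(10))
print(doubled)
```

The workers never touch a shared variable directly, so no explicit Lock appears in the code.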

Best practices

1. Setting the appropriate number of threads

  • When setting the number of threads, consider the number of CPU cores and I/O wait times.
  • Recommendation: For I/O-bound tasks, it’s usually fine to use more threads than cores, because most of them are waiting at any given moment. For CPU-intensive tasks, extra threads do not help under the GIL; use processes instead and limit them to roughly the number of cores.
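
The sketch below turns these guidelines into numbers using os.cpu_count(). The multiplier of 4 and the cap of 32 are illustrative rules of thumb, not values from this article; measure your own workload before settling on one.

```python
import os
from concurrent.futures import ThreadPoolExecutor

cores = os.cpu_count() or 1      # cpu_count() may return None.

# Illustrative rules of thumb, not fixed recommendations.
io_workers = min(32, cores * 4)  # I/O-bound: more threads than cores.
cpu_workers = cores              # CPU-bound: process count, not threads.

def fake_io(n):
    return n + 1                 # Placeholder for a blocking I/O call.

with ThreadPoolExecutor(max_workers=io_workers) as executor:
    results = list(executor.map(fake_io, range(5)))

print(f"cores={cores}, io_workers={io_workers}, results={results}")
```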

2. Debugging and logging

  • Multithreaded programs are harder to debug, so proper logging is important.
  • Recommendation: Use Python’s logging module to record logs per thread.
Sample code (logging)

import threading
import logging

# %(threadName)s adds the name of the emitting thread to every record
logging.basicConfig(level=logging.DEBUG, format='%(threadName)s: %(message)s')

def task():
    logging.debug("Task running")

threads = [threading.Thread(target=task) for _ in range(5)]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

logging.debug("All tasks finished")

3. Using high-level libraries

Using high-level libraries like concurrent.futures.ThreadPoolExecutor makes thread management easier.
Sample code (ThreadPoolExecutor)

from concurrent.futures import ThreadPoolExecutor

def task(name):
    print(f"{name} running")

# The with block waits for all submitted tasks before exiting
with ThreadPoolExecutor(max_workers=3) as executor:
    executor.map(task, ["Task 1", "Task 2", "Task 3"])
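
When tasks return values or may fail, submit() together with as_completed() gives access to each result and exception; executor.map() only raises a task's exception when its results are iterated. The sketch below assumes a hypothetical risky_task that fails for one input.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky_task(n):
    if n == 3:
        raise ValueError(f"task {n} failed")
    return n * 10

successes = {}
failures = {}

with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to the input that produced it.
    futures = {executor.submit(risky_task, n): n for n in range(1, 5)}
    for future in as_completed(futures):
        n = futures[future]
        try:
            successes[n] = future.result()  # Re-raises the task's exception.
        except ValueError as exc:
            failures[n] = str(exc)

print(successes, failures)
```

Handling the exception per future lets the remaining tasks finish even when one of them fails.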

Conclusion

To effectively use multithreading in Python, it’s important to pay attention to the GIL and synchronization issues and aim for a safe, efficient design. Appropriate use of locks, debugging techniques, and leveraging high-level libraries when needed are key to building successful multithreaded programs.

7. Comparison of Multithreading and Multiprocessing

In Python, there are two main approaches to achieving parallelism: multithreading and multiprocessing. Each has its own characteristics and is suited to different situations. This section compares the two in detail and offers guidance on when to use each.

Basic Differences Between Multithreading and Multiprocessing

Feature        | Multithreading                          | Multiprocessing
Execution unit | Multiple threads within the same process | Multiple independent processes
Memory space   | Shared (use the same memory space)      | Independent (isolated memory space per process)
Lightweight    | Lightweight and fast to start           | Heavy and slower to start
GIL impact     | Affected                                | Not affected
Data sharing   | Easy (uses the same memory)             | Complex (requires inter-process communication)
Use cases      | I/O-bound tasks                         | CPU-bound tasks

Detailed Explanation

  • Multithreading: Because multiple threads run within the same process, it’s lightweight and data sharing is easy. However, in Python, the GIL can limit performance for CPU-intensive tasks.
  • Multiprocessing: Since processes do not share memory space, it’s not affected by the GIL and can fully utilize multiple CPU cores. However, if inter-process communication (IPC) is required, the implementation can be somewhat more complex.

When to Choose Multithreading

  • Examples:
    • Web scraping
    • File operations (read/write)
    • Network communication (asynchronous operations)
  • Reason: Multithreading can efficiently utilize I/O waiting time, increasing parallelism. Also, because threads share the same memory space, exchanging data is easy.

Code example: I/O-bound tasks

import threading
import time

def file_operation(file_name):
    print(f"Started processing {file_name}")
    time.sleep(2)  # Simulate a file operation
    print(f"Finished processing {file_name}")

files = ["file1.txt", "file2.txt", "file3.txt"]

threads = []
for file in files:
    thread = threading.Thread(target=file_operation, args=(file,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All file operations finished")

When to Choose Multiprocessing

  • Examples:
    • Large-scale data processing
    • Training machine learning models
    • Image processing and numerical computations
  • Reason: You can avoid the GIL and fully utilize multiple CPU cores to achieve high computational performance. However, sharing data between processes can be cumbersome.

Code example: CPU-intensive tasks

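from multiprocessing import Process
import time

def compute_heavy_task(task_id):
    print(f"Task {task_id} running")
    time.sleep(3)  # Simulate a computation
    print(f"Task {task_id} finished")

# The guard keeps child processes from re-running this block when
# they import the module (required with the "spawn" start method).
if __name__ == "__main__":
    tasks = ["Computation 1", "Computation 2", "Computation 3"]

    processes = []
    for task in tasks:
        process = Process(target=compute_heavy_task, args=(task,))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    print("All computation tasks finished")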

Combining Both

In some projects, combining multithreading and multiprocessing can yield optimal performance. For example, you might parallelize data retrieval (I/O) with multithreading and then process that data with CPU-intensive computations using multiprocessing.

Criteria for Choosing Between Multithreading and Multiprocessing

Consider the following points when choosing.
  1. Nature of the task:
    • If there’s a lot of I/O waiting: Multithreading
    • For compute-heavy tasks: Multiprocessing
  2. Resource constraints:
    • If you want to minimize memory usage: Multithreading
    • If you want to fully utilize CPU cores: Multiprocessing
  3. Code complexity:
    • If you want to share data easily: Multithreading
    • If you can handle inter-process communication: Multiprocessing

8. Summary and FAQ

This article provided a detailed explanation of using multithreading and multiprocessing in Python, covering basic concepts, implementation examples, caveats, and guidance on when to use each. In this section, we summarize the key points of the article and supplement the explanation with an FAQ format to answer questions readers are likely to have.

Key takeaways

  1. Characteristics of multithreading
    • Well-suited for reducing I/O wait time and makes sharing data easy.
    • Affected by the GIL, so not suitable for CPU-bound tasks.
  2. Characteristics of multiprocessing
    • Not constrained by the GIL and performs well for CPU-intensive workloads.
    • Uses separate memory spaces, so inter-process communication may be required.
  3. Choosing the right approach is key
    • Use multithreading for I/O-bound tasks and multiprocessing for CPU-bound tasks.
    • Combining both when appropriate can yield optimal performance.

FAQ (Frequently Asked Questions)

Q1: When using multithreading, how many threads should I use?

A: Consider the following when setting the number of threads.
  • I/O-bound tasks: You can use many threads without problems. Specifically, it’s common to match the number of threads to the number of tasks you want to process concurrently.
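  • CPU-bound tasks: Because of the GIL, adding threads does not speed up pure-Python computation, and too many threads can even degrade performance through switching overhead. Use multiprocessing instead, and keep the number of processes at or below the number of physical cores.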

Q2: Is there a way to completely avoid the constraints of the GIL?

A: Yes, you can avoid the GIL’s effects using the methods below.
  • Use multiprocessing: multiprocessing sidesteps the GIL because each process runs its own interpreter with its own GIL.
  • Use external libraries: Libraries implemented in C, such as NumPy and Pandas, can release the GIL during heavy computations and operate very efficiently.

Q3: How do multithreading and asynchronous programming (asyncio) differ?

A:
  • Multithreading: Uses threads to execute tasks in parallel. Because threads share resources, synchronization may be necessary.
  • Asynchronous programming: Uses asyncio to switch between tasks within an event loop. Everything runs in a single thread and tasks switch only at await points, which avoids most thread contention and locking issues. It is designed for I/O waiting, and a task is lighter-weight than a thread.
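
For comparison with the threading examples in this article, here is a minimal asyncio sketch of three overlapping waits. The coroutine name fake_request and the 0.2-second delays are assumptions; asyncio.sleep() stands in for a real non-blocking I/O call.

```python
import asyncio
import time

async def fake_request(name, delay):
    await asyncio.sleep(delay)   # Yields to the event loop while waiting.
    return f"{name} done"

async def main():
    # gather() runs the coroutines concurrently and keeps their order.
    return await asyncio.gather(
        fake_request("A", 0.2),
        fake_request("B", 0.2),
        fake_request("C", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")
```

All three waits overlap inside one thread, so the total time stays close to a single delay. A blocking call such as time.sleep() inside a coroutine would stall the whole event loop instead.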

Q4: What are the benefits of using a thread pool in Python?

A: Using a thread pool makes creating and tearing down threads more efficient. It’s especially useful when handling a large number of tasks. Using concurrent.futures.ThreadPoolExecutor makes thread management easier. Example:
from concurrent.futures import ThreadPoolExecutor

def task(name):
    print(f"{name} running")

# Five worker threads are created once and reused for all tasks
with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(task, ["Task 1", "Task 2", "Task 3", "Task 4", "Task 5"])

Q5: Does using multithreading increase memory consumption?

A: Because threads share the same memory space, memory usage does not simply increase in direct proportion to the number of threads. However, each thread is allocated stack memory, so creating a large number of threads will increase overall memory usage.

Conclusion

Multithreading and multiprocessing are important techniques for improving the performance of Python programs. Use the information in this article to leverage the strengths of each and achieve efficient parallel processing. With proper choices and design, you can further expand what your Python programs can do.