Python Multiprocessing: Guide to Using, Optimizing & Debugging


1. Basics: What is Python Multiprocessing?

1.1 What is Multiprocessing?

Multiprocessing is a technique that runs multiple processes (independent execution units) in parallel. In Python, you can implement it easily with the standard-library multiprocessing module.

Features of Multiprocessing

  • Each process has its own independent memory space
  • Can make full use of multiple CPU cores
  • Data must be exchanged explicitly through inter-process communication (for example, Queue or Pipe)

Typical Use Cases

  • Compute-intensive tasks (machine learning, numerical simulation)
  • Tasks that fully utilize CPU (image processing, data analysis)

1.2 Difference from Multithreading

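Python also offers a mechanism called “multithreading” for running tasks concurrently. How do multiprocessing and multithreading differ?
Item | Multiprocessing | Multithreading
Memory sharing | No (independent processes) | Yes (within the same process)
Effect of GIL | Not affected | Affected
CPU-bound suitability | Well suited | Poorly suited (the GIL prevents parallel execution)
I/O-bound suitability | Usable, but process overhead is higher | Well suited
Data exchange | Requires Queue, Pipe, or explicit shared memory (Value, Array) | Threads share memory directly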

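What is the GIL (Global Interpreter Lock)?

The standard Python interpreter (CPython) has a mechanism called the GIL, which allows only one thread at a time to execute Python bytecode, even in a multithreaded program. Threads still help with I/O-bound work, because a thread waiting on I/O releases the GIL, but CPU-bound threads cannot run in parallel. Therefore, if you want to make full use of multiple CPU cores, multiprocessing is the appropriate choice.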
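To see the GIL's effect yourself, the sketch below runs the same CPU-bound function through a thread pool (multiprocessing.pool.ThreadPool) and a process pool (multiprocessing.Pool). count_primes and timed_map are hypothetical helpers for illustration. Exact timings depend on your machine, but on standard CPython with several free cores the process pool typically finishes noticeably faster.

```python
import multiprocessing
import time
from multiprocessing.pool import ThreadPool

def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def timed_map(pool_class, tasks):
    """Run count_primes over tasks with the given pool class; return (results, seconds)."""
    start = time.perf_counter()
    with pool_class(4) as pool:
        results = pool.map(count_primes, tasks)
    return results, time.perf_counter() - start

if __name__ == "__main__":
    tasks = [50_000] * 4  # four identical CPU-bound jobs
    thread_results, thread_secs = timed_map(ThreadPool, tasks)
    process_results, process_secs = timed_map(multiprocessing.Pool, tasks)
    print(f"threads:   {thread_secs:.2f}s")
    print(f"processes: {process_secs:.2f}s")
```

Both pools return the same results; only the elapsed time differs, because the threads take turns holding the GIL while the processes run truly in parallel.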

1.3 Simple Multiprocessing Example in Python

import multiprocessing
import time

def worker(n):
    print(f"Process {n} start")
    time.sleep(2)
    print(f"Process {n} finished")

if __name__ == "__main__":
    process_list = []

    # Create 3 processes
    for i in range(3):
        p = multiprocessing.Process(target=worker, args=(i,))
        process_list.append(p)
        p.start()

    # Wait until all processes finish
    for p in process_list:
        p.join()

    print("All processes finished")

1.4 Precautions When Using Multiprocessing

1. The if __name__ == "__main__": guard is required on Windows

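On Windows, child processes are started with the spawn method, which re-imports the main module in every child. If the code that starts processes is not protected by if __name__ == "__main__":, that import tries to start new processes again, and Python raises a RuntimeError instead. macOS behaves the same way, since spawn has been its default start method since Python 3.8.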
Incorrect code (causes error)
import multiprocessing

def worker():
    print("Hello from process")

p = multiprocessing.Process(target=worker)
p.start()
This code will raise an error on Windows.
Correct code
import multiprocessing

def worker():
    print("Hello from process")

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
Adding if __name__ == "__main__": allows it to run correctly on Windows.
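If you develop on Linux, where the default start method has traditionally been fork, you can reproduce Windows behavior by requesting the spawn start method explicitly with multiprocessing.get_context("spawn"). A minimal sketch; if you remove the __main__ guard from a script like this, you should hit the same RuntimeError on any OS.

```python
import multiprocessing

def worker(results):
    # Runs in a child process started with the "spawn" method
    results.put("Hello from a spawned process")

if __name__ == "__main__":
    # Use the same start method Windows uses, regardless of the current OS
    ctx = multiprocessing.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    message = q.get()  # read before join() so the child can flush and exit
    p.join()
    print(message)
```

Using a context object keeps the choice local to this code instead of changing the global start method.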

1.5 Summary

  • What is multiprocessing? → A method to run multiple processes in parallel
  • Difference from multithreading → Not affected by the GIL and suitable for CPU-bound tasks
  • Simple example in Python → Use multiprocessing.Process()
  • Precautions on Windows → if __name__ == "__main__": is required

2. Practical Guide: Using the multiprocessing Module

2.1 Overview of the multiprocessing Module

The multiprocessing module is Python's standard-library module for process-based parallelism. Because each process runs its own interpreter with its own GIL, this module lets you use all CPU cores and sidestep the GIL’s limitations.

Key Features of multiprocessing

Feature | Description
Process | Create and run individual processes
Queue | Send and receive data between processes
Pipe | Exchange data between two processes
Value & Array | Use shared memory between processes
Pool | Create a pool of worker processes to run tasks in parallel efficiently

2.2 Basic Usage of the Process Class

To create a new process in Python, use the multiprocessing.Process class.

Creating a Basic Process

import multiprocessing
import time

def worker(n):
    print(f"Process {n} start")
    time.sleep(2)
    print(f"Process {n} finished")

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=worker, args=(1,))
    p2 = multiprocessing.Process(target=worker, args=(2,))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print("All processes finished")
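A Process object also reports how its child ended. After join(), exitcode is 0 on success and 1 if the target raised an uncaught exception (a negative value -N means the child was killed by signal N). The sketch below uses a hypothetical worker that fails on negative input to show this.

```python
import multiprocessing

def worker(n):
    # Hypothetical task: an uncaught exception makes the child exit with code 1
    if n < 0:
        raise ValueError("negative input")
    print(f"Process {n} done")

if __name__ == "__main__":
    inputs = {"worker-ok": 1, "worker-bad": -1}
    processes = [
        multiprocessing.Process(target=worker, args=(n,), name=name)
        for name, n in inputs.items()
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    # exitcode is only meaningful after join(): 0 = success, 1 = uncaught exception
    exit_codes = {p.name: p.exitcode for p in processes}
    print(exit_codes)  # {'worker-ok': 0, 'worker-bad': 1}
```

The failing child also prints its traceback to stderr, but the parent keeps running, so checking exitcode is how the parent notices the failure.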

2.3 Interprocess Communication (Queue & Pipe)

Sending and Receiving Data Using Queue

import multiprocessing

def worker(q):
    q.put("Hello from child process")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    p.join()

    # Retrieve data from child process
    print(q.get())
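Pipe works similarly but connects exactly two endpoints. multiprocessing.Pipe() returns a pair of Connection objects; by default the pipe is duplex, so both ends can send() and recv(). A minimal request/reply sketch between parent and child:

```python
import multiprocessing

def worker(conn):
    # The child receives a number, sends back its square, then closes its end
    n = conn.recv()
    conn.send(n * n)
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(7)
    reply = parent_conn.recv()
    p.join()
    print(reply)  # 49
```

A Pipe is lighter than a Queue for one-to-one communication, but it is not safe for several processes to use the same end at the same time.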

2.4 Shared Memory Using Value and Array

import multiprocessing

def worker(val, arr):
    val.value = 3.14  # Change the value in shared memory
    arr[0] = 42       # Change the array value

if __name__ == "__main__":
    val = multiprocessing.Value('d', 0.0)  # 'd' is double type
    arr = multiprocessing.Array('i', [0, 1, 2])  # 'i' is integer type

    p = multiprocessing.Process(target=worker, args=(val, arr))
    p.start()
    p.join()

    print(f"val: {val.value}, arr: {arr[:]}")
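Shared memory does not make updates atomic. An operation such as val.value += 1 reads and then writes the value, so concurrent processes can lose updates. Value and Array are created with a lock by default, which you can hold through get_lock(). A sketch with a hypothetical add_many worker:

```python
import multiprocessing

def add_many(counter, times):
    for _ in range(times):
        # "+=" on .value is a read-modify-write, so hold the Value's lock
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value('i', 0)  # synchronized by default (lock=True)
    workers = [multiprocessing.Process(target=add_many, args=(counter, 1000)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(counter.value)  # 4000
```

Without the with counter.get_lock(): line, the final count can come out below 4000.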

2.5 Process Management Using the Pool Class

Parallel Processing Using Pool

import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        results = pool.map(square, range(10))

    print(results)
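pool.map() blocks until every result is ready and returns them in input order. When order does not matter, Pool.imap_unordered() lets you consume results as soon as workers finish them, which can help when task durations vary. A minimal sketch:

```python
import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        # imap_unordered yields each result as soon as a worker finishes it
        results = []
        for value in pool.imap_unordered(square, range(10)):
            results.append(value)
    print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The loop stays inside the with block because leaving it terminates the pool.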

2.6 Summary

  • The multiprocessing module makes parallel processing easy to implement
  • Create individual processes with the Process class
  • Queue and Pipe let processes exchange data
  • Value and Array provide shared memory
  • The Pool class processes large amounts of data efficiently

3. Advanced: Error Handling and Performance Optimization

3.1 Common Errors in multiprocessing and Solutions

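Error 1: RuntimeError on Windows when the if __name__ == "__main__": guard is missing

Error Message (excerpt)
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
The full message also mentions freeze_support(); you only need to call multiprocessing.freeze_support() (right after the guard) when freezing the program into an executable, for example with PyInstaller.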
Solution
import multiprocessing

def worker():
    print("Hello from process")

if __name__ == "__main__":  # This is required
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

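Error 2: PicklingError (lambdas and local functions cannot be sent to worker processes)

Error Message
AttributeError: Can't pickle local object 'main.<locals>.<lambda>'
Pool sends the function to its workers by pickling it, and pickle can only refer to functions defined at module level.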
Solution
import multiprocessing

def square(x):  # make it a global function
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        results = pool.map(square, range(10))  # avoid lambda
    print(results)
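If you reached for a lambda only to fix an extra argument, functools.partial is a picklable alternative, as long as it wraps a module-level function. power below is a hypothetical example:

```python
import functools
import multiprocessing

def power(base, exponent):
    return base ** exponent

if __name__ == "__main__":
    # partial(power, exponent=3) is picklable because power is a module-level function
    cube = functools.partial(power, exponent=3)
    with multiprocessing.Pool(4) as pool:
        results = pool.map(cube, range(5))
    print(results)  # [0, 1, 8, 27, 64]
```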

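Error 3: Deadlock (the program hangs)

A child process that has put data on a Queue does not exit until that data has been flushed into the underlying pipe. If the parent calls join() before get(), the parent waits for the child while the child waits for the parent to read, so the program can hang, especially with large data. Call get() first, then join().
Solution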
import multiprocessing

def worker(q):
    q.put("data")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # receive the data first
    p.join()  # then join; the child can exit once its queued data is consumed
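The ordering matters most when several workers send large results. A sketch, assuming each hypothetical produce worker puts exactly one item: the parent collects one item per worker first and joins afterwards. Swapping the two steps can hang, because the 1 MB payloads do not fit in the pipe buffer.

```python
import multiprocessing

def produce(q, worker_id):
    # A large payload: the child cannot exit until it has been flushed to the pipe
    q.put((worker_id, "x" * 1_000_000))

if __name__ == "__main__":
    q = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=produce, args=(q, i)) for i in range(3)]
    for p in workers:
        p.start()
    # Drain exactly one item per worker BEFORE joining
    received = dict(q.get() for _ in workers)
    for p in workers:
        p.join()
    print(sorted(received), len(received[0]))
```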

3.2 Performance Optimization Techniques

Optimization 1: Set the number of processes appropriately

import multiprocessing

def worker(n):
    return n * n

if __name__ == "__main__":
    num_workers = multiprocessing.cpu_count()  # number of logical CPUs
    with multiprocessing.Pool(num_workers) as pool:
        results = pool.map(worker, range(100))
    print(results)

Optimization 2: Use Pool.starmap()

import multiprocessing

def multiply(a, b):
    return a * b

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        results = pool.starmap(multiply, [(1, 2), (3, 4), (5, 6)])
    print(results)
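starmap() unpacks each tuple into positional arguments. When the arguments already live in separate lists, zip() builds those tuples for you. A minimal sketch with hypothetical left and right lists:

```python
import multiprocessing

def multiply(a, b):
    return a * b

if __name__ == "__main__":
    left = [1, 3, 5]
    right = [2, 4, 6]
    with multiprocessing.Pool(4) as pool:
        # zip pairs the lists element-wise into (a, b) tuples for starmap
        results = pool.starmap(multiply, zip(left, right))
    print(results)  # [2, 12, 30]
```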

Optimization 3: Leverage shared memory

import multiprocessing
import ctypes

def worker(shared_array):
    shared_array[0] = 99  # modify the value in shared memory

if __name__ == "__main__":
    shared_array = multiprocessing.Array(ctypes.c_int, [1, 2, 3])  # create shared memory
    p = multiprocessing.Process(target=worker, args=(shared_array,))
    p.start()
    p.join()
    print(shared_array[:])  # [99, 2, 3]
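For larger raw buffers, Python 3.8+ also provides multiprocessing.shared_memory.SharedMemory, a named block that other processes attach to by name. A minimal sketch, assuming a hypothetical fill worker that writes eight bytes; every process calls close() on its own handle, and only the creating process calls unlink() to free the block.

```python
from multiprocessing import Process, shared_memory

def fill(name, size):
    # Attach to the existing block by name and write into it
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:size] = bytes(range(size))
    shm.close()  # detach this process's view; the parent still owns the block

if __name__ == "__main__":
    size = 8
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        p = Process(target=fill, args=(shm.name, size))
        p.start()
        p.join()
        data = bytes(shm.buf[:size])  # copy the bytes out of shared memory
        print(list(data))  # [0, 1, 2, 3, 4, 5, 6, 7]
    finally:
        shm.close()
        shm.unlink()  # free the block; only the creator should do this
```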

3.3 Summary

  • How to avoid common multiprocessing errors
  • Performance optimization points:
      • Set the number of processes appropriately
      • Use starmap() for functions with multiple arguments
      • Reduce data copying with shared memory

4. FAQ: Common Questions and Solutions

4.1 Which should you use in Python, multiprocessing or multithreading?

Answer

  • CPU-bound (computationally intensive tasks) → Multiprocessing (multiprocessing)
  • I/O-bound (file/network operations) → Multithreading (threading)

Type of Task | Suitable Parallelism
CPU-bound (numerical calculations, image processing, etc.) | Multiprocessing (multiprocessing)
I/O-bound (file or API requests, etc.) | Multithreading (threading)

4.2 Why does multiprocessing feel “slow”?

Answer

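  • High cost of creating processes → Reuse a Pool instead of starting a new Process per task
  • Too much data copying → Use shared memory (Value, Array)
  • Many small tasks, where communication overhead dominates → Batch them (for example with the chunksize argument of Pool.map()), or use concurrent.futures.ThreadPoolExecutor if the tasks are I/O-bound
Example: reuse a single Pool sized to the number of CPUs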
import multiprocessing

def worker(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        results = pool.map(worker, range(100))
    print(results)
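For many tiny tasks, the cost of sending each item to a worker and its result back can outweigh the work itself. Pool.map() picks a chunk size automatically, but Pool.imap() and imap_unordered() default to chunksize=1. A sketch comparing the two settings, with tiny_task and run as hypothetical helpers; exact timings vary by machine, but the batched run is typically much faster.

```python
import multiprocessing
import time

def tiny_task(n):
    # Deliberately trivial work, so inter-process communication dominates
    return n + 1

def run(chunksize):
    start = time.perf_counter()
    with multiprocessing.Pool() as pool:
        # With chunksize=1, every item is a separate round trip to a worker
        results = list(pool.imap(tiny_task, range(20_000), chunksize=chunksize))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    one_by_one, slow_secs = run(chunksize=1)
    batched, fast_secs = run(chunksize=1_000)
    print(f"chunksize=1:    {slow_secs:.2f}s")
    print(f"chunksize=1000: {fast_secs:.2f}s")
```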

4.3 How to share dictionaries and lists in multiprocessing?

Answer

Use multiprocessing.Manager().
import multiprocessing

def worker(shared_list):
    shared_list.append(100)  # Update shared list

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_list = manager.list([1, 2, 3])
        p = multiprocessing.Process(target=worker, args=(shared_list,))
        p.start()
        p.join()
        print(shared_list)  # [1, 2, 3, 100]
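manager.dict() works the same way for dictionaries. Each process receives a proxy, and every operation is forwarded to the manager's server process, which is convenient but slower than Value or Array. A sketch with a hypothetical record worker:

```python
import multiprocessing

def record(shared_dict, key):
    # Each item assignment is forwarded to the manager process
    shared_dict[key] = key * 10

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_dict = manager.dict()
        workers = [multiprocessing.Process(target=record, args=(shared_dict, k)) for k in range(3)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()
        result = shared_dict.copy()  # plain dict copy, taken before the manager shuts down
    print(result)
```

The key order in the printed dict depends on which process finished first.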

4.4 What are common errors with multiprocessing.Pool, and how do you fix them?

Error | Cause | Solution
AttributeError: Can't pickle local object | Passing a lambda or local function | Use a module-level function
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase | Missing if __name__ == "__main__": on Windows | Add if __name__ == "__main__":
EOFError: Ran out of input | Processes in the Pool did not terminate properly | Call pool.close() and pool.join() appropriately
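The with statement calls terminate() on the pool when the block exits. If you create a Pool without with, call close() and then join() so that the workers shut down cleanly. A minimal sketch; the timeout value is an arbitrary assumption.

```python
import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    pool = multiprocessing.Pool(4)
    try:
        async_result = pool.map_async(square, range(10))
        # get() with a timeout raises multiprocessing.TimeoutError instead of hanging forever
        results = async_result.get(timeout=30)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit
    print(results)
```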

4.5 How to debug Python multiprocessing?

Answer

Use multiprocessing.log_to_stderr().
import multiprocessing
import logging

def worker(n):
    logger = multiprocessing.get_logger()
    logger.info(f"Process {n} running")

if __name__ == "__main__":
    multiprocessing.log_to_stderr(logging.INFO)  # Enable logging
    p = multiprocessing.Process(target=worker, args=(1,))
    p.start()
    p.join()
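Logging shows what each process is doing, but an exception raised inside a Pool worker only reaches the parent when the parent asks for the result. With apply_async(), AsyncResult.get() re-raises the worker's exception in the parent, so you can catch and inspect it per task. risky is a hypothetical function that fails for one input:

```python
import multiprocessing

def risky(n):
    if n == 3:
        raise ValueError(f"bad input: {n}")
    return n * 2

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        pending = {n: pool.apply_async(risky, (n,)) for n in range(5)}
        outcomes = {}
        for n, async_result in pending.items():
            try:
                outcomes[n] = async_result.get()  # re-raises the worker's exception here
            except ValueError as exc:
                outcomes[n] = f"failed: {exc}"
    print(outcomes)
```

One failing task does not stop the others; each result is collected and checked individually.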

5. Summary and Additional Learning Resources

5.1 Summary of This Article

Fundamentals of Multiprocessing

  • What is multiprocessing? → A technique that runs multiple processes in parallel to maximize CPU utilization
  • Difference from multithreading
      • Multiprocessing → Suited for CPU‑bound tasks (numerical calculations, image processing, etc.)
      • Multithreading → Suited for I/O‑bound tasks (file handling, network communication, etc.)

How to Use the multiprocessing Module

  • Use the Process class to create individual processes
  • Use Queue and Pipe to send and receive data between processes
  • Use Value and Array for shared memory
  • Use the Pool class to run parallel processing efficiently

Error Handling and Performance Optimization

  • Common errors
      • If you omit if __name__ == "__main__":, you get errors on Windows
      • lambda functions and local functions cause PicklingError
      • Calling join() before draining a Queue with get() can cause deadlocks
  • Performance optimization
      • Use Pool to reduce the overhead of creating processes
      • Use starmap() to pass multiple arguments
      • Use shared memory (Value, Array, or multiprocessing.shared_memory) to reduce data-copy overhead

5.2 Additional Learning Resources

1. Python Official Documentation

2. Online Tutorials

5.3 Looking Ahead to Future Applications

By properly leveraging Python’s multiprocessing, you can use the CPU efficiently and create high‑performance programs.

Technologies to Learn Next

  • Asynchronous processing (asyncio) → Handle many I/O‑bound tasks concurrently in a single thread
  • concurrent.futures → Unified management of threads and processes
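As a small preview of concurrent.futures, the sketch below runs the same function with ThreadPoolExecutor and ProcessPoolExecutor; only the executor class changes. run_with is a hypothetical helper for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def square(n):
    return n * n

def run_with(executor_class):
    # Both executor classes share the same interface (submit, map, shutdown)
    with executor_class(max_workers=4) as executor:
        return list(executor.map(square, range(10)))

if __name__ == "__main__":
    thread_results = run_with(ThreadPoolExecutor)
    process_results = run_with(ProcessPoolExecutor)
    print(thread_results == process_results)  # True
```

Because the interface is shared, you can switch between threads (I/O-bound work) and processes (CPU-bound work) by changing a single argument.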

5.4 Conclusion

In this article, we explained “Python multiprocessing” in detail from basics to practice and advanced applications. Apply the knowledge you learned in this article to real projects and give it a try! 🚀