Table of Contents
- 1. Introduction
- 2. Basic Concepts of Threads
- 3. Creating Threads in Python
- 4. Synchronizing Data Between Threads
- 5. GIL and Thread Limitations
- 6. Practical Examples: Programs Using Threads
- 7. Best Practices for Using Threads
- 8. Conclusion
1. Introduction
Python is a programming language loved by many developers for its simplicity and flexibility. Among its powerful features, threading is one of the essential techniques for efficient program design. In this article, we explain threading in Python clearly, from the basics to advanced usage.
What is a Thread?
A thread is a small unit of execution that runs independently within a program. By running multiple threads inside a single process, tasks can be executed concurrently. This can make programs faster and more responsive, especially when tasks spend much of their time waiting, and it makes efficient use of resources.
Why Learn Threading in Python?
By leveraging threads, you can effectively solve the following problems:
- Efficient Handling of I/O Waits: Threads can shorten the total waiting time of tasks with heavy I/O, such as file handling or network communication.
- Simultaneous Execution of Multiple Tasks: For example, handling large amounts of data at once or sending multiple API requests concurrently.
- Improved User Experience: In GUI applications, running tasks in the background keeps the interface responsive.
What You Will Learn in This Article
This article covers the following about Python threading:
- Basic concepts of threads and how to use them
- How to prevent data races between threads
- The mechanism and impact of the GIL (Global Interpreter Lock)
- How to use threads in real programs
- Best practices and precautions
2. Basic Concepts of Threads
Threads are a fundamental mechanism for implementing concurrency within a program. This section covers the basics of threads and explains how they differ from processes and parallel processing.
What is a Thread?
A thread is a single unit of execution that runs independently within a program. Typically, a program runs as a process, which may contain one or more threads. For example, a web browser may run multiple threads concurrently for tasks such as:
- Monitoring user input
- Rendering web pages
- Streaming video playback
Differences Between Processes and Threads
To fully understand threads, it is important to first distinguish them from processes.
Item | Process | Thread |
---|---|---|
Memory Space | Independent | Shared within a process |
Creation Cost | High (each process allocates its own memory) | Low (memory is shared, making it more efficient) |
Communication Method | Requires IPC (Inter-Process Communication) | Can share data directly |
Granularity of Concurrency | Large | Small |
Differences Between Concurrency and Parallelism
When learning about threads, it is crucial to understand the difference between concurrency and parallelism.
- Concurrency: Tasks are executed in small alternating steps, giving the appearance of simultaneous execution. Python threads are suitable for concurrency. Example: One cashier serving multiple customers in turn.
- Parallelism: Multiple tasks are executed physically at the same time. This requires multiple CPU cores and is usually handled by multiprocessing in Python. Example: Several cashiers serving different customers simultaneously.
Characteristics of Threads in Python
Python provides the threading module as part of its standard library, allowing threads to be created and managed easily. However, Python threads have the following features and limitations:
- Global Interpreter Lock (GIL): In CPython, the standard Python interpreter, the GIL ensures that only one thread executes Python bytecode at a time. Because of this, threads give little speedup for CPU-bound tasks (computations that keep the CPU busy).
- Best for I/O-Bound Tasks: Threads are well suited to tasks such as network communication or file I/O, where waiting time dominates execution.
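To see why I/O-bound work benefits, the following minimal sketch uses time.sleep as a stand-in for a blocking I/O call. This is an assumption for illustration: the helper name fake_io is hypothetical, and real blocking network or file calls typically behave the same way because they also release the GIL while waiting.

```python
import threading
import time

def fake_io(delay):
    # Hypothetical stand-in for a blocking I/O call; time.sleep releases the GIL while waiting
    time.sleep(delay)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.5,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four 0.5-second waits overlap, so this is typically close to 0.5 s, not 2 s
print(f"Elapsed: {elapsed:.2f} seconds")
```

Running the four calls one after another would take about 2 seconds; with threads, the waits overlap.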
Practical Use Cases for Threads
Here are a few common scenarios where threads are useful:
- Web Scraping: Retrieve multiple web pages concurrently.
- Database Access: Handle multiple client requests concurrently.
- Background Tasks: Perform long-running work in a separate thread while the main thread stays responsive to user input.
3. Creating Threads in Python
In Python, the threading module makes it easy to create threads and implement concurrency. This section explains basic thread creation and operations.
Overview of the threading Module
The threading module is the standard library module for creating and managing threads in Python. With it, you can:
- Create and start threads
- Synchronize between threads
- Manage thread states
The threading module keeps these operations simple and flexible.
Basic Method for Creating Threads
The most common way to create a thread is by using the Thread class. Below is a basic example of creating and running a thread:
```python
import threading
import time

# Function to run in the thread
def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)

# Create a thread
thread = threading.Thread(target=print_numbers)

# Start the thread
thread.start()

# Main thread processing
print("Main thread is running...")

# Wait for the thread to finish
thread.join()
print("Thread has completed.")
```
Key Points of the Code
- Creating a Thread: Specify the function to run in the thread using the target argument of the threading.Thread class.
- Starting the Thread: Call the start() method to begin execution.
- Waiting for Completion: Use the join() method to block the main thread until the specified thread finishes execution.
In this example, the print_numbers function runs in a separate thread while the main thread continues independently, so "Main thread is running..." appears before the thread finishes counting.
Passing Arguments to Threads
If you need to pass parameters to a thread, use the args argument. Here's an example:
```python
import threading
import time

def print_numbers_with_delay(delay):
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(delay)

# Create a thread with arguments
thread = threading.Thread(target=print_numbers_with_delay, args=(2,))
thread.start()
thread.join()
```
Key Points
- Arguments are passed as a tuple, such as args=(2,). The trailing comma is required: (2) is just the integer 2, not a tuple.
- In the example above, the value 2 is passed to delay, causing a 2-second pause between loop iterations.
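Keyword arguments work the same way through the kwargs parameter, which takes a dict. Because Thread discards the return value of its target, a common workaround is to store results in a shared structure. This sketch assumes a hypothetical countdown function and a results dict chosen for illustration:

```python
import threading
import time

results = {}  # Shared dict; each thread writes only to its own key

def countdown(name, count_from, delay=0.1):
    values = []
    for i in range(count_from, 0, -1):
        values.append(i)
        time.sleep(delay)
    # Thread discards the target's return value, so store the result instead
    results[name] = values

# args supplies positional arguments (a tuple); kwargs supplies keyword arguments (a dict)
thread = threading.Thread(target=countdown, args=("A", 3), kwargs={"delay": 0.01})
thread.start()
thread.join()
print(results)  # {'A': [3, 2, 1]}
```

Reading results only after join() guarantees the thread has finished writing.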
Creating Threads Using Classes
For more advanced thread operations, you can extend the Thread class and create your own custom thread class.
```python
import threading
import time

class CustomThread(threading.Thread):
    def __init__(self, name):
        # Thread.__init__ accepts a name argument, so pass it through
        super().__init__(name=name)

    def run(self):
        # run() executes in the new thread once start() is called
        for i in range(5):
            print(f"{self.name} is running: {i}")
            time.sleep(1)

# Create thread instances
thread1 = CustomThread(name="Thread 1")
thread2 = CustomThread(name="Thread 2")

# Start threads
thread1.start()
thread2.start()

# Wait for threads to finish
thread1.join()
thread2.join()
print("All threads have completed.")
```
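Key Points of the Code
- The run Method: Override the run method of the Thread class to define the thread's behavior. Calling start() runs run() in a new thread; calling run() directly would execute it in the current thread instead.
- Named Threads: Assigning names to threads makes debugging and identifying log output easier.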
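Subclassing is not the only way to name a thread: the Thread constructor also accepts a name argument, and threading.current_thread() returns the Thread object running the current code. A minimal sketch (the report function and the worker-N names are illustrative):

```python
import threading

seen = []

def report():
    # current_thread() returns the Thread object that is executing this function
    seen.append(threading.current_thread().name)

threads = [threading.Thread(target=report, name=f"worker-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(seen))  # ['worker-0', 'worker-1', 'worker-2']
```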
Managing Thread States
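The following are useful when managing the state of threads:
- is_alive(): Returns True from just before the thread's run() method starts until just after it finishes.
- daemon=True: Passed to the Thread constructor (or set through the daemon attribute before start()), this marks the thread as a daemon (background) thread. The older setDaemon(True) method still works but has been deprecated since Python 3.10.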
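A quick way to see both is to check a short-lived thread at each stage of its life. This sketch assumes a hypothetical short_task that just sleeps for half a second:

```python
import threading
import time

def short_task():
    time.sleep(0.5)

# daemon=True in the constructor replaces the deprecated setDaemon(True)
thread = threading.Thread(target=short_task, daemon=True)

before_start = thread.is_alive()   # False: not started yet
thread.start()
while_running = thread.is_alive()  # True: still sleeping
thread.join()
after_join = thread.is_alive()     # False: run() has finished

print(thread.daemon, before_start, while_running, after_join)
```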
Example of a Daemon Thread
```python
import threading
import time

def background_task():
    while True:
        print("Background task is running...")
        time.sleep(2)

# Create a daemon thread (daemon=True replaces the deprecated setDaemon(True))
thread = threading.Thread(target=background_task, daemon=True)
thread.start()

print("Main thread is exiting.")
# The daemon thread is stopped automatically when the program exits
```
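Daemon threads are stopped automatically when the program exits (once only daemon threads are left), so they never keep the program running. This is useful for background tasks, but because daemon threads are stopped abruptly, resources they hold, such as open files or database transactions, may not be released properly.
4. Synchronizing Data Between Threads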
When using threads in Python, conflicts can occur if multiple threads access the same resource simultaneously. This section explains synchronization methods to prevent data races.
What is a Data Race?
A data race occurs when multiple threads access the same resource (such as a variable or file) at the same time and at least one of them modifies it. This can lead to unintended results or program errors.
Example of a Data Race
```python
import threading

counter = 0

def increment():
    global counter
    for _ in range(1000000):
        counter += 1

# Create two threads
thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(f"Counter value: {counter}")
```
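In this example, two threads update the counter variable simultaneously. Updates from one thread can overwrite updates from the other, so the final value may be less than the expected 2,000,000. How often this happens depends on the Python version and on timing; the point is that the correct result is not guaranteed.
Synchronization with Locks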
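Locks are needed because counter += 1 is not a single, indivisible step. The standard dis module shows that it compiles to several bytecode instructions (read the global, add, write it back); if the interpreter switches threads between the read and the write, one thread's update overwrites the other's. Exact opcode names vary between Python versions, so this sketch only lists them:

```python
import dis

counter = 0

def increment_once():
    global counter
    counter += 1

# Collect the opcode names that make up the function body
ops = [instruction.opname for instruction in dis.get_instructions(increment_once)]
print(ops)
```

The output contains a LOAD_GLOBAL for counter followed later by a STORE_GLOBAL; a thread switch can happen between them.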
To prevent data races, you can use the Lock object from the threading module to synchronize between threads.
Basic Usage of Locks
```python
import threading

counter = 0
lock = threading.Lock()

def increment_with_lock():
    global counter
    for _ in range(1000000):
        # Acquire the lock before updating
        with lock:
            counter += 1

# Create two threads
thread1 = threading.Thread(target=increment_with_lock)
thread2 = threading.Thread(target=increment_with_lock)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(f"Counter value with lock: {counter}")
```
Key Points
- with lock Syntax: The with statement acquires the lock on entry and releases it on exit, even if an exception is raised.
- Lock Acquisition and Release: Once a thread acquires a lock, other threads trying to acquire it must wait until it is released.
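The with statement is shorthand for calling acquire() and release() yourself. Spelled out, the equivalent form needs try/finally so that the lock is released even if the protected code raises (increment_explicit is an illustrative name):

```python
import threading

counter = 0
lock = threading.Lock()

def increment_explicit():
    global counter
    lock.acquire()      # Blocks until no other thread holds the lock
    try:
        counter += 1
    finally:
        lock.release()  # Always runs, even if the body raises

increment_explicit()
print(counter, lock.locked())  # 1 False
```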
With the lock in place, counter always reaches the intended value (2,000,000).
Recursive Lock (RLock)
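While Lock is a simple locking mechanism, sometimes a thread needs to acquire the same lock multiple times, for example when a function that holds the lock calls another function that acquires it too. A plain Lock would block forever in that situation; use RLock (Recursive Lock) instead.
Example of RLock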
```python
import threading

lock = threading.RLock()

def nested_function():
    with lock:
        print("First level lock acquired")
        with lock:
            print("Second level lock acquired")

thread = threading.Thread(target=nested_function)
thread.start()
thread.join()
```
Key Points
- RLock allows the thread that holds it to acquire it again; that thread must release it the same number of times before other threads can acquire it.
- Useful in cases where nested lock management is required.
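The difference from a plain Lock can be observed safely with a non-blocking acquire: acquire(blocking=False) returns False instead of waiting. In this sketch, can_reacquire is a hypothetical helper that tries to take a lock a second time from the thread that already holds it:

```python
import threading

def can_reacquire(lock):
    # Try to take the lock again from the same thread without blocking
    with lock:
        acquired = lock.acquire(blocking=False)
        if acquired:
            lock.release()  # Undo the extra acquisition
        return acquired

print(can_reacquire(threading.Lock()))   # False: a blocking acquire here would deadlock
print(can_reacquire(threading.RLock()))  # True: RLock counts nested acquisitions by its owner
```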
Synchronization with Semaphores
threading.Semaphore is used to limit the number of threads that can access a resource simultaneously.
Example of a Semaphore
```python
import threading
import time

# At most 2 threads may hold the semaphore at the same time
semaphore = threading.Semaphore(2)

def access_resource(name):
    with semaphore:
        print(f"{name} is accessing the resource")
        time.sleep(2)
        print(f"{name} is done with the resource")

threads = []
for i in range(5):
    thread = threading.Thread(target=access_resource, args=(f"Thread-{i}",))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
```
Key Points
- A semaphore restricts the number of threads that can access a resource at the same time.
- In this example, a maximum of two threads can access the resource concurrently.
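To confirm the limit, you can count how many threads are inside the protected section at once. This sketch uses illustrative names (active, max_active); a separate Lock protects the two counters because updating them is itself a shared-state operation:

```python
import threading
import time

semaphore = threading.Semaphore(2)
state_lock = threading.Lock()  # Protects active and max_active
active = 0
max_active = 0

def use_resource():
    global active, max_active
    with semaphore:
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.1)  # Simulated work while holding one of the two slots
        with state_lock:
            active -= 1

threads = [threading.Thread(target=use_resource) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Maximum concurrent holders: {max_active}")
```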
Synchronization with Events
threading.Event is used for signaling between threads.
Example of an Event
```python
import threading
import time

event = threading.Event()

def wait_for_event():
    print("Thread is waiting for event...")
    event.wait()
    print("Event has been set. Proceeding with task.")

def set_event():
    time.sleep(2)
    print("Setting event")
    event.set()

thread1 = threading.Thread(target=wait_for_event)
thread2 = threading.Thread(target=set_event)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
```
Key Points
- wait() blocks a thread until the event is set.
- set() sets the event, waking up all threads waiting on it.
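wait() also accepts a timeout and returns whether the event was set, which avoids waiting forever if the signal never comes. The sketch below uses threading.Timer, a Thread subclass that calls a function after a delay, to set the event:

```python
import threading

event = threading.Event()

# Nobody sets the event yet, so wait() gives up after 0.1 s and returns False
timed_out_result = event.wait(timeout=0.1)

# threading.Timer calls event.set in a new thread after 0.1 s
timer = threading.Timer(0.1, event.set)
timer.start()
set_result = event.wait(timeout=5)  # Returns True as soon as the event is set
timer.join()

print(timed_out_result, set_result, event.is_set())  # False True True

event.clear()  # Reset the event so it can be reused
```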
Summary
Choosing the right synchronization method is key to preventing data races:
- Use Lock for simple synchronization
- Use RLock for nested lock scenarios
- Use Semaphore to limit concurrent thread access
- Use Event for signaling between threads
5. GIL and Thread Limitations
When working with threads in Python, one unavoidable topic is the Global Interpreter Lock (GIL). Understanding how the GIL works and its limitations is essential for using threads effectively.
What is the GIL?
The Global Interpreter Lock (GIL) is a locking mechanism used internally by the Python interpreter (specifically CPython). It ensures that only one thread executes Python bytecode at a time.
The Role of the GIL
- Introduced to guarantee safe memory management.
- Mainly ensures consistency of Python objects, especially reference counting.
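Threads take turns holding the GIL. In CPython, a thread that keeps running is asked to release it after a configurable switch interval, which the sys module exposes. A minimal check:

```python
import sys

# Seconds a thread may run before CPython asks it to release the GIL
interval = sys.getswitchinterval()
print(interval)  # 0.005 (5 ms) unless changed with sys.setswitchinterval()
```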
Example of GIL in Action
The following example runs two CPU-intensive tasks using threads:
```python
import threading
import time

def cpu_bound_task():
    start = time.time()
    count = 0
    for _ in range(10**7):
        count += 1
    print(f"Task completed in: {time.time() - start:.2f} seconds")

# Create two threads
thread1 = threading.Thread(target=cpu_bound_task)
thread2 = threading.Thread(target=cpu_bound_task)

start_time = time.time()
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(f"Total time: {time.time() - start_time:.2f} seconds")
```
Result Analysis
Even though two threads are used, the total time is roughly the same as running the two tasks one after the other; there is no 2× speedup. This is because the GIL prevents threads from executing Python bytecode simultaneously, so CPU-bound code cannot run in parallel.
Situations Where the GIL Matters
- CPU-Bound Tasks: Tasks such as numerical computation or image processing are heavily limited by the GIL, offering little benefit from multithreading.
- I/O-Bound Tasks: Tasks such as file handling or network communication spend much of their time waiting, and blocking I/O calls release the GIL while they wait, so the GIL has little impact. Threads can still provide performance benefits here.
Ways to Overcome GIL Limitations
1. Use Multiprocessing
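A common way to bypass the GIL is the multiprocessing module, which creates multiple independent processes, each with its own Python interpreter and memory space.
```python
from multiprocessing import Process
import time

def cpu_bound_task():
    start = time.time()
    count = 0
    for _ in range(10**7):
        count += 1
    print(f"Task completed in: {time.time() - start:.2f} seconds")

# The __main__ guard is required where child processes are started with "spawn"
# (the default on Windows and macOS), because each child re-imports this module
if __name__ == "__main__":
    # Create two processes
    process1 = Process(target=cpu_bound_task)
    process2 = Process(target=cpu_bound_task)

    start_time = time.time()
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    print(f"Total time: {time.time() - start_time:.2f} seconds")
```
Key Points
- Each process runs its own interpreter with its own GIL, so the two tasks can run on different CPU cores at the same time.
- For CPU-bound tasks, multiprocessing is usually more efficient than threading, although starting processes and passing data between them adds overhead.
- The if __name__ == "__main__": guard keeps child processes from re-running the startup code when they import the module.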
2. Use C Extensions
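Some C extensions (such as NumPy and, for some operations, pandas) release the GIL while their heavy computations run in compiled code, so other threads can run at the same time. This can significantly improve performance in CPU-bound workloads.
- NumPy performs vectorized numerical operations in compiled code.
- Cython and Numba compile Python-like code to machine code, and both can release the GIL in sections that do not touch Python objects.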
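The standard library has a convenient example of this behavior: according to the hashlib documentation, the GIL is released while hashing data larger than 2047 bytes. The sketch below hashes several large buffers in a thread pool; the buffer sizes and the digest helper are arbitrary choices for illustration, and any speedup depends on your machine:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four 8 MB buffers (sizes chosen only for illustration)
buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]

def digest(data):
    # hashlib releases the GIL while hashing large inputs, so these calls can overlap
    return hashlib.sha256(data).hexdigest()

with ThreadPoolExecutor(max_workers=4) as executor:
    threaded = list(executor.map(digest, buffers))

sequential = [digest(b) for b in buffers]
print(threaded == sequential)  # True: same results, only the scheduling differs
```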
3. Use asyncio
For I/O-bound tasks, asyncio can be used instead of threads to achieve efficient concurrency in a single thread.
```python
import asyncio

async def io_bound_task(name, delay):
    print(f"{name} started")
    await asyncio.sleep(delay)
    print(f"{name} completed")

async def main():
    # Run both tasks concurrently; total time is about 3 seconds, not 5
    await asyncio.gather(
        io_bound_task("Task 1", 2),
        io_bound_task("Task 2", 3),
    )

asyncio.run(main())
```
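Key Points
- asyncio runs all tasks in a single thread and switches between them only at await points (cooperative multitasking), so there is no competition for the GIL. It does not, however, speed up CPU-bound code.
- Best suited for network communication and other I/O operations that provide async APIs.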
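Blocking calls, such as ordinary file reads or libraries without async APIs, would stall the event loop. asyncio.to_thread (Python 3.9+) runs such a call in a worker thread and lets you await the result. A sketch with a hypothetical blocking_read function, simulated here with time.sleep:

```python
import asyncio
import time

def blocking_read(name):
    # Hypothetical blocking call (e.g. a library with no async API)
    time.sleep(0.2)
    return f"{name} done"

async def main():
    # to_thread runs each blocking call in a worker thread so the event loop stays free
    return await asyncio.gather(
        asyncio.to_thread(blocking_read, "A"),
        asyncio.to_thread(blocking_read, "B"),
    )

results = asyncio.run(main())
print(results)  # ['A done', 'B done']
```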
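Pros and Cons of the GIL
Advantages
- Simplifies Python memory management, because reference counts cannot be corrupted by concurrent updates.
- Keeps single-threaded code fast and C extensions simpler to write safely, since objects do not need individual locks.
Disadvantages
- Limits performance improvements from multithreading.
- Requires multiprocessing (or GIL-releasing extensions) for CPU-bound parallelism.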
Summary
The GIL is a major limitation of threading in Python, but by understanding its impact, you can choose the appropriate approach:
- For CPU-bound tasks: use multiprocessing or C extensions.
- For I/O-bound tasks: use threads or asyncio.
6. Practical Examples: Programs Using Threads
Threads, when used appropriately, can handle complex tasks efficiently. This section introduces concrete examples of programs that use Python threads.
1. Concurrent Web Scraping
In web scraping, using threads to fetch multiple pages at once can significantly reduce processing time.
```python
import threading
import requests  # Third-party package: pip install requests

def fetch_url(url):
    try:
        # A timeout keeps a slow server from blocking the thread indefinitely
        response = requests.get(url, timeout=10)
        print(f"Fetched {url}: {len(response.content)} bytes")
    except requests.RequestException as exc:
        print(f"Failed to fetch {url}: {exc}")

urls = [
    "https://example.com",
    "https://httpbin.org",
    "https://www.python.org",
]
threads = []

# Create a thread for each URL
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
print("All requests finished.")
```
Key Points
- Multiple URLs are fetched concurrently using threads.
- The requests library makes sending HTTP requests simple.
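For many URLs, a thread pool is often easier to manage than one thread per URL, and it makes worker errors easy to collect. The sketch below uses concurrent.futures.ThreadPoolExecutor and as_completed; to stay runnable without network access, fake_fetch is a hypothetical stand-in for requests.get that sleeps briefly and raises for one URL:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_fetch(url):
    # Hypothetical stand-in for requests.get(url, timeout=10); no network access
    time.sleep(0.1)
    if "bad" in url:
        raise ConnectionError(f"could not reach {url}")
    return len(url)

urls = ["https://example.com", "https://bad.example", "https://www.python.org"]
sizes, errors = {}, {}

with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to the URL it was submitted for
    futures = {executor.submit(fake_fetch, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            sizes[url] = future.result()  # Re-raises any exception from the worker
        except ConnectionError as exc:
            errors[url] = str(exc)

print(sizes)
print(errors)
```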
2. Simultaneous File Read/Write
Threads can be used to read and write multiple files at the same time. The example below writes three files concurrently:
```python
import threading

def write_to_file(filename, content):
    with open(filename, "w") as f:
        f.write(content)
    print(f"Wrote to {filename}")

files = [
    ("file1.txt", "Content for file 1"),
    ("file2.txt", "Content for file 2"),
    ("file3.txt", "Content for file 3"),
]
threads = []

# Create a thread for each file
for filename, content in files:
    thread = threading.Thread(target=write_to_file, args=(filename, content))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
print("All files written.")
```
Key Points
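- Each thread independently writes to a separate file. This can speed things up when file I/O is slow (for example, on network storage); for a few small files like these, the difference is negligible.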
- Effective when multiple threads access different resources simultaneously.
3. Background Processing in GUI Applications
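In GUI applications, threads are often used to run heavy tasks in the background while the main thread handles user interactions. It is safest to update Tkinter widgets only from the thread running mainloop(), so in this tkinter example the worker thread sends status messages through a queue.Queue and the main thread polls the queue with root.after:
```python
import queue
import threading
import time
from tkinter import Tk, Button, Label

# Status messages travel from the worker thread to the GUI thread through this queue
messages = queue.Queue()

def long_task():
    messages.put("Task started...")
    time.sleep(5)  # Simulating a long task
    messages.put("Task completed!")

def start_task():
    # daemon=True lets the program exit even if a task is still running
    threading.Thread(target=long_task, daemon=True).start()

def poll_messages():
    # Runs on the GUI thread: apply pending updates, then check again in 100 ms
    try:
        while True:
            label.config(text=messages.get_nowait())
    except queue.Empty:
        pass
    root.after(100, poll_messages)

# Setup GUI
root = Tk()
root.title("Threaded GUI Example")
label = Label(root, text="Click the button to start the task.")
label.pack(pady=10)
button = Button(root, text="Start Task", command=start_task)
button.pack(pady=10)
poll_messages()
root.mainloop()
```
Key Points
- Threads prevent the UI from freezing during long-running tasks.
- Use threading.Thread to run the long task without blocking the GUI event loop.
- Update widgets only from the GUI thread; pass results from worker threads through a thread-safe queue.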
4. Real-Time Data Processing
When processing real-time data such as sensor readings or log streams, threads allow concurrent handling.
```python
import threading
import time
import random

def process_data(sensor_name):
    for _ in range(5):
        data = random.randint(0, 100)
        print(f"{sensor_name} read data: {data}")
        time.sleep(1)

sensors = ["Sensor-1", "Sensor-2", "Sensor-3"]
threads = []

# Create a thread for each sensor
for sensor in sensors:
    thread = threading.Thread(target=process_data, args=(sensor,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
print("All sensor data processed.")
```
Key Points
- Each thread processes data from an independent sensor.
- Suitable for real-time data collection and analysis.
Summary
Through these examples, we learned how to design efficient programs with Python threads:
- Web Scraping: Concurrent data collection
- File Operations: Faster simultaneous read/write
- GUI Applications: Responsive background processing
- Real-Time Data: Concurrent handling of live input
7. Best Practices for Using Threads
Threads are a powerful tool for efficient concurrency, but if used incorrectly, they can cause issues such as deadlocks and data races. This section introduces the best practices you should follow when using threads in Python.
1. Avoiding Deadlocks
A deadlock occurs when two or more threads each wait indefinitely for a lock held by another. To avoid this, acquire locks in a consistent order and in a consistent way throughout the program.
Example of a Deadlock
```python
# Warning: this program deadlocks by design and never exits
import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()

def thread1_task():
    with lock1:
        print("Thread 1 acquired lock1")
        time.sleep(1)
        with lock2:
            print("Thread 1 acquired lock2")

def thread2_task():
    with lock2:
        print("Thread 2 acquired lock2")
        time.sleep(1)
        with lock1:
            print("Thread 2 acquired lock1")

thread1 = threading.Thread(target=thread1_task)
thread2 = threading.Thread(target=thread2_task)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
```
This code deadlocks: thread 1 holds lock1 and waits for lock2, while thread 2 holds lock2 and waits for lock1, so neither can proceed.
Solutions
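- Standardize Lock Order: Always acquire locks in the same order across all threads.
- Set Timeouts: Pass a timeout when acquiring a lock so that threads don't wait indefinitely. acquire() returns True if the lock was obtained and False if the timeout expired, so check the result and release the lock only if you got it:
```python
acquired = lock1.acquire(timeout=1)  # True if obtained, False if 1 second passed
```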
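Both solutions can be sketched together. ordered_task applies a single global lock order, and try_both (a hypothetical helper) gives up when a lock cannot be acquired within the timeout, releasing anything it already holds:

```python
import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()
log = []

def ordered_task(name):
    # Every thread takes lock1 before lock2, so none can hold lock2 while waiting for lock1
    with lock1:
        time.sleep(0.05)
        with lock2:
            log.append(name)

def try_both(name, timeout=1.0):
    # Timeout variant: give up instead of waiting forever, releasing anything already held
    if not lock1.acquire(timeout=timeout):
        return False
    try:
        if not lock2.acquire(timeout=timeout):
            return False
        try:
            log.append(name)
            return True
        finally:
            lock2.release()
    finally:
        lock1.release()

threads = [threading.Thread(target=ordered_task, args=(n,)) for n in ("T1", "T2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

ok = try_both("T3")
print(sorted(log), ok)  # ['T1', 'T2', 'T3'] True
```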
2. Optimizing the Number of Threads
Creating too many threads can introduce overhead and reduce performance. Choose the optimal number of threads based on the type of task.
General Guidelines
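- I/O-bound tasks: A higher number of threads is often fine (e.g., 2× the CPU core count or more), because most threads spend their time waiting.
- CPU-bound tasks: Because of the GIL, extra threads give little benefit; if you switch to processes, a common starting point is one worker per CPU core.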
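With a thread pool, the size is set through max_workers. For reference, ThreadPoolExecutor's default in recent Python versions is min(32, CPU count + 4). The sketch below derives illustrative pool sizes from os.cpu_count() following the guidelines above; treat the numbers as starting points to measure, not rules:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# os.cpu_count() can return None if the count cannot be determined
cores = os.cpu_count() or 1

# Illustrative starting points based on the guidelines above
io_workers = cores * 2   # I/O-bound: threads mostly wait, so more than the core count is fine
cpu_workers = cores      # CPU-bound: typically processes (not threads), one per core

with ThreadPoolExecutor(max_workers=io_workers) as executor:
    lengths = list(executor.map(len, ["a", "bb", "ccc"]))

print(f"cores={cores} io_workers={io_workers} cpu_workers={cpu_workers}")
print(lengths)  # [1, 2, 3]
```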
3. Safely Stopping Threads
Safely terminating threads is crucial for program stability. The threading module does not provide a way to forcibly stop threads, so you need to manage termination conditions inside the thread.
Example of Safe Thread Termination
```python
import threading
import time

class SafeThread(threading.Thread):
    def __init__(self):
        super().__init__()
        self._stop_event = threading.Event()

    def run(self):
        # Loop until stop() sets the event
        while not self._stop_event.is_set():
            print("Thread is running...")
            # Waits up to 1 second, but returns early as soon as stop() is called
            self._stop_event.wait(1)

    def stop(self):
        self._stop_event.set()

thread = SafeThread()
thread.start()
time.sleep(5)
thread.stop()
thread.join()
print("Thread has been safely stopped.")
```
Key Points
- Use flags or events inside the thread to monitor termination conditions.
- Call stop() explicitly to set the stop condition, then join() to wait until the thread has finished.
4. Using Logging for Debugging
To trace thread execution, use the logging module instead of print. Its format options can include details such as thread names and timestamps.
Logging Example
```python
import threading
import logging

# %(asctime)s adds a timestamp; %(threadName)s adds the name of the logging thread
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(threadName)s: %(message)s")

def task():
    logging.debug("Task started")
    logging.debug("Task completed")

thread = threading.Thread(target=task, name="MyThread")
thread.start()
thread.join()
```
Key Points
- Explicitly name threads to improve log readability.
- Use log levels (DEBUG, INFO, WARNING, etc.) to distinguish importance.
5. Choosing Between Threads and Async Processing
Threads are effective for I/O-bound tasks, but in some cases asyncio may be more efficient.
- Threads are suitable when:
  - Running background tasks in GUI applications
  - Manipulating data shared with other threads or processes
- Async is suitable when:
  - Handling large volumes of I/O tasks efficiently
  - State management is relatively simple
6. Keep Design Simple
Overusing threads can make code unnecessarily complex. Keep these points in mind for a maintainable design:
- Limit the number of threads to what is strictly necessary.
- Clearly define each thread’s role.
- Minimize data sharing between threads; where possible, pass data through a thread-safe queue such as queue.Queue.
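A queue.Queue handles its own locking, so threads can exchange data without sharing variables directly. A minimal producer-consumer sketch; the producer and consumer functions and the None sentinel are illustrative choices:

```python
import queue
import threading

tasks = queue.Queue()
results = []
SENTINEL = None  # Tells the consumer that no more items will arrive

def producer(n):
    for i in range(n):
        tasks.put(i)
    tasks.put(SENTINEL)

def consumer():
    while True:
        item = tasks.get()  # Blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * 10)

threads = [threading.Thread(target=producer, args=(5,)), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [0, 10, 20, 30, 40]
```

Only the consumer touches results, and the queue preserves FIFO order, so the output order is deterministic.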

8. Conclusion
Python threads are a powerful tool for improving program efficiency. In this article, we covered everything from the basics to advanced techniques and best practices. Let’s recap the key points you should keep in mind when using threads.
Main Takeaways
- Basic Concepts of Threads
  - A thread is a small unit of execution that runs independently within a process, enabling concurrency.
  - Understand the difference between concurrency and parallelism, and use each appropriately.
- Creating Threads in Python
  - You can easily create threads with the threading module.
  - Control execution with the start() and join() methods of the Thread class.
  - Create custom thread classes for more flexible control.
- Thread Synchronization
  - Prevent data races using synchronization objects like Lock, RLock, and Semaphore.
  - Use events and timeouts to refine thread coordination.
- Impact of the GIL
  - The GIL limits Python threads in CPU-bound tasks.
  - For CPU-bound workloads, use multiprocessing; for I/O-bound workloads, use threads.
- Practical Examples
  - We demonstrated threading in real-world scenarios such as web scraping, file operations, GUI applications, and real-time data processing.
- Best Practices
  - Avoid deadlocks, optimize thread count, ensure safe termination, and leverage logging for debugging.
  - Careful design improves both efficiency and program stability.
Mindset When Using Threads
- Threads Are Not a Silver Bullet: Threads are powerful, but using them incorrectly can hurt performance. Always choose them for the right scenarios.
- Keep Design Simple: Overusing threads increases complexity. Define clear roles, minimize shared data, and simplify synchronization.
- Consider Alternatives: In some cases, asyncio or multiprocessing may be a better fit depending on the workload.
Next Steps
After mastering the basics of threading, consider exploring these topics further:
- Asynchronous Programming
  - Learn Python’s asyncio module to implement efficient asynchronous processing in a single thread.
- Multiprocessing
  - Overcome GIL limitations by optimizing parallel execution for CPU-bound tasks.
- Advanced Thread Control
  - Use thread pools (concurrent.futures.ThreadPoolExecutor) and debugging tools for more efficient thread management.
- Apply to Real-World Scenarios
  - Work on projects like web crawlers or real-time data pipelines to build hands-on skills.