Optimizing Python Memory Usage: From Basics to Advanced

Table of Contents

1. Introduction

Target Audience

This article is primarily aimed at beginners to intermediate users who use Python on a daily basis. It is especially useful for those who want to check and optimize their program’s memory usage.

Purpose of the Article

The purpose of this article is as follows:
  1. Understand how Python’s memory management works.
  2. Learn concrete methods for measuring memory usage.
  3. Acquire optimization techniques to reduce memory consumption.
Understanding this content will help you improve the performance of your Python programs.

2. Fundamentals of Python Memory Management

How Memory Management Works

In Python, memory management is performed using two primary mechanisms: reference counting and garbage collection.

Reference Counting

Reference counting is a mechanism that counts how many references each object has. In Python, when an object is created, its reference count is set to 1. Each time another variable references that object, the count increases, and when a reference is removed, the count decreases. When the reference count reaches zero, the object is automatically freed from memory.
Code Example
import sys

a = [1, 2, 3]  # List object is created
print(sys.getrefcount(a))  # Initial count (usually 2, because getrefcount itself holds a temporary reference)

b = a  # Another variable references the same object
print(sys.getrefcount(a))  # Reference count increases

del b  # Reference is removed
print(sys.getrefcount(a))  # Reference count decreases

Garbage Collection

Garbage Collection (GC) is a mechanism that reclaims memory that cannot be freed by reference counting alone (in particular, cyclic references). In Python, a built-in garbage collector runs periodically and automatically deletes unreachable objects. It specializes in detecting and freeing cyclic references, such as the one created below:
class Node:
    def __init__(self):
        self.next = None

# Example of a cyclic reference
a = Node()
b = Node()
a.next = b
b.next = a

# Even after 'del a' and 'del b', the reference counts never reach zero,
# so reference counting alone cannot free this memory
If you want to control the garbage collector explicitly, use the gc module.
import gc

# Force the garbage collector to run
gc.collect()

Risks of Memory Leaks

Python’s memory management is very powerful, but not perfect. In particular, memory leaks can occur in situations such as:
  1. Cyclic references exist but the garbage collector is disabled.
  2. Long-running programs where unnecessary objects remain in memory.
To prevent these issues, it is important to design your code to avoid cyclic references and to explicitly delete objects that are no longer needed.
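Building on the Node example above, the following minimal sketch shows how an explicit gc.collect() call reclaims a cycle once the last external references to it have been deleted (the return value is the number of unreachable objects the collector found):
import gc

class Node:
    def __init__(self):
        self.next = None

# Build a cycle, then drop the only external references to it
a = Node()
b = Node()
a.next = b
b.next = a
del a, b

# Reference counting alone cannot reclaim the cycle; the collector can.
print(gc.collect())  # number of unreachable objects found (a small positive number here)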

Summary of This Section

  • Python’s memory management operates via reference counting and garbage collection.
  • Garbage collection helps especially with resolving cyclic references, but proper design is crucial to prevent unnecessary memory consumption.
  • The next section will explain how to measure memory usage concretely.

3. How to Check Memory Usage

Basic Approach

Check Object Size with sys.getsizeof()

Python’s standard library sys module includes the getsizeof() function, which lets you obtain the memory size of any object in bytes.
Example Code
import sys

# Check memory size of each object
x = 42
y = [1, 2, 3, 4, 5]
z = {"a": 1, "b": 2}

print(f"Size of x: {sys.getsizeof(x)} bytes")
print(f"Size of y: {sys.getsizeof(y)} bytes")
print(f"Size of z: {sys.getsizeof(z)} bytes")
Notes
  • sys.getsizeof() returns only the size of the object itself; it does not include the sizes of other objects it references (such as elements inside a list).
  • Measuring the total memory footprint of a container (including the objects it references) requires extra work or additional tools; one approach is sketched below.
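Because sys.getsizeof() is shallow, a common workaround is to walk an object's referents and sum their individual sizes. The helper below is a minimal sketch of that idea using gc.get_referents(); total_size is a hypothetical name, not part of the standard library:
import sys
import gc

def total_size(obj):
    """Rough estimate of an object's size including everything it references."""
    seen = set()          # avoid counting shared objects twice (and avoid infinite loops)
    stack = [obj]
    total = 0
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        total += sys.getsizeof(current)
        stack.extend(gc.get_referents(current))
    return total

nested = {"numbers": list(range(1000)), "letters": ["a", "b", "c"]}
print(sys.getsizeof(nested))  # size of the dict object only
print(total_size(nested))     # dict plus the lists and elements it references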

Using Profiling Tools

Function‑Level Memory Measurement with memory_profiler

memory_profiler is an external library that measures the memory usage of Python programs in detail on a per‑function basis. It makes it easy to pinpoint how much memory specific parts of your code consume.
Setup
First, install memory_profiler:
pip install memory-profiler
Usage
By using the @profile decorator, you can measure memory consumption at the function level.
from memory_profiler import profile

@profile
def example_function():
    a = [i for i in range(10000)]
    b = {i: i**2 for i in range(1000)}
    return a, b

if __name__ == "__main__":
    example_function()
To execute it, run the following command:
python -m memory_profiler your_script.py
Sample Output
Line #    Mem usage    Increment   Line Contents
------------------------------------------------
     3     13.1 MiB     13.5 MiB   @profile
     4     16.5 MiB      3.4 MiB   a = [i for i in range(10000)]
     5     17.2 MiB      0.7 MiB   b = {i: i**2 for i in range(1000)}

Monitor Overall Process Memory Usage with psutil

psutil is a powerful library that can monitor the total memory usage of a process. It’s useful when you want to understand the overall memory consumption of a specific script or application.
Setup
Install it with the following command:
pip install psutil
Usage
import psutil

process = psutil.Process()
print(f"Total process memory usage: {process.memory_info().rss / 1024**2:.2f} MB")
Main Features
  • Can retrieve the current process’s memory usage in bytes.
  • Allows you to monitor program performance while gathering insights for optimization (a small helper is sketched below).
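As one way to put this to use, the sketch below wraps a function call and prints how much the process's resident set size (RSS) changed across it. report_rss and build_list are hypothetical names used only for illustration, and RSS includes interpreter overhead, so treat the figure as a rough indicator:
import functools
import psutil

def report_rss(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        process = psutil.Process()
        before = process.memory_info().rss
        result = func(*args, **kwargs)
        after = process.memory_info().rss
        # RSS change across the call, in megabytes
        print(f"{func.__name__}: {(after - before) / 1024**2:.2f} MB")
        return result
    return wrapper

@report_rss
def build_list(n):
    return [i for i in range(n)]

build_list(1000000)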

Detailed Memory Tracing

Trace Memory Allocations with tracemalloc

Using the Python standard library tracemalloc, you can trace the origins of memory allocations and analyze which parts consume the most memory.
Usage
import tracemalloc

# Start memory tracing
tracemalloc.start()

# Memory-consuming operation
a = [i for i in range(100000)]

# Display memory usage
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

print("[Memory Usage]")
for stat in top_stats[:5]:
    print(stat)
Main Uses
  • Identify problematic memory allocations.
  • Compare multiple operations to find optimization opportunities (see the sketch after this list).
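For the comparison use case, two snapshots can be diffed with Snapshot.compare_to(); a minimal sketch:
import tracemalloc

tracemalloc.start()

baseline = tracemalloc.take_snapshot()
squares = [i ** 2 for i in range(100000)]
after = tracemalloc.take_snapshot()

# Show which source lines allocated the most memory between the two snapshots
for stat in after.compare_to(baseline, "lineno")[:3]:
    print(stat)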

Summary of This Section

  • To understand Python’s memory usage, there are many options ranging from basic tools like sys.getsizeof() to profiling tools such as memory_profiler and psutil.
  • If memory consumption is critical for your program, choose the appropriate tool and manage it efficiently.
  • The next section will discuss concrete methods for actually optimizing memory usage.

4. How to Optimize Memory Usage

Choosing Efficient Data Structures

Replacing Lists with Generators

List comprehensions are convenient, but they hold every element in memory at once, which is costly for large amounts of data. A generator, by contrast, produces values on the fly as they are needed, which can reduce memory usage dramatically.
Code Example
import sys

# Using a list
list_data = [i**2 for i in range(1000000)]
print(f"List memory size: {sys.getsizeof(list_data) / 1024**2:.2f} MB")

# Using a generator (the generator object itself is only a few hundred bytes)
gen_data = (i**2 for i in range(1000000))
print(f"Generator memory size: {sys.getsizeof(gen_data)} bytes")
Using generators can dramatically reduce memory usage.
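Note that a generator only saves memory if it is consumed lazily; for example, aggregating the values without ever materializing the full sequence:
# Each square is produced, added to the running total, and discarded,
# so peak memory stays small no matter how large the range is.
total = sum(i ** 2 for i in range(1000000))
print(total)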

Using collections.defaultdict as a Dictionary Alternative

Python dictionaries are convenient but can consume a lot of memory with large datasets. collections.defaultdict does not make a dictionary smaller by itself, but it handles default values efficiently and simplifies update-heavy code such as counting.
Code Example
from collections import defaultdict

# Regular dictionary: handle the missing key manually
data = {}
data["key"] = data.get("key", 0) + 1

# defaultdict: missing keys are initialized automatically (int() == 0)
default_data = defaultdict(int)
default_data["key"] += 1

Managing Unnecessary Objects

Explicit Deletion with the del Statement

In Python, the del statement lets you remove references to objects that are no longer needed; when the last reference disappears, the object is freed immediately, which also reduces the work left for the garbage collector.
Code Example
# Delete a variable that is no longer needed
a = [1, 2, 3]
del a
After del, the name a no longer exists; because it held the only reference to the list, the reference count drops to zero and the memory is released immediately.
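Keep in mind that del removes a name, not necessarily the object itself; the memory is released only when the last remaining reference disappears:
a = [0] * 1000000
b = a      # a second reference to the same list
del a      # only the name 'a' is removed; the list is still alive via 'b'
del b      # the last reference is gone, so the list is freed immediately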

Using the Garbage Collector

You can use the gc module to manually run the garbage collector, which can resolve memory leaks caused by circular references.
Code Example
import gc

# Running the garbage collector
gc.collect()

Optimization Using External Libraries

Leveraging NumPy and Pandas

NumPy and Pandas are designed for efficient memory management. Especially when handling large amounts of numeric data, using these libraries can significantly reduce memory usage.
NumPy Example
import sys
import numpy as np

# Python list of integers
data_list = [i for i in range(1000000)]
print(f"List memory size: {sys.getsizeof(data_list) / 1024**2:.2f} MB")

# NumPy array storing the same values in one contiguous buffer
data_array = np.arange(1000000)
print(f"NumPy array memory size: {data_array.nbytes / 1024**2:.2f} MB")
NumPy arrays are significantly more memory-efficient than Python lists for numeric data.
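If the value range allows it, choosing a smaller dtype reduces memory further. A short sketch (exact figures depend on the default integer width of your platform):
import numpy as np

data64 = np.arange(1000000, dtype=np.int64)
data32 = np.arange(1000000, dtype=np.int32)

print(f"int64 array: {data64.nbytes / 1024**2:.2f} MB")  # about 7.63 MB
print(f"int32 array: {data32.nbytes / 1024**2:.2f} MB")  # about 3.81 MB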

Preventing Memory Leaks

To prevent memory leaks, it is important to keep the following points in mind.
  1. Avoid circular references: design objects so they do not reference each other.
  2. Manage scope: be mindful of function and class scopes to avoid leaving unnecessary objects behind.

Summary of This Section

  • Optimizing memory usage requires selecting efficient data structures and properly deleting unnecessary objects.
  • Leveraging external libraries such as NumPy and Pandas enables even more efficient memory management.
  • The next section will discuss troubleshooting techniques that help solve real-world problems.

5. Troubleshooting

How to Handle Sudden Increases in Memory Usage

Adjust the Garbage Collector

If the garbage collector is not functioning properly, unnecessary memory may not be released, causing usage to spike. To resolve this, use the gc module to adjust the garbage collector.
Code Example
import gc

# Check the garbage collector's current thresholds
print(gc.get_threshold())

# Run the garbage collector manually
gc.collect()

# Change the garbage collector settings (e.g., adjust thresholds)
gc.set_threshold(700, 10, 10)

Reevaluate Object Lifecycles

Some objects may remain in memory even after they are no longer needed. In such cases, consider reviewing the object’s lifecycle and deleting it at an appropriate time.
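One simple way to control a lifecycle is to confine large intermediate objects to a function scope so they disappear as soon as the function returns. A minimal sketch; summarize_file and "data.txt" are placeholder names:
def summarize_file(path):
    # The large list exists only inside this function; once it returns,
    # no reference to it remains and the memory can be reclaimed immediately.
    with open(path) as f:
        rows = [line.strip() for line in f]
    return len(rows)

line_count = summarize_file("data.txt")  # only the small integer result stays alive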

Memory Leaks Caused by Circular References

Problem Overview

Circular references occur when two or more objects reference each other. In this case, their reference counts never reach zero, so reference counting alone cannot free them; they are only reclaimed when the cyclic garbage collector runs.
Solutions
  • Use weak references (weakref module) so the links do not keep the objects alive.
  • Run the garbage collector manually (gc.collect()) to reclaim objects trapped in circular references.
Code Example
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None

a = Node("A")
b = Node("B")

# Store weak references so each link does not keep the other node alive
a.next = weakref.ref(b)
b.next = weakref.ref(a)
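To use the object behind a weak reference, call the reference; the call returns the target, or None if the target has already been garbage-collected:
target = a.next()  # dereference the weak reference
if target is not None:
    print(target.name)  # "B"
else:
    print("The referenced node has already been collected")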

When Memory Profiling Tools Do Not Work

memory_profiler Error

When using memory_profiler, the @profile decorator may not produce any output. This usually happens because the script was not run through the memory_profiler module.
Solution
  1. Run the script with the -m memory_profiler option:
   python -m memory_profiler your_script.py
  2. Ensure that the target function is actually decorated with @profile.

psutil Error

If psutil cannot retrieve memory information, there may be issues with the library version or environment.
Solution
  1. Check the psutil version and install the latest version:
   pip install --upgrade psutil
  2. Verify that you are retrieving process information correctly:
   import psutil
   process = psutil.Process()
   print(process.memory_info())

Handling Memory Exhaustion Errors

Problem Overview

When handling large datasets, programs may encounter memory exhaustion errors (MemoryError).
Solutions
  • Reduce data size: delete unnecessary data and use memory-efficient structures such as generators.
   # Use a generator instead of materializing the full sequence
   large_data = (x for x in range(10**8))
  • Process in chunks: split the data so that only one chunk is in memory at a time (see the sketch after this list).
   # 'data', 'chunk_size', and 'process_data' are assumed to be defined elsewhere
   for start in range(0, len(data), chunk_size):
       process_data(data[start:start + chunk_size])
  • Leverage external storage: store data on disk instead of in memory for processing (e.g., SQLite, HDF5).
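As mentioned above, chunked processing can be factored into a small helper. iter_chunks and process_chunk below are hypothetical names, and in practice the chunks would often be read incrementally from a file or database rather than sliced from a list that is already in memory:
def iter_chunks(items, chunk_size):
    # Yield successive slices so that only one chunk is processed at a time
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

def process_chunk(chunk):
    return sum(chunk)

data = list(range(1000000))
partial_sums = [process_chunk(chunk) for chunk in iter_chunks(data, 100000)]
print(sum(partial_sums))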

Summary of This Section

  • Use garbage collection and lifecycle management to properly control memory usage.
  • If circular references or tool errors occur, they can be resolved with weak references and proper configuration.
  • Memory exhaustion errors can be avoided by revisiting data structures, using chunked processing, and leveraging external storage.

6. Practical Example: Measuring Memory Usage in Python Scripts

Here we present a concrete example of measuring memory usage within a Python script using the tools and techniques discussed so far. Through this practical example, you will learn how to analyze and optimize memory usage.

Sample Scenario: Comparing Memory Usage of Lists and Dictionaries

Code Example

The following script measures the memory usage of lists and dictionaries using sys.getsizeof() and memory_profiler.
import sys
from memory_profiler import profile

@profile
def compare_memory_usage():
    # Create list
    list_data = [i for i in range(100000)]
    print(f"List memory usage: {sys.getsizeof(list_data) / 1024**2:.2f} MB")

    # Create dictionary
    dict_data = {i: i for i in range(100000)}
    print(f"Dictionary memory usage: {sys.getsizeof(dict_data) / 1024**2:.2f} MB")

    return list_data, dict_data

if __name__ == "__main__":
    compare_memory_usage()

Execution Steps

  1. If memory_profiler is not installed, install it first:
   pip install memory-profiler
  2. Run the script with memory_profiler:
   python -m memory_profiler script_name.py

Sample Output

Line #    Mem usage    Increment   Line Contents
------------------------------------------------
     5     13.2 MiB     13.2 MiB   @profile
     6     17.6 MiB      4.4 MiB   list_data = [i for i in range(100000)]
     9     22.2 MiB      4.6 MiB   dict_data = {i: i for i in range(100000)}

List memory usage: 0.76 MB
Dictionary memory usage: 3.05 MB
From this example, you can see that dictionaries consume more memory than lists. This provides guidance for selecting the appropriate data structure based on application requirements.

Sample Scenario: Monitoring Overall Process Memory Usage

Code Example

The following script uses psutil to monitor the overall process memory usage in real time.
import psutil
import time

def monitor_memory_usage():
    process = psutil.Process()
    print(f"Initial memory usage: {process.memory_info().rss / 1024**2:.2f} MB")

    # Simulate memory consumption
    data = [i for i in range(10000000)]
    print(f"Memory usage during processing: {process.memory_info().rss / 1024**2:.2f} MB")

    del data
    time.sleep(2)  # Brief pause before re-reading the process statistics (the list itself is freed as soon as 'del data' executes)
    print(f"Memory usage after data deletion: {process.memory_info().rss / 1024**2:.2f} MB")

if __name__ == "__main__":
    monitor_memory_usage()

Execution Steps

  1. If psutil is not installed, install it first:
   pip install psutil
  2. Run the script:
   python script_name.py

Sample Output

Initial memory usage: 12.30 MB
Memory usage during processing: 382.75 MB
Memory usage after data deletion: 13.00 MB
From these results, you can observe the behavior when large amounts of data consume memory and how memory is released by deleting unnecessary objects.

Key Points of This Section

  • To measure memory usage, it is important to combine tools (such as sys.getsizeof(), memory_profiler, psutil) appropriately.
  • Visualizing data structures and overall process memory usage helps identify bottlenecks and enables efficient program design.

7. Summary and Next Steps

Key Points of the Article

  1. Fundamentals of Python Memory Management
  • Python automatically manages memory using reference counting and garbage collection.
  • Proper design is required to prevent issues caused by circular references.
  2. How to Check Memory Usage
  • Using sys.getsizeof(), you can check the memory size of individual objects.
  • Tools such as memory_profiler and psutil allow detailed measurement of memory consumption for functions or entire processes.
  3. How to Optimize Memory Usage
  • Using generators and efficient data structures (e.g., NumPy arrays) can reduce memory consumption when processing large data sets.
  • Deleting unnecessary objects and leveraging the garbage collector helps prevent memory leaks.
  4. Applying in Practical Examples
  • Through real code, we learned the steps for measuring memory and optimization techniques.
  • We practiced examples comparing memory usage of lists versus dictionaries and monitoring memory for an entire process.

Next Steps

  1. Apply to Your Own Projects
  • Incorporate the methods and tools introduced in this article into your everyday Python projects.
  • For example, try memory_profiler on scripts handling large data to pinpoint memory‑intensive sections.
  2. Learn More Advanced Memory Management
  3. Utilize External Tools and Services
  • In large‑scale projects, using profiling features of py-spy or PyCharm enables more detailed analysis.
  • When running in cloud environments, take advantage of monitoring tools offered by AWS and Google Cloud.
  4. Continuous Code Review and Improvement
  • If developing in a team, discuss memory usage during code reviews to increase optimization opportunities.
  • Cultivating coding habits that prioritize memory efficiency yields long‑term benefits.

Conclusion

The skill of properly managing Python’s memory usage contributes not only to program efficiency but also to your growth as a developer. Building on the content presented in this article, work on real projects to deepen your understanding further.