How to Read and Write Binary Files in Python: A Complete Guide

目次

1. Introduction

Python supports not only text files but also reading and writing binary files. By working with binary files, you can manipulate various types of data such as images, audio, video, and compressed files. This article explains how to safely and efficiently read binary files using Python.

1.1 What Is a Binary File?

A binary file is a file composed of binary data (a combination of 0s and 1s) that computers can understand, instead of human-readable text strings. Examples of binary files include:
  • Image files (PNG, JPEG, BMP, etc.)
  • Audio files (WAV, MP3, AAC, etc.)
  • Video files (MP4, AVI, MOV, etc.)
  • Compressed files (ZIP, RAR, GZ, etc.)
  • Executable files (EXE, DLL, BIN, etc.)
When opened in a normal text editor, binary files often appear as “garbled text.” This happens because the data is encoded in a specific format, and meaningful information can only be extracted using an appropriate program.

1.2 Difference Between Text and Binary Files

The main difference between binary and text files lies in their data storage method.
TypeData FormatExamples
Text FileStored with character encoding (UTF-8, Shift-JIS, etc.).txt, .csv, .json
Binary FileStored as sequences of 0s and 1s.jpg, .mp3, .exe

Main Differences

  1. Data Structure
  • Text files contain only data interpretable as characters.
  • Binary files can contain any type of data (images, audio, executable code, etc.).
  1. Size
  • Text files are usually smaller when storing small amounts of data.
  • Binary files may be larger due to encoding overhead.
  1. Editing Method
  • Text files can be opened and edited directly with editors like Notepad or VS Code.
  • Binary files require special tools (e.g., binary editors) for editing.

1.3 Why Handle Binary Files in Python?

Common reasons for handling binary files with Python include:

① Processing Images and Audio Data

By reading binary files, Python programs can analyze images or process audio data.
# Example: Read a PNG image as binary
with open("image.png", "rb") as file:
    binary_data = file.read()
    print(binary_data[:20])  # Display the first 20 bytes

② Analyzing Compressed Data

Python provides modules like zipfile and gzip to extract or compress ZIP/GZ files programmatically.
import gzip

# Example: Open a GZ compressed file
with gzip.open("example.gz", "rb") as file:
    content = file.read()
    print(content)

③ Parsing Binary Protocols

In low-level operations such as network communication or databases, binary data parsing is essential. The struct module allows you to convert binary data into numbers or strings.
import struct

# Example: Convert binary data to integer
binary_data = b'   '  # 4 bytes of data
integer_value = struct.unpack('<I', binary_data)[0]
print(integer_value)  # Output: 1

1.4 Summary

  • Binary files store information such as images, audio, and compressed data.
  • Unlike text files, data is stored as sequences of 0s and 1s.
  • Python enables parsing, processing, and converting binary data.
  • Using Python’s open() function and struct module, binary files can be handled efficiently.

2. How to Read Binary Files in Python and Basic Operations

In Python, you can open and read binary files using the open() function. This section explains the basic operations for working with binary files in Python.

2.1 Reading Binary Files with the open() Function

The open() function in Python is the fundamental way to open files. To open a binary file, you must specify 'rb' (read-only binary mode).

Basic Syntax

file = open("example.bin", "rb")  # 'rb' means "read in binary mode"
binary_data = file.read()  # Read the file content
file.close()  # Close the file
However, with this method, if you forget to explicitly call close(), the file may not be closed, causing a resource leak. Therefore, in Python it is common practice to use the with statement for safer file handling.

2.2 Safe Binary Reading with the with Statement

The with statement automatically closes the file when done, ensuring that resources are properly released even if an error occurs.

Example: Safe Reading of Binary Data

with open("example.bin", "rb") as file:
    binary_data = file.read()

# Once the with-block ends, the file is automatically closed

Benefits of Using the with Statement

  1. No need to call file.close() (it closes automatically)
  2. No resource leaks even if an error occurs
  3. Code becomes simpler and more readable

2.3 Variations of Reading Methods

Python provides several ways to read binary files. Choose the appropriate method depending on the use case.

① Read the Entire File at Once (read())

This method loads the entire file content into memory.
with open("example.bin", "rb") as file:
    binary_data = file.read()  # Read the entire file at once
Advantages
  • Simple and easy to understand
  • Efficient for small files
Disadvantages
  • For large files (hundreds of MB to GB), memory usage may become excessive

② Read in Fixed-Size Chunks (read(n))

This method reads the file in partial chunks, which is ideal for large files.
with open("example.bin", "rb") as file:
    chunk = file.read(1024)  # Read 1024 bytes (1KB) at a time
    while chunk:
        print(chunk)  # Process the chunk
        chunk = file.read(1024)  # Read the next 1KB
Advantages
  • Reduces memory usage
  • Efficient for handling large files
Disadvantages
  • Not suitable if you need the entire file loaded at once

③ Read Binary Data Line by Line (readline())

If the binary file contains newline characters, it can be read line by line.
with open("example.bin", "rb") as file:
    line = file.readline()  # Read one line
    while line:
        print(line)
        line = file.readline()  # Read the next line
Use Case
  • Binary log files or binary data that includes newlines
Note
  • If no newline exists, the entire content is treated as one line, so this method is valid only for specific files

2.4 File Positioning with seek()

The seek() method allows you to read data from any position in the file.

Basic Syntax of seek()

file.seek(offset, whence)
ArgumentDescription
offsetNumber of bytes to move
whenceReference point (0: beginning of file, 1: current position, 2: end of file)

Example: Reading Data from the Middle of a File

with open("example.bin", "rb") as file:
    file.seek(10)  # Move to the 10th byte from the beginning
    data = file.read(5)  # Read 5 bytes
    print(data)
Use Cases
  • Extract header information
  • Analyze specific sections of data

2.5 Getting the Current File Position with tell()

The tell() method returns the current file position (byte offset).

Example: Checking File Position

with open("example.bin", "rb") as file:
    file.read(10)  # Read 10 bytes
    position = file.tell()  # Get the current position
    print(f"Current file position: {position} bytes")
Use Cases
  • Check how much of the file has been read
  • Debugging when processing files mid-way

2.6 Summary

  • Use 'rb' mode when opening binary files in Python
  • The with statement ensures files are closed safely
  • read() loads the entire file into memory, which may be inefficient for large files — use read(n) instead
  • seek() allows moving to any file position, and tell() retrieves the current position
年収訴求

3. Efficient Methods for Reading Binary Files in Python

In the previous section, we explained the basic ways to open binary files. In this section, we will go into detail about efficient ways of reading binary files. Python offers multiple methods to read binary data, and choosing the right approach depends on the use case.

3.1 Reading the Entire Binary File at Once (read())

To read a binary file all at once, use the read() method.

Basic Syntax

with open("example.bin", "rb") as file:
    binary_data = file.read()
Advantages
  • Simple and intuitive
  • Suitable for small files (a few MB or less)
Disadvantages
  • Consumes a lot of memory if the file is large (hundreds of MB or more)
  • May cause crashes if the file size exceeds available memory

Example

with open("sample.bin", "rb") as file:
    binary_data = file.read()
    print(len(binary_data))  # Display file size in bytes
This method is safe for files that are a few MB in size.

3.2 Reading in Fixed-Size Chunks (read(n))

When handling large binary files, it is recommended to read the file in fixed-size chunks to optimize memory usage.

Basic Syntax

with open("example.bin", "rb") as file:
    chunk = file.read(1024)  # Read 1024 bytes (1KB) at a time
    while chunk:
        print(chunk)
        chunk = file.read(1024)  # Read the next 1KB
Advantages
  • Reduces memory usage when processing large files
  • Supports streaming-like processing
Disadvantages
  • Additional processing required for real-time handling

Example

with open("large_file.bin", "rb") as file:
    while True:
        chunk = file.read(4096)  # Read 4KB at a time
        if not chunk:
            break  # End when no more data
        print(f"Read {len(chunk)} bytes")
This method allows you to process files in the GB range without exhausting memory.

3.3 Reading Binary Data Line by Line (readline())

If the binary data contains newline characters, it can be read line by line.

Basic Syntax

with open("example.bin", "rb") as file:
    line = file.readline()
    while line:
        print(line)
        line = file.readline()
Use Cases
  • Binary log files or binary data that includes newline characters
Notes
  • If no newlines exist, the entire file will be treated as one line
  • Not commonly used in general binary processing

3.4 Reading Data from Specific File Positions (seek())

When analyzing binary files, you may want to read data starting from a specific file position. In such cases, use the seek() method.

Basic Syntax

file.seek(offset, whence)
ArgumentDescription
offsetNumber of bytes to move
whenceReference point (0: beginning of file, 1: current position, 2: end of file)

Example: Reading Data from a Specific Position

with open("example.bin", "rb") as file:
    file.seek(10)  # Move to the 10th byte from the beginning
    data = file.read(5)  # Read 5 bytes
    print(data)
This method is useful for parsing file headers or processing files with structured data.

3.5 Getting the Current File Position (tell())

The tell() method returns the current file position (byte offset).

Example

with open("example.bin", "rb") as file:
    file.read(20)  # Read 20 bytes
    position = file.tell()  # Get the current position
    print(f"Current position: {position} bytes")
Use Cases
  • Check how much of the file has been read
  • Debug processing within the file

3.6 Using Memory-Mapped Files for Faster Reading (mmap)

The mmap module allows you to map large binary files into virtual memory for faster access.

Basic Syntax

import mmap

with open("example.bin", "rb") as file:
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmapped_file:
        print(mmapped_file[:100])  # Get the first 100 bytes
Advantages
  • Faster access since the file is handled in memory
  • Allows direct access to specific ranges
Disadvantages
  • More complex than standard file handling
  • May cause memory issues if the file is extremely large

3.7 Summary

  • Efficient reading of binary files requires choosing methods based on file size and purpose
  • Use read() for small files
  • Use read(n) for large files to process them in chunks
  • Use seek() to retrieve data from specific positions
  • Use mmap for fast access, but apply carefully depending on the case

4. How to Analyze Read Binary Data

After reading a binary file in Python, it is important to analyze the data properly. Since binary data is usually stored in formats such as integers, strings, or floating-point numbers, understanding the data structure and applying the right method for analysis is essential. In this section, we will explain how to analyze binary data using Python’s struct module.

4.1 What Is the struct Module?

The struct module in Python is a standard library for encoding and decoding binary data with specified formats.

Main Features

  • Convert binary data into integers, floats, and strings (unpacking)
  • Convert values into binary format (packing)
  • Specify endianness (byte order)

Basic Syntax of the struct Module

import struct

# Convert binary data into specific types (unpack)
data = struct.unpack(format, binary_data)

# Convert values into binary data (pack)
binary_data = struct.pack(format, value1, value2, ...)

4.2 Unpacking (Decoding) Binary Data

To interpret binary data read from a file as Python values (integers, strings, etc.), use struct.unpack().

Example: Decode a 4-Byte Integer (32-bit)

import struct

# Binary data (4 bytes)
binary_data = b'   '  # Represents 1 (Little Endian)

# Unpack as an unsigned 32-bit integer in Little Endian
value = struct.unpack('<I', binary_data)[0]
print(value)  # Output: 1
Format SpecifierTypeBytes
bSigned 8-bit integer (char)1
BUnsigned 8-bit integer (uchar)1
hSigned 16-bit integer (short)2
HUnsigned 16-bit integer (ushort)2
iSigned 32-bit integer (int)4
IUnsigned 32-bit integer (uint)4
fFloating-point (float)4
dFloating-point (double)8

4.3 Parsing String Data

When a binary file contains string data, you can use struct.unpack() to decode it and then apply the appropriate character encoding.

Example: Decode a 10-Byte String

import struct

# 10-byte binary data
binary_data = b'HelloWorld'

# Interpret as string
decoded_string = struct.unpack('10s', binary_data)[0].decode('utf-8')
print(decoded_string)  # Output: HelloWorld

Notes

  • By specifying the byte size (e.g., 10s), you can extract fixed-length strings.
  • Use .decode('utf-8') to convert into the correct character encoding.

4.4 Parsing Floating-Point Numbers

Some binary files store numeric data in IEEE 754 floating-point format. The struct module allows decoding these values.

Example: Decode a 4-Byte Float

import struct

# 4-byte float (IEEE 754 format)
binary_data = b'  €?'  # Represents 1.0

# Unpack
value = struct.unpack('<f', binary_data)[0]
print(value)  # Output: 1.0

4.5 Specifying Endianness (Byte Order)

Binary data can be stored in two byte orders: Little Endian and Big Endian. Correctly specifying endianness is crucial for proper interpretation.

Example: Difference Between Little and Big Endian

import struct

binary_data = b'   '  # Represents 1 (Little Endian)

# Little Endian (least significant byte first)
little_endian = struct.unpack('<I', binary_data)[0]
print(f"Little Endian: {little_endian}")  # Output: 1

# Big Endian (most significant byte first)
big_endian = struct.unpack('>I', binary_data)[0]
print(f"Big Endian: {big_endian}")  # Output: 16777216

4.6 Summary

  • The struct module allows parsing binary data into numbers and strings
  • Understanding endianness is essential for accurate decoding
  • You can efficiently parse file headers and structured data by unpacking multiple data types together

5. How to Create and Write Binary Files in Python

In the previous section, we explained how to analyze binary data in Python. In this section, we will cover how to create binary data and write it to a binary file using Python.

5.1 Writing Files in Binary Mode

To create a binary file in Python, use the 'wb' (write-only binary mode) with the open() function.

Basic Syntax

with open("example.bin", "wb") as file:
    file.write(binary_data)
  • 'wb' means “write in binary mode”.
  • Use the write() method to write binary data into a file.

Simple Example: Writing Binary Data

with open("output.bin", "wb") as file:
    file.write(b'')  # Write 4 bytes of data
Key Points
  • b'' represents 4 bytes of binary data (in hexadecimal format).
  • The actual file size will be 4 bytes.

5.2 Creating Binary Data with struct.pack()

As the reverse of struct.unpack(), Python’s struct.pack() is used to convert data into binary format before writing.

Basic Syntax

import struct
binary_data = struct.pack(format, value1, value2, ...)
  • format specifies the binary data type (integer, float, string, etc.).
  • value1, value2, ... are the values to be packed.

5.3 Writing Numeric Data to a Binary File

To save integers or floating-point numbers into a binary file, first use struct.pack() to convert them into binary.

Example: Convert Integers to Binary and Write

import struct

# Unsigned 16-bit integer (H) and unsigned 32-bit integer (I)
binary_data = struct.pack('<HI', 512, 123456789)

# Write to file
with open("numbers.bin", "wb") as file:
    file.write(binary_data)
Explanation
  • <HILittle Endian: Unsigned 16-bit integer (2 bytes) + Unsigned 32-bit integer (4 bytes)
  • 512 (hex: 0x0200)
  • 123456789Í[ (hex: 0x075BCD15)

5.4 Writing String Data to a Binary File

Binary files can also store fixed-length string data.

Example: Writing a 10-Byte String

import struct

text = "Hello"
binary_data = struct.pack('10s', text.encode('utf-8'))  # Fixed length: 10 bytes

with open("text.bin", "wb") as file:
    file.write(binary_data)
Key Points
  • '10s' means a fixed-length string of 10 bytes.
  • Convert to binary with encode('utf-8').
  • If the string is shorter than 10 bytes, padding (null bytes) is added automatically.

5.5 Writing Floating-Point Numbers to a Binary File

Floating-point numbers can also be stored with struct.pack().

Example: Write a Float

import struct

float_value = 3.14
binary_data = struct.pack('<f', float_value)  # 4-byte float

with open("float.bin", "wb") as file:
    file.write(binary_data)
Explanation
  • <f means a Little Endian 4-byte floating-point number.
  • The IEEE 754 representation of 3.14 is 0xC3F54840.

5.6 Appending to a Binary File ('ab' Mode)

To add data to an existing binary file, use 'ab' (append mode) instead of 'wb'.

Example: Append Data to a Binary File

with open("output.bin", "ab") as file:
    file.write(b'ÿÿ')  # Write additional data
Key Points
  • 'ab' means “append in binary mode”.
  • Data is added at the end without overwriting existing content.

5.7 Writing Data to Specific Positions with seek()

It is also possible to write data at a specific position in a file.

Example: Write Data at the 10th Byte from the Beginning

with open("output.bin", "r+b") as file:
    file.seek(10)  # Move to the 10th byte
    file.write(b'ª»')  # Write 2 bytes
Key Points
  • 'r+b' mode means “read/write in binary mode”.
  • seek(10) moves to the 10th byte from the beginning before writing.

5.8 Summary

  • Use 'wb' mode in Python to write binary files
  • struct.pack() converts numbers, strings, and floats into binary
  • 'ab' (append mode) allows adding data to existing binary files
  • seek() lets you write data at specific positions in a file

6. Practical Examples of Handling Binary Files in Python

So far, we have covered the basics of reading and writing binary files. In this section, we will introduce practical examples using Python to handle binary files.

6.1 Reading and Writing Image Files

Binary files are often used for image processing. Here is how to read and copy a binary image file.

Example: Copy an Image File

# Read binary image data
with open("input.png", "rb") as infile:
    image_data = infile.read()

# Write binary image data to another file
with open("copy.png", "wb") as outfile:
    outfile.write(image_data)
Key Points
  • The image data is read in binary and written as-is to another file.
  • This ensures that the image is copied without corruption.

6.2 Reading and Writing Audio Files

Binary files are also used in audio data processing. For example, WAV files can be read and processed in Python.

Example: Read the Header of a WAV File

with open("sample.wav", "rb") as file:
    header = file.read(44)  # The header of a WAV file is 44 bytes
    print(header)
Explanation
  • By reading the first 44 bytes, you can obtain WAV file metadata such as sample rate and channel count.

6.3 Handling Compressed Files (ZIP/GZ)

Python provides zipfile and gzip modules to process compressed binary files.

Example: Extract a ZIP File

import zipfile

with zipfile.ZipFile("archive.zip", "r") as zip_ref:
    zip_ref.extractall("extracted")  # Extract contents into the "extracted" folder

Example: Read a GZ Compressed File

import gzip

with gzip.open("compressed.gz", "rb") as file:
    content = file.read()
    print(content[:50])  # Display the first 50 bytes

6.4 Reading and Writing Executable Files

Binary files also include executable files (.exe, .bin, etc.). These can be read in binary mode and written to another file.

Example: Copy an Executable File

with open("program.exe", "rb") as infile:
    exe_data = infile.read()

with open("copy.exe", "wb") as outfile:
    outfile.write(exe_data)
Notes
  • When handling executable files, never attempt to edit them without specialized knowledge.
  • Improper modifications may cause the file to become corrupted or insecure.

6.5 Using struct to Parse Binary Data Structures

When binary data contains structured values, use the struct module to parse them.

Example: Parse a Custom Binary Format

Suppose you have binary data in the following format:
  • 2-byte unsigned integer
  • 4-byte float
  • 10-byte string
You can parse it as follows:
import struct

binary_data = struct.pack("<Hf10s", 1000, 3.14, b"HelloWorld")

# Decode (unpack)
value1, value2, value3 = struct.unpack("<Hf10s", binary_data)
print(value1, value2, value3.decode("utf-8"))
Result
1000 3.14 HelloWorld

6.6 Summary

  • Binary files are widely used for images, audio, video, compressed data, and executables.
  • Python’s standard library supports binary file processing with modules such as zipfile, gzip, and struct.
  • By combining reading, writing, and parsing, you can handle various binary file formats.

7. Common Pitfalls and Best Practices When Handling Binary Files

When working with binary files in Python, there are several pitfalls and precautions to be aware of. Here we will summarize common mistakes and best practices.

7.1 Common Pitfalls

  1. Opening Files Without Specifying Binary Mode
  • If you open a binary file without using 'rb' or 'wb', Python may attempt to interpret it as text, potentially causing corruption or errors.
  1. Memory Overuse When Reading Large Files
  • Using read() loads the entire file into memory at once. For files of several hundred MB or GB, this may cause MemoryError.
  1. Ignoring Endianness (Byte Order)
  • If you decode binary data without specifying the correct endianness, values may be misinterpreted.
  1. Editing Executable Files Without Knowledge
  • Improper editing of binary executables (EXE, DLL, etc.) may corrupt the file or cause security issues.
  1. Incorrect String Encoding
  • Binary strings must be decoded with the correct .decode('utf-8') or other encoding. Using the wrong encoding may cause garbled text or errors.

7.2 Best Practices

  1. Always Use the with Statement
  • The with statement ensures files are automatically closed, preventing resource leaks.
  1. Use read(n) for Large Files
  • For large files, read them in chunks to reduce memory usage.
  1. Specify Endianness Explicitly
  • When using struct.unpack(), always specify < (Little Endian) or > (Big Endian).
  1. Validate File Format Before Processing
  • Check headers or magic numbers to ensure the file type is correct before reading.
  1. Use Appropriate Libraries
  • For images: PIL/Pillow
  • For audio: wave, pydub
  • For compressed files: zipfile, gzip
  1. Use Memory-Mapped Files (mmap) for Performance
  • Efficiently process large files without loading the entire file into memory.

7.3 Summary

  • Specify binary mode ('rb', 'wb') when opening binary files.
  • Use with for safe file handling.
  • Read in chunks for large files.
  • Be aware of endianness when decoding.
  • Use specialized libraries for each file format.

8. Conclusion

In this article, we explained how to read, write, and analyze binary files in Python. Binary files are essential when handling images, audio, video, compressed data, or custom data formats, and mastering them greatly expands the scope of Python applications.

Key Takeaways

  • Binary files store data as sequences of bytes rather than human-readable text.
  • Use 'rb', 'wb', 'ab' modes to properly read, write, or append binary data.
  • The struct module allows converting binary data into integers, floating-point numbers, and strings.
  • Endianness (byte order) must be correctly specified to ensure accurate data interpretation.
  • Large binary files should be processed in chunks or via mmap to optimize memory usage.
  • Best practices: always use the with statement, handle exceptions, and use specialized libraries depending on the file type.

Final Thoughts

By mastering binary file handling in Python, you can work with a wide range of data formats—from image and audio processing to protocol analysis and custom data formats. This is a crucial skill for anyone aiming to advance in data engineering, scientific computing, or systems programming. Now that you have learned the theory and practical techniques, try applying them to your own projects, research, and development. 🚀