Python Binary File Guide: Practical Read, Parse, Write

目次

1. Introduction

Python supports not only text files but also reading and writing binary files. By handling binary files, you can manipulate various data such as images, audio, video, and compressed files. In this article, we will explain how to read binary files safely and efficiently using Python.

1.1 What is a Binary File?

A binary file is a file that consists of binary data (a combination of 0s and 1s) rather than human‑readable strings. Representative examples of binary files include the following:

  • Image files (PNG, JPEG, BMP, etc.)
  • Audio files (WAV, MP3, AAC, etc.)
  • Video files (MP4, AVI, MOV, etc.)
  • Compressed files (ZIP, RAR, GZ, etc.)
  • Executable program files (EXE, DLL, BIN, etc.)

Binary files usually appear as garbled characters when opened with a regular text editor. This is because the data is encoded in a specific format and does not become meaningful information unless parsed with an appropriate program.

1.2 Differences from Text Files

The major difference between binary files and text files is their data storage method.

TypeData contentExample
Text fileSaved using character encoding (UTF-8, Shift-JIS, etc.).txt, .csv, .json
Binary fileSaved as a byte sequence of 0s and 1s.jpg, .mp3, .exe

Main Differences

  1. Data structure
  • Text files contain only data that can be interpreted as characters.
  • Binary files contain all kinds of data (images, audio, executable code, etc.).
  1. Size
  • If the data to be stored is small, text files tend to have a smaller file size.
  • Binary files can be larger due to encoding overhead even with the same content.
  1. Editing method
  • Text files can be opened and edited directly with editors such as Notepad or VS Code.
  • Binary files cannot be edited without a specialized program (e.g., a hex editor).

1.3 Why Use Python for Binary Files

Reasons to manipulate binary files with Python include the following:

① Processing Image and Audio Data

By reading binary files, Python programs can analyze images and process audio data.

# Example: Read a PNG image file as binary
with open("image.png", "rb") as file:
    binary_data = file.read()
    print(binary_data[:20])  # Display the first 20 bytes

② Analyzing Compressed Data

Python includes modules such as zipfile and gzip, which allow you to decompress and compress ZIP or GZ files programmatically.

import gzip

# Example: Open a GZ-compressed file
with gzip.open("example.gz", "rb") as file:
    content = file.read()
    print(content)

③ Analyzing Binary Protocols

In low‑level network communication or database operations, analyzing binary data is required. Using the struct module, you can convert binary data into numbers or strings.

import struct

# Example: Convert binary data to an integer
binary_data = b'x01x00x00x00'  # 4-byte data
integer_value = struct.unpack('<I', binary_data)[0]
print(integer_value)  # Output: 1

1.4 Summary

  • Binary files are file formats that store information such as images, audio, and compressed data.
  • Unlike text files, data is stored as a byte sequence of 0s and 1s.
  • Using Python enables analysis, processing, and conversion of binary data.
  • By using Python’s open() function and struct module, you can handle binary files efficiently.
Ad

2. How to Read Binary Files in Python and Basic Operations

In Python, you can open and read binary files using the open() function. This section explains the basic operations for binary files in Python.

2.1 Reading Binary Files Using the open() Function

Python’s open() function is the basic function for opening files. When opening a binary file, specify 'rb' (read-only binary mode).

Basic Syntax

file = open("example.bin", "rb")  # 'rb' means "read in binary mode"
binary_data = file.read()  # Read the contents of the file
file.close()  # Close the file

However, with this method, if you do not explicitly call close(), the file will not be closed, which can cause resource leaks. Therefore, in Python it is common to use the with statement to open files safely.

2.2 Safe Binary File Reading Using the with Statement

Using the with statement automatically closes the file, so resources are properly released even if an error occurs.

Example: Safe Reading of Binary Data

with open("example.bin", "rb") as file:
    binary_data = file.read()

# When leaving the with-block, the file is automatically closed

Benefits of Using the with Statement

  1. No need to call file.close() (automatically closed)
  2. Does not leak resources even if an error occurs
  3. Code is simpler and more readable

2.3 Variations of Reading Methods

Python provides several methods for reading binary files. Choose the appropriate method according to your use case.

① Read All Data at Once (read())

This method loads the entire binary file content into memory.

with open("example.bin", "rb") as file:
    binary_data = file.read()  # Read all data at once

Advantages

  • Simple and easy to understand
  • Efficient for small files

Disadvantages

  • For large files (hundreds of MB to GB), it may consume a lot of memory

② Read Specified Number of Bytes at a Time (read(n))

Reading the file in partial chunks is suitable for processing large files.

with open("example.bin", "rb") as file:
    chunk = file.read(1024)  # Read 1024 bytes (1KB) at a time
    while chunk:
        print(chunk)  # Process the read data
        chunk = file.read(1024)  # Read the next 1024 bytes

Advantages

  • Reduces memory consumption
  • Can process large files efficiently

Disadvantages

  • Not suitable for use cases that require processing the entire file at once

③ Read Binary Data Line by Line (readline())

If the binary data contains newline characters, it can be read line by line.

with open("example.bin", "rb") as file:
    line = file.readline()  # Read one line at a time
    while line:
        print(line)
        line = file.readline()  # Read the next line

Use Cases

  • Binary log files and other binary data that contain newlines

Cautions

  • If there are no newlines, the entire content is considered a single line, so this method is only effective for appropriate files

2.4 File Position Manipulation Using seek()

Using seek() allows you to read data from any position in the file.

Basic Syntax of seek()

file.seek(offset, whence)
引数説明
offsetNumber of bytes to move
whenceReference point@PH-VOID-41@@: start of file, 1: current position, 2: end of file)

Example: Reading Data from the Middle of a File

with open("example.bin", "rb") as file:
    file.seek(10)  # Move to the 10th byte from the start
    data = file.read(5)  # Read 5 bytes
    print(data)

Use Cases

  • Retrieve file header information
  • Analyze specific parts of the data

2.5 Get Current File Position with tell()

Using the tell() method allows you to obtain the current file position (byte offset).

Example: Checking File Position

with open("example.bin", "rb") as file:
    file.read(10)  # Read 10 bytes
    position = file.tell()  # Get the current file position
    print(f"Current position: {position} bytes")

Use Cases

  • Verify how much of the file has been read
  • Debug when processing occurs in the middle of a file

2.6 Summary

  • When opening a binary file in Python, use the 'rb' mode
  • Using the with statement allows you to close files safely
  • Reading all data at once (read()) consumes a lot of memory, so for large files use read(n)
  • Use seek() to move to any position in the file, and tell() to obtain the current file position
Ad
年収訴求

3. Efficient Binary File Reading Using Python

In the previous section, we covered the basic ways to open binary files. In this section, we will discuss methods for efficiently reading binary files in detail. Python offers various ways to read binary data, and selecting the appropriate technique based on the use case is important.

3.1 Read the Entire Binary File at Once (read())

To read a binary file all at once, use the read() method.

Basic Syntax

with open("example.bin", "rb") as file:
    binary_data = file.read()

Advantages

  • Simple and intuitive
  • Suitable for small files (a few MB or less)

Disadvantages

  • Consumes a lot of memory for large files (hundreds of MB or more)
  • If the data does not fit in memory, the program may crash

Example

with open("sample.bin", "rb") as file:
    binary_data = file.read()
    print(len(binary_data))  # Display file size (in bytes)

This method works fine for files of a few MB.

3.2 Read in Chunks of Specified Byte Sizes (read(n))

When handling large binary files, the approach of reading in fixed-size byte chunks to consider memory efficiency is recommended.

Basic Syntax

with open("example.bin", "rb") as file:
    chunk = file.read(1024)  # Read 1024 bytes (1KB) at a time
    while chunk:
        print(chunk)
        chunk = file.read(1024)  # Read the next 1024 bytes

Advantages

  • Reduces memory load when handling large files
  • Enables stream processing

Disadvantages

  • Additional handling is required for real-time data processing

Example

with open("large_file.bin", "rb") as file:
    while True:
        chunk = file.read(4096)  # Read 4KB at a time
        if not chunk:
            break  # Stop when no more data
        print(f"Read {len(chunk)} bytes")

Using this method, you can process GB-sized files without straining memory.

3.3 Read Binary Data Line by Line (readline())

If the binary data contains newline characters, it can be read line by line.

Basic Syntax

with open("example.bin", "rb") as file:
    line = file.readline()
    while line:
        print(line)
        line = file.readline()

Use Cases

  • Binary log files and other binary data that includes newlines

Points to Note

  • If the binary file has no newlines, the entire content is treated as a single line
  • Not commonly used for typical binary data processing

3.4 Read Data from a Specific Position in a File (using seek())

When analyzing a binary file, there are cases where you want to read data from a specific position. In such cases, use the seek() method.

Basic Syntax

file.seek(offset, whence)
引数説明
offsetNumber of bytes to move
whenceReference point (0: start of file, 1: current position, 2: end of file)

Example: Reading Data from a Specific Position

with open("example.bin", "rb") as file:
    file.seek(10)  # Move to the 10th byte from the start
    data = file.read(5)  # Read 5 bytes
    print(data)

This method is useful for parsing file headers and processing files with specific data structures.

3.5 Get the Current File Position (tell())

Using the tell() method, you can obtain the current file position (byte offset).

Example

with open("example.bin", "rb") as file:
    file.read(20)  # Read 20 bytes
    position = file.tell()  # Get the current file position
    print(f"Current position: {position} bytes")

Use Cases

  • Verify how much of the file has been read
  • Debugging when processing occurs mid‑file

3.6 Use Memory‑Mapped Files for Fast Reading (mmap)

Using the mmap module, large binary files can be mapped into virtual memory for fast access.

Basic Syntax

import mmap

with open("example.bin", "rb") as file:
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmapped_file:
        print(mmapped_file[:100])  # Get the first 100 bytes

Advantages

  • Allows the entire file to be handled in memory, enabling fast access
  • Directly access specific ranges

Disadvantages

  • More complex than standard Python file handling
  • Mapping excessively large files may cause memory shortage

3.7 Summary

  • Choosing the appropriate method based on file size and use case is essential for efficient binary file reading
  • Small files can be read all at once with read()
  • Large files should be processed in chunks with read(n)
  • Use seek() to retrieve data from a specific file position
  • mmap enables fast data access, but care is needed depending on the use case
Ad

4. How to Parse Binary Data in Python

With Python, you can convert binary data into meaningful data types such as integers, floating-point numbers, and strings. Here, we focus on explaining how to use the struct module, which is useful for parsing binary data.

4.1 Parsing Using the struct Module

The struct module in Python is used to parse and convert binary data like C language structs.

Basic Syntax

import struct
value = struct.unpack(format, binary_data)
Format CharacterMeaningByte Count
iSigned integer4
IUnsigned integer4
fFloat (IEEE 754)4
dDouble (IEEE 754)8
sByte stringSpecified size

Example: Reading Integers

import struct

# Binary data (4 bytes)
binary_data = b'x01x00x00x00'  # 1 (little-endian)

# Unpack (unsigned int, little-endian)
value = struct.unpack('<I', binary_data)[0]
print(value)  # Output: 1

Example: Reading Strings

import struct

# 10-byte binary data
binary_data = b'HelloWorld'

# Interpret as string data
decoded_string = struct.unpack('10s', binary_data)[0].decode('utf-8')
print(decoded_string)  # Output: HelloWorld

Example: Reading Floating-Point Numbers

import struct

# 4-byte float (IEEE 754 format)
binary_data = b'x00x00x80x3f'  # Represents 1.0

# Unpack
value = struct.unpack('<f', binary_data)[0]
print(value)  # Output: 1.0

4.2 Endian Differences (Big Endian and Little Endian)

The interpretation of binary data varies depending on the endianness (byte order).

TypeCharacteristics
Little EndianStored from the least significant byte upward (common on Intel CPUs)
Big EndianStored from the most significant byte upward (commonly used in network communications)

Example: Verifying Endian Differences

import struct

binary_data = b'x01x00x00x00'  # 1 (little-endian)

# Little-endian
little_endian = struct.unpack('<I', binary_data)[0]
print(f"Little Endian: {little_endian}")  # Output: 1

# Big-endian
big_endian = struct.unpack('>I', binary_data)[0]
print(f"Big Endian: {big_endian}")  # Output: 16777216

4.3 Summary

  • Using the struct module, you can convert binary data into integers, floating-point numbers, strings, etc.
  • Be careful with endian specification (< Little Endian, > Big Endian)
  • Since endian varies by file format, incorrect interpretation can lead to wrong values
Ad

5. Writing Binary Files in Python

So far, we have explained the reading method for binary files using Python. Here, we will explain the writing method for binary files. In Python, you can write data to a binary file by using the 'wb' mode.

5.1 Basic Writing Using the open() Function

When you specify the 'wb' mode with the open() function, the file is opened in binary write mode.

Basic Syntax

with open("example.bin", "wb") as file:
    file.write(binary_data)

Key Points

  • When the 'wb' mode is specified, the existing content is overwritten
  • The argument of file.write() must be of type bytes

5.2 Writing Byte Sequences

In Python, binary data is treated as the bytes type.

Example: Writing a Byte Sequence

with open("output.bin", "wb") as file:
    file.write(b'x01x02x03x04')  # Write 4 bytes of data

In this case, output.bin will have the 4-byte 01 02 03 04 written sequentially.

5.3 Writing Binary Data Using struct.pack()

By using struct.pack(), you can convert Python numbers and strings into binary data and save them to a file.

Example: Writing Integers

import struct

# Unsigned 16-bit int (H) and unsigned 32-bit int (I)
binary_data = struct.pack('<HI', 512, 123456789)

with open("numbers.bin", "wb") as file:
    file.write(binary_data)

Example: Writing Strings

import struct

text = "Hello"
binary_data = struct.pack('10s', text.encode('utf-8'))  # 10-byte fixed length

with open("text.bin", "wb") as file:
    file.write(binary_data)

Example: Writing Floating-Point Numbers

import struct

float_value = 3.14
binary_data = struct.pack('<f', float_value)  # 4-byte float

with open("float.bin", "wb") as file:
    file.write(binary_data)

5.4 Binary Writing in Append Mode

If you want to add data to the end of a file, use the 'ab' mode.

Example: Append Mode

with open("output.bin", "ab") as file:
    file.write(b'xffxff')  # Append data

When this code is executed, FF FF is added to the end of the existing output.bin.

5.5 Overwriting Part of a File

If you need to overwrite part of an existing file, use the 'r+b' mode.

Example: Writing Data in the Middle of a File

with open("output.bin", "r+b") as file:
    file.seek(10)  # Move to the 10th byte
    file.write(b'ª»')  # Write 2 bytes of data

In this case, 2 bytes starting at the 10th byte of the file are overwritten with ª».

5.6 Summary

  • When writing binary files in Python, use the 'wb' mode
  • Using struct.pack() allows you to convert numbers and strings to binary format for saving
  • The 'ab' mode enables appending
  • Using the 'r+b' mode lets you overwrite part of an existing file
Ad

6. Practical Examples of Handling Binary Files in Python

So far, we have learned the basic read/write methods for binary files using Python. In this section, we introduce concrete examples of analyzing and processing actual binary files.

6.1 Binary Analysis of PNG Images

What is a PNG file?

PNG (Portable Network Graphics) is a compressed image format, and header information and image data are stored as binary data.

Header Structure of PNG Files

The first 8 bytes of a PNG file are a magic number89 50 4E 47 0D 0A 1A 0A)that indicates the file is in PNG format.

Analyzing PNG Binary Data

with open("example.png", "rb") as file:
    header = file.read(8)  # Get the first 8 bytes
    print("PNG Header:", header)

Example Output

PNG Header: b'x89PNGrnx1an'

By checking this magic number, you can determine that the file is in PNG format.

6.2 Binary Analysis of WAV Audio Files

What is a WAV file?

WAV (Waveform Audio File Format) is an uncompressed audio format that includes information such as sample rate, channel count, and bit depth in its header.

Header Analysis of WAV Files

WAV files use the RIFF format, with meta information stored in the first 44 bytes.

Analyzing WAV Binary Data

import struct

with open("example.wav", "rb") as file:
    header = file.read(44)  # Get the first 44 bytes (WAV header)

    # Parse RIFF header
    riff, size, wave = struct.unpack('<4sI4s', header[:12])

    # Parse format information
    fmt, fmt_size, audio_format, num_channels, sample_rate = struct.unpack('<4sIHHI', header[12:24])

    print(f"RIFF Header: {riff}")
    print(f"Format: {wave}")
    print(f"Audio Format: {audio_format}")
    print(f"Channels: {num_channels}")
    print(f"Sample Rate: {sample_rate} Hz")

Example Output

RIFF Header: b'RIFF'
Format: b'WAVE'
Audio Format: 1
Channels: 2
Sample Rate: 44100 Hz
  • RIFF → Magic number indicating WAV format
  • Channels: 2 → Stereo audio
  • Sample Rate: 44100 Hz → CD-quality audio

6.3 Analysis of Custom Binary Formats

Some files have custom binary formats. In Python, you can analyze them using struct.

Sample Format

Byte countData typeContent
0–3I (4-byte integer)File ID
4–7f (4-byte float)Version
8–1710s (10-byte string)Name

Analyzing Binary Data

import struct

with open("custom_data.bin", "rb") as file:
    data = file.read()

    file_id, version, name = struct.unpack('<If10s', data)

    print(f"File ID: {file_id}")
    print(f"Version: {version}")
    print(f"Name: {name.decode().strip()}")

Example Output

File ID: 12345
Version: 1.2
Name: TestFile

6.4 Summary

  • PNG analysis → Verify format with magic number
  • WAV analysis → Check channel count and sample rate from header
  • Custom format → Can extract numbers and strings with struct.unpack()
Ad

7. Precautions and Best Practices When Handling Binary Files

When handling binary files in Python, there are several considerations such as optimizing performance, preventing data corruption, and ensuring safety. This section summarizes best practices for binary file processing.

7.1 Optimizing Processing of Large Files

Binary files can range from several hundred MB to several GB. It is safer to avoid reading the entire file and process it in chunks.

Bad Example: Reading a Large File All at Once

with open("large_file.bin", "rb") as file:
    data = file.read()  # Load entire file into memory (dangerous)

Recommended Example: Process in Chunks

with open("large_file.bin", "rb") as file:
    while chunk := file.read(4096):  # Read in 4KB chunks
        process(chunk)

7.2 with Statement to Close Files Safely

Forgetting close() can cause resource leaks. Always use the with statement.

Bad Example

file = open("example.bin", "rb")
data = file.read()
# Forgot to close() → leak

Recommended Example

with open("example.bin", "rb") as file:
    data = file.read()
# File is closed automatically

7.3 Correctly Specify Endianness (Byte Order)

Depending on the environment or protocol, it may be little endian or big endian. Failing to specify correctly can corrupt data.

EndiannessFeatures
Little EndianLow-order byte first (e.g., Intel CPUs)
Big EndianHigh-order byte first (e.g., network protocols)

Example: Differences in Endianness

import struct

binary_data = b'x01x00x00x00'

value_le = struct.unpack('<I', binary_data)[0]  # Little Endian
print("Little Endian:", value_le)  # 1

value_be = struct.unpack('>I', binary_data)[0]  # Big Endian
print("Big Endian:", value_be)  # 16777216

7.4 Prevent Errors with Exception Handling

File operations are prone to errors. Be sure to incorporate try-except.

Example: Safe Binary Reading

import struct

try:
    with open("example.bin", "rb") as file:
        binary_data = file.read(4)
        value = struct.unpack('<I', binary_data)[0]
        print(f"Value: {value}")

except FileNotFoundError:
    print("Error: File not found.")
except struct.error:
    print("Error: Invalid binary format.")
except Exception as e:
    print(f"Unexpected error: {e}")

7.5 Debugging Binary Data

Using binascii.hexlify() or Linux’s hexdump allows you to view binary contents.

Hexadecimal Display in Python

import binascii

with open("example.bin", "rb") as file:
    binary_data = file.read(16)
    print(binascii.hexlify(binary_data))

Verification on Linux/macOS

hexdump -C example.bin | head

7.6 Summary

  • Optimize memory efficiency for large files with chunk processing
  • Use the with statement to reliably close files
  • Incorrect endianness can cause data corruption
  • Handle errors safely with exception handling
  • Debuggable with binascii.hexlify() and hexdump
Ad

8. Frequently Asked Questions (FAQ)

We have compiled the points that many people wonder about regarding reading, writing, and analyzing binary files in an FAQ format. We explain the common problems and solutions for handling binary data with Python.

Q1: What is the difference between a text file and a binary file?

ItemText fileBinary file
Storage formatCharacter encoding (UTF-8, Shift-JIS, etc.)Byte sequence of 0s and 1s
Example extensions.txt, .csv, .json.jpg, .png, .mp3, .bin
Editing methodCan be edited directly with an editorRequires specialized programs or a binary editor
Use casesSource code, configuration filesImages, audio, video, executable files

Q2: When opening a binary file in Python, what is the difference between 'rb' and 'r'?

ModeDescription
'r'Text mode. Newline characters are automatically converted.
'rb'Binary mode. Newline characters are not converted and are handled as-is.

Q3: I don’t understand how to use the struct module. How do I use it?

struct module is used to convert binary data to numbers or strings (unpack), or conversely to binary-ify (pack).

Example: Convert binary data to an integer

import struct

binary_data = b'x01x00x00x00'
value = struct.unpack('<I', binary_data)[0]  # Little Endian unsigned int
print(value)  # 1

Q4: How to convert binary data to text?

Hexadecimal representation improves readability when converted.

import binascii

with open("example.bin", "rb") as file:
    binary_data = file.read()
    hex_data = binascii.hexlify(binary_data)
    print(hex_data)

Q5: What is endianness?

Endianness refers to the order of bytes.

TypeDescription
Little EndianLeast significant byte first (e.g., Intel CPUs)
Big EndianMost significant byte first (e.g., network communication)

Q6: How to efficiently process large binary files?

The chunked (partial block) reading method is recommended.

with open("large_file.bin", "rb") as file:
    while chunk := file.read(4096):  # Read in 4KB chunks
        process(chunk)

Q7: How to debug binary data?

Use Python’s binascii.hexlify() and Linux’s hexdump commands.

Check with Python

import binascii

with open("example.bin", "rb") as file:
    binary_data = file.read(16)
    print(binascii.hexlify(binary_data))

Check with Linux/macOS

hexdump -C example.bin | head

Summary

  • Open binary with 'rb' mode
  • Use struct to pack/unpack for analysis or writing
  • Specifying the correct endianness is important
  • Large files are optimized with chunk processing
  • Debuggable with binascii.hexlify() and hexdump

Conclusion

This completes the comprehensive guide to reading, writing, and analyzing binary files with Python. In the future, apply it to real projects and use it for more advanced processing 🚀

Ad