Python supports not only text files but also reading and writing binary files. By working with binary files, you can manipulate various types of data such as images, audio, video, and compressed files. This article explains how to safely and efficiently read binary files using Python.
1.1 What Is a Binary File?
A binary file is a file composed of binary data (a combination of 0s and 1s) that computers can understand, instead of human-readable text strings. Examples of binary files include:
Image files (PNG, JPEG, BMP, etc.)
Audio files (WAV, MP3, AAC, etc.)
Video files (MP4, AVI, MOV, etc.)
Compressed files (ZIP, RAR, GZ, etc.)
Executable files (EXE, DLL, BIN, etc.)
When opened in a normal text editor, binary files often appear as “garbled text.” This happens because the data is encoded in a specific format, and meaningful information can only be extracted using an appropriate program.
1.2 Difference Between Text and Binary Files
The main difference between binary and text files lies in their data storage method.
Type
Data Format
Examples
Text File
Stored with character encoding (UTF-8, Shift-JIS, etc.)
.txt, .csv, .json
Binary File
Stored as sequences of 0s and 1s
.jpg, .mp3, .exe
Main Differences
Data Structure
Text files contain only data interpretable as characters.
Binary files can contain any type of data (images, audio, executable code, etc.).
Size
Text files are usually smaller when storing small amounts of data.
Binary files may be larger due to encoding overhead.
Editing Method
Text files can be opened and edited directly with editors like Notepad or VS Code.
Binary files require special tools (e.g., binary editors) for editing.
1.3 Why Handle Binary Files in Python?
Common reasons for handling binary files with Python include:
① Processing Images and Audio Data
By reading binary files, Python programs can analyze images or process audio data.
# Example: Read a PNG image as binary
with open("image.png", "rb") as file:
binary_data = file.read()
print(binary_data[:20]) # Display the first 20 bytes
② Analyzing Compressed Data
Python provides modules like zipfile and gzip to extract or compress ZIP/GZ files programmatically.
import gzip
# Example: Open a GZ compressed file
with gzip.open("example.gz", "rb") as file:
content = file.read()
print(content)
③ Parsing Binary Protocols
In low-level operations such as network communication or databases, binary data parsing is essential. The struct module allows you to convert binary data into numbers or strings.
import struct
# Example: Convert binary data to integer
binary_data = b' ' # 4 bytes of data
integer_value = struct.unpack('<I', binary_data)[0]
print(integer_value) # Output: 1
1.4 Summary
Binary files store information such as images, audio, and compressed data.
Unlike text files, data is stored as sequences of 0s and 1s.
Python enables parsing, processing, and converting binary data.
Using Python’s open() function and struct module, binary files can be handled efficiently.
2. How to Read Binary Files in Python and Basic Operations
In Python, you can open and read binary files using the open() function. This section explains the basic operations for working with binary files in Python.
2.1 Reading Binary Files with the open() Function
The open() function in Python is the fundamental way to open files. To open a binary file, you must specify 'rb' (read-only binary mode).
Basic Syntax
file = open("example.bin", "rb") # 'rb' means "read in binary mode"
binary_data = file.read() # Read the file content
file.close() # Close the file
However, with this method, if you forget to explicitly call close(), the file may not be closed, causing a resource leak. Therefore, in Python it is common practice to use the with statement for safer file handling.
2.2 Safe Binary Reading with the with Statement
The with statement automatically closes the file when done, ensuring that resources are properly released even if an error occurs.
Example: Safe Reading of Binary Data
with open("example.bin", "rb") as file:
binary_data = file.read()
# Once the with-block ends, the file is automatically closed
Benefits of Using the with Statement
No need to call file.close() (it closes automatically)
No resource leaks even if an error occurs
Code becomes simpler and more readable
2.3 Variations of Reading Methods
Python provides several ways to read binary files. Choose the appropriate method depending on the use case.
① Read the Entire File at Once (read())
This method loads the entire file content into memory.
with open("example.bin", "rb") as file:
binary_data = file.read() # Read the entire file at once
Advantages
Simple and easy to understand
Efficient for small files
Disadvantages
For large files (hundreds of MB to GB), memory usage may become excessive
② Read in Fixed-Size Chunks (read(n))
This method reads the file in partial chunks, which is ideal for large files.
with open("example.bin", "rb") as file:
chunk = file.read(1024) # Read 1024 bytes (1KB) at a time
while chunk:
print(chunk) # Process the chunk
chunk = file.read(1024) # Read the next 1KB
Advantages
Reduces memory usage
Efficient for handling large files
Disadvantages
Not suitable if you need the entire file loaded at once
③ Read Binary Data Line by Line (readline())
If the binary file contains newline characters, it can be read line by line.
with open("example.bin", "rb") as file:
line = file.readline() # Read one line
while line:
print(line)
line = file.readline() # Read the next line
Use Case
Binary log files or binary data that includes newlines
Note
If no newline exists, the entire content is treated as one line, so this method is valid only for specific files
2.4 File Positioning with seek()
The seek() method allows you to read data from any position in the file.
Basic Syntax of seek()
file.seek(offset, whence)
Argument
Description
offset
Number of bytes to move
whence
Reference point (0: beginning of file, 1: current position, 2: end of file)
Example: Reading Data from the Middle of a File
with open("example.bin", "rb") as file:
file.seek(10) # Move to the 10th byte from the beginning
data = file.read(5) # Read 5 bytes
print(data)
Use Cases
Extract header information
Analyze specific sections of data
2.5 Getting the Current File Position with tell()
The tell() method returns the current file position (byte offset).
Example: Checking File Position
with open("example.bin", "rb") as file:
file.read(10) # Read 10 bytes
position = file.tell() # Get the current position
print(f"Current file position: {position} bytes")
Use Cases
Check how much of the file has been read
Debugging when processing files mid-way
2.6 Summary
Use 'rb' mode when opening binary files in Python
The with statement ensures files are closed safely
read() loads the entire file into memory, which may be inefficient for large files — use read(n) instead
seek() allows moving to any file position, and tell() retrieves the current position
3. Efficient Methods for Reading Binary Files in Python
In the previous section, we explained the basic ways to open binary files. In this section, we will go into detail about efficient ways of reading binary files. Python offers multiple methods to read binary data, and choosing the right approach depends on the use case.
3.1 Reading the Entire Binary File at Once (read())
To read a binary file all at once, use the read() method.
Basic Syntax
with open("example.bin", "rb") as file:
binary_data = file.read()
Advantages
Simple and intuitive
Suitable for small files (a few MB or less)
Disadvantages
Consumes a lot of memory if the file is large (hundreds of MB or more)
May cause crashes if the file size exceeds available memory
Example
with open("sample.bin", "rb") as file:
binary_data = file.read()
print(len(binary_data)) # Display file size in bytes
This method is safe for files that are a few MB in size.
3.2 Reading in Fixed-Size Chunks (read(n))
When handling large binary files, it is recommended to read the file in fixed-size chunks to optimize memory usage.
Basic Syntax
with open("example.bin", "rb") as file:
chunk = file.read(1024) # Read 1024 bytes (1KB) at a time
while chunk:
print(chunk)
chunk = file.read(1024) # Read the next 1KB
Advantages
Reduces memory usage when processing large files
Supports streaming-like processing
Disadvantages
Additional processing required for real-time handling
Example
with open("large_file.bin", "rb") as file:
while True:
chunk = file.read(4096) # Read 4KB at a time
if not chunk:
break # End when no more data
print(f"Read {len(chunk)} bytes")
This method allows you to process files in the GB range without exhausting memory.
3.3 Reading Binary Data Line by Line (readline())
If the binary data contains newline characters, it can be read line by line.
Basic Syntax
with open("example.bin", "rb") as file:
line = file.readline()
while line:
print(line)
line = file.readline()
Use Cases
Binary log files or binary data that includes newline characters
Notes
If no newlines exist, the entire file will be treated as one line
Not commonly used in general binary processing
3.4 Reading Data from Specific File Positions (seek())
When analyzing binary files, you may want to read data starting from a specific file position. In such cases, use the seek() method.
Basic Syntax
file.seek(offset, whence)
Argument
Description
offset
Number of bytes to move
whence
Reference point (0: beginning of file, 1: current position, 2: end of file)
Example: Reading Data from a Specific Position
with open("example.bin", "rb") as file:
file.seek(10) # Move to the 10th byte from the beginning
data = file.read(5) # Read 5 bytes
print(data)
This method is useful for parsing file headers or processing files with structured data.
3.5 Getting the Current File Position (tell())
The tell() method returns the current file position (byte offset).
Example
with open("example.bin", "rb") as file:
file.read(20) # Read 20 bytes
position = file.tell() # Get the current position
print(f"Current position: {position} bytes")
Use Cases
Check how much of the file has been read
Debug processing within the file
3.6 Using Memory-Mapped Files for Faster Reading (mmap)
The mmap module allows you to map large binary files into virtual memory for faster access.
Basic Syntax
import mmap
with open("example.bin", "rb") as file:
with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmapped_file:
print(mmapped_file[:100]) # Get the first 100 bytes
Advantages
Faster access since the file is handled in memory
Allows direct access to specific ranges
Disadvantages
More complex than standard file handling
May cause memory issues if the file is extremely large
3.7 Summary
Efficient reading of binary files requires choosing methods based on file size and purpose
Use read() for small files
Use read(n) for large files to process them in chunks
Use seek() to retrieve data from specific positions
Use mmap for fast access, but apply carefully depending on the case
4. How to Analyze Read Binary Data
After reading a binary file in Python, it is important to analyze the data properly. Since binary data is usually stored in formats such as integers, strings, or floating-point numbers, understanding the data structure and applying the right method for analysis is essential. In this section, we will explain how to analyze binary data using Python’s struct module.
4.1 What Is the struct Module?
The struct module in Python is a standard library for encoding and decoding binary data with specified formats.
Main Features
Convert binary data into integers, floats, and strings (unpacking)
Convert values into binary format (packing)
Specify endianness (byte order)
Basic Syntax of the struct Module
import struct
# Convert binary data into specific types (unpack)
data = struct.unpack(format, binary_data)
# Convert values into binary data (pack)
binary_data = struct.pack(format, value1, value2, ...)
4.2 Unpacking (Decoding) Binary Data
To interpret binary data read from a file as Python values (integers, strings, etc.), use struct.unpack().
Example: Decode a 4-Byte Integer (32-bit)
import struct
# Binary data (4 bytes)
binary_data = b' ' # Represents 1 (Little Endian)
# Unpack as an unsigned 32-bit integer in Little Endian
value = struct.unpack('<I', binary_data)[0]
print(value) # Output: 1
Format Specifier
Type
Bytes
b
Signed 8-bit integer (char)
1
B
Unsigned 8-bit integer (uchar)
1
h
Signed 16-bit integer (short)
2
H
Unsigned 16-bit integer (ushort)
2
i
Signed 32-bit integer (int)
4
I
Unsigned 32-bit integer (uint)
4
f
Floating-point (float)
4
d
Floating-point (double)
8
4.3 Parsing String Data
When a binary file contains string data, you can use struct.unpack() to decode it and then apply the appropriate character encoding.
The struct module allows parsing binary data into numbers and strings
Understanding endianness is essential for accurate decoding
You can efficiently parse file headers and structured data by unpacking multiple data types together
5. How to Create and Write Binary Files in Python
In the previous section, we explained how to analyze binary data in Python. In this section, we will cover how to create binary data and write it to a binary file using Python.
5.1 Writing Files in Binary Mode
To create a binary file in Python, use the 'wb' (write-only binary mode) with the open() function.
Basic Syntax
with open("example.bin", "wb") as file:
file.write(binary_data)
'wb' means “write in binary mode”.
Use the write() method to write binary data into a file.
Simple Example: Writing Binary Data
with open("output.bin", "wb") as file:
file.write(b'') # Write 4 bytes of data
Key Points
b'' represents 4 bytes of binary data (in hexadecimal format).
The actual file size will be 4 bytes.
5.2 Creating Binary Data with struct.pack()
As the reverse of struct.unpack(), Python’s struct.pack() is used to convert data into binary format before writing.
Binary files can also store fixed-length string data.
Example: Writing a 10-Byte String
import struct
text = "Hello"
binary_data = struct.pack('10s', text.encode('utf-8')) # Fixed length: 10 bytes
with open("text.bin", "wb") as file:
file.write(binary_data)
Key Points
'10s' means a fixed-length string of 10 bytes.
Convert to binary with encode('utf-8').
If the string is shorter than 10 bytes, padding (null bytes) is added automatically.
5.5 Writing Floating-Point Numbers to a Binary File
Floating-point numbers can also be stored with struct.pack().
Example: Write a Float
import struct
float_value = 3.14
binary_data = struct.pack('<f', float_value) # 4-byte float
with open("float.bin", "wb") as file:
file.write(binary_data)
Explanation
<f means a Little Endian 4-byte floating-point number.
The IEEE 754 representation of 3.14 is 0xC3F54840.
5.6 Appending to a Binary File ('ab' Mode)
To add data to an existing binary file, use 'ab' (append mode) instead of 'wb'.
Example: Append Data to a Binary File
with open("output.bin", "ab") as file:
file.write(b'ÿÿ') # Write additional data
Key Points
'ab' means “append in binary mode”.
Data is added at the end without overwriting existing content.
5.7 Writing Data to Specific Positions with seek()
It is also possible to write data at a specific position in a file.
Example: Write Data at the 10th Byte from the Beginning
with open("output.bin", "r+b") as file:
file.seek(10) # Move to the 10th byte
file.write(b'ª»') # Write 2 bytes
Key Points
'r+b' mode means “read/write in binary mode”.
seek(10) moves to the 10th byte from the beginning before writing.
5.8 Summary
Use 'wb' mode in Python to write binary files
struct.pack() converts numbers, strings, and floats into binary
'ab' (append mode) allows adding data to existing binary files
seek() lets you write data at specific positions in a file
6. Practical Examples of Handling Binary Files in Python
So far, we have covered the basics of reading and writing binary files. In this section, we will introduce practical examples using Python to handle binary files.
6.1 Reading and Writing Image Files
Binary files are often used for image processing. Here is how to read and copy a binary image file.
Example: Copy an Image File
# Read binary image data
with open("input.png", "rb") as infile:
image_data = infile.read()
# Write binary image data to another file
with open("copy.png", "wb") as outfile:
outfile.write(image_data)
Key Points
The image data is read in binary and written as-is to another file.
This ensures that the image is copied without corruption.
6.2 Reading and Writing Audio Files
Binary files are also used in audio data processing. For example, WAV files can be read and processed in Python.
Example: Read the Header of a WAV File
with open("sample.wav", "rb") as file:
header = file.read(44) # The header of a WAV file is 44 bytes
print(header)
Explanation
By reading the first 44 bytes, you can obtain WAV file metadata such as sample rate and channel count.
6.3 Handling Compressed Files (ZIP/GZ)
Python provides zipfile and gzip modules to process compressed binary files.
Example: Extract a ZIP File
import zipfile
with zipfile.ZipFile("archive.zip", "r") as zip_ref:
zip_ref.extractall("extracted") # Extract contents into the "extracted" folder
Example: Read a GZ Compressed File
import gzip
with gzip.open("compressed.gz", "rb") as file:
content = file.read()
print(content[:50]) # Display the first 50 bytes
6.4 Reading and Writing Executable Files
Binary files also include executable files (.exe, .bin, etc.). These can be read in binary mode and written to another file.
Example: Copy an Executable File
with open("program.exe", "rb") as infile:
exe_data = infile.read()
with open("copy.exe", "wb") as outfile:
outfile.write(exe_data)
Notes
When handling executable files, never attempt to edit them without specialized knowledge.
Improper modifications may cause the file to become corrupted or insecure.
6.5 Using struct to Parse Binary Data Structures
When binary data contains structured values, use the struct module to parse them.
Example: Parse a Custom Binary Format
Suppose you have binary data in the following format:
Binary files are widely used for images, audio, video, compressed data, and executables.
Python’s standard library supports binary file processing with modules such as zipfile, gzip, and struct.
By combining reading, writing, and parsing, you can handle various binary file formats.
7. Common Pitfalls and Best Practices When Handling Binary Files
When working with binary files in Python, there are several pitfalls and precautions to be aware of. Here we will summarize common mistakes and best practices.
7.1 Common Pitfalls
Opening Files Without Specifying Binary Mode
If you open a binary file without using 'rb' or 'wb', Python may attempt to interpret it as text, potentially causing corruption or errors.
Memory Overuse When Reading Large Files
Using read() loads the entire file into memory at once. For files of several hundred MB or GB, this may cause MemoryError.
Ignoring Endianness (Byte Order)
If you decode binary data without specifying the correct endianness, values may be misinterpreted.
Editing Executable Files Without Knowledge
Improper editing of binary executables (EXE, DLL, etc.) may corrupt the file or cause security issues.
Incorrect String Encoding
Binary strings must be decoded with the correct .decode('utf-8') or other encoding. Using the wrong encoding may cause garbled text or errors.
7.2 Best Practices
Always Use the with Statement
The with statement ensures files are automatically closed, preventing resource leaks.
Use read(n) for Large Files
For large files, read them in chunks to reduce memory usage.
Specify Endianness Explicitly
When using struct.unpack(), always specify < (Little Endian) or > (Big Endian).
Validate File Format Before Processing
Check headers or magic numbers to ensure the file type is correct before reading.
Use Appropriate Libraries
For images: PIL/Pillow
For audio: wave, pydub
For compressed files: zipfile, gzip
Use Memory-Mapped Files (mmap) for Performance
Efficiently process large files without loading the entire file into memory.
7.3 Summary
Specify binary mode ('rb', 'wb') when opening binary files.
Use with for safe file handling.
Read in chunks for large files.
Be aware of endianness when decoding.
Use specialized libraries for each file format.
8. Conclusion
In this article, we explained how to read, write, and analyze binary files in Python.
Binary files are essential when handling images, audio, video, compressed data, or custom data formats, and mastering them greatly expands the scope of Python applications.
Key Takeaways
Binary files store data as sequences of bytes rather than human-readable text.
Use 'rb', 'wb', 'ab' modes to properly read, write, or append binary data.
The struct module allows converting binary data into integers, floating-point numbers, and strings.
Endianness (byte order) must be correctly specified to ensure accurate data interpretation.
Large binary files should be processed in chunks or via mmap to optimize memory usage.
Best practices: always use the with statement, handle exceptions, and use specialized libraries depending on the file type.
Final Thoughts
By mastering binary file handling in Python, you can work with a wide range of data formats—from image and audio processing to protocol analysis and custom data formats.
This is a crucial skill for anyone aiming to advance in data engineering, scientific computing, or systems programming. Now that you have learned the theory and practical techniques, try applying them to your own projects, research, and development. 🚀