- 1 1. Introduction
- 2 2. How to Read Binary Files in Python and Basic Operations
- 3 3. Efficient Binary File Reading Using Python
- 3.1 3.1 Read the Entire Binary File at Once (read())
- 3.2 3.2 Read in Chunks of Specified Byte Sizes (read(n))
- 3.3 3.3 Read Binary Data Line by Line (readline())
- 3.4 3.4 Read Data from a Specific Position in a File (using seek())
- 3.5 3.5 Get the Current File Position (tell())
- 3.6 3.6 Use Memory‑Mapped Files for Fast Reading (mmap)
- 3.7 3.7 Summary
- 4 4. How to Parse Binary Data in Python
- 5 5. Writing Binary Files in Python
- 6 6. Practical Examples of Handling Binary Files in Python
- 7 7. Precautions and Best Practices When Handling Binary Files
- 8 8. Frequently Asked Questions (FAQ)
- 8.1 Q1: What is the difference between a text file and a binary file?
- 8.2 Q2: When opening a binary file in Python, what is the difference between 'rb' and 'r'?
- 8.3 Q3: I don’t understand how to use the struct module. How do I use it?
- 8.4 Q4: How to convert binary data to text?
- 8.5 Q5: What is endianness?
- 8.6 Q6: How to efficiently process large binary files?
- 8.7 Q7: How to debug binary data?
- 8.8 Summary
- 8.9 Conclusion
1. Introduction
Python supports not only text files but also reading and writing binary files. By handling binary files, you can manipulate various data such as images, audio, video, and compressed files. In this article, we will explain how to read binary files safely and efficiently using Python.
1.1 What is a Binary File?
A binary file is a file that consists of binary data (a combination of 0s and 1s) rather than human‑readable strings. Representative examples of binary files include the following:
- Image files (PNG, JPEG, BMP, etc.)
- Audio files (WAV, MP3, AAC, etc.)
- Video files (MP4, AVI, MOV, etc.)
- Compressed files (ZIP, RAR, GZ, etc.)
- Executable program files (EXE, DLL, BIN, etc.)
Binary files usually appear as garbled characters when opened with a regular text editor. This is because the data is encoded in a specific format and does not become meaningful information unless parsed with an appropriate program.
1.2 Differences from Text Files
The major difference between binary files and text files is their data storage method.
| Type | Data content | Example |
|---|---|---|
| Text file | Saved using character encoding (UTF-8, Shift-JIS, etc.) | .txt, .csv, .json |
| Binary file | Saved as a byte sequence of 0s and 1s | .jpg, .mp3, .exe |
Main Differences
- Data structure
- Text files contain only data that can be interpreted as characters.
- Binary files contain all kinds of data (images, audio, executable code, etc.).
- Size
- If the data to be stored is small, text files tend to have a smaller file size.
- Binary files can be larger due to encoding overhead even with the same content.
- Editing method
- Text files can be opened and edited directly with editors such as
NotepadorVS Code. - Binary files cannot be edited without a specialized program (e.g., a hex editor).
1.3 Why Use Python for Binary Files
Reasons to manipulate binary files with Python include the following:
① Processing Image and Audio Data
By reading binary files, Python programs can analyze images and process audio data.
# Example: Read a PNG image file as binary
with open("image.png", "rb") as file:
binary_data = file.read()
print(binary_data[:20]) # Display the first 20 bytes② Analyzing Compressed Data
Python includes modules such as zipfile and gzip, which allow you to decompress and compress ZIP or GZ files programmatically.
import gzip
# Example: Open a GZ-compressed file
with gzip.open("example.gz", "rb") as file:
content = file.read()
print(content)③ Analyzing Binary Protocols
In low‑level network communication or database operations, analyzing binary data is required. Using the struct module, you can convert binary data into numbers or strings.
import struct
# Example: Convert binary data to an integer
binary_data = b'x01x00x00x00' # 4-byte data
integer_value = struct.unpack('<I', binary_data)[0]
print(integer_value) # Output: 11.4 Summary
- Binary files are file formats that store information such as images, audio, and compressed data.
- Unlike text files, data is stored as a byte sequence of 0s and 1s.
- Using Python enables analysis, processing, and conversion of binary data.
- By using Python’s
open()function andstructmodule, you can handle binary files efficiently.
2. How to Read Binary Files in Python and Basic Operations
In Python, you can open and read binary files using the open() function. This section explains the basic operations for binary files in Python.
2.1 Reading Binary Files Using the open() Function
Python’s open() function is the basic function for opening files. When opening a binary file, specify 'rb' (read-only binary mode).
Basic Syntax
file = open("example.bin", "rb") # 'rb' means "read in binary mode"
binary_data = file.read() # Read the contents of the file
file.close() # Close the fileHowever, with this method, if you do not explicitly call close(), the file will not be closed, which can cause resource leaks. Therefore, in Python it is common to use the with statement to open files safely.
2.2 Safe Binary File Reading Using the with Statement
Using the with statement automatically closes the file, so resources are properly released even if an error occurs.
Example: Safe Reading of Binary Data
with open("example.bin", "rb") as file:
binary_data = file.read()
# When leaving the with-block, the file is automatically closedBenefits of Using the with Statement
- No need to call
file.close()(automatically closed) - Does not leak resources even if an error occurs
- Code is simpler and more readable
2.3 Variations of Reading Methods
Python provides several methods for reading binary files. Choose the appropriate method according to your use case.
① Read All Data at Once (read())
This method loads the entire binary file content into memory.
with open("example.bin", "rb") as file:
binary_data = file.read() # Read all data at onceAdvantages
- Simple and easy to understand
- Efficient for small files
Disadvantages
- For large files (hundreds of MB to GB), it may consume a lot of memory
② Read Specified Number of Bytes at a Time (read(n))
Reading the file in partial chunks is suitable for processing large files.
with open("example.bin", "rb") as file:
chunk = file.read(1024) # Read 1024 bytes (1KB) at a time
while chunk:
print(chunk) # Process the read data
chunk = file.read(1024) # Read the next 1024 bytesAdvantages
- Reduces memory consumption
- Can process large files efficiently
Disadvantages
- Not suitable for use cases that require processing the entire file at once
③ Read Binary Data Line by Line (readline())
If the binary data contains newline characters, it can be read line by line.
with open("example.bin", "rb") as file:
line = file.readline() # Read one line at a time
while line:
print(line)
line = file.readline() # Read the next lineUse Cases
- Binary log files and other binary data that contain newlines
Cautions
- If there are no newlines, the entire content is considered a single line, so this method is only effective for appropriate files
2.4 File Position Manipulation Using seek()
Using seek() allows you to read data from any position in the file.
Basic Syntax of seek()
file.seek(offset, whence)| 引数 | 説明 |
|---|---|
offset | Number of bytes to move |
whence | Reference point@PH-VOID-41@@: start of file, 1: current position, 2: end of file) |
Example: Reading Data from the Middle of a File
with open("example.bin", "rb") as file:
file.seek(10) # Move to the 10th byte from the start
data = file.read(5) # Read 5 bytes
print(data)Use Cases
- Retrieve file header information
- Analyze specific parts of the data
2.5 Get Current File Position with tell()
Using the tell() method allows you to obtain the current file position (byte offset).
Example: Checking File Position
with open("example.bin", "rb") as file:
file.read(10) # Read 10 bytes
position = file.tell() # Get the current file position
print(f"Current position: {position} bytes")Use Cases
- Verify how much of the file has been read
- Debug when processing occurs in the middle of a file
2.6 Summary
- When opening a binary file in Python, use the
'rb'mode - Using the
withstatement allows you to close files safely - Reading all data at once (
read()) consumes a lot of memory, so for large files useread(n) - Use
seek()to move to any position in the file, andtell()to obtain the current file position
3. Efficient Binary File Reading Using Python
In the previous section, we covered the basic ways to open binary files. In this section, we will discuss methods for efficiently reading binary files in detail. Python offers various ways to read binary data, and selecting the appropriate technique based on the use case is important.
3.1 Read the Entire Binary File at Once (read())
To read a binary file all at once, use the read() method.
Basic Syntax
with open("example.bin", "rb") as file:
binary_data = file.read()Advantages
- Simple and intuitive
- Suitable for small files (a few MB or less)
Disadvantages
- Consumes a lot of memory for large files (hundreds of MB or more)
- If the data does not fit in memory, the program may crash
Example
with open("sample.bin", "rb") as file:
binary_data = file.read()
print(len(binary_data)) # Display file size (in bytes)This method works fine for files of a few MB.
3.2 Read in Chunks of Specified Byte Sizes (read(n))
When handling large binary files, the approach of reading in fixed-size byte chunks to consider memory efficiency is recommended.
Basic Syntax
with open("example.bin", "rb") as file:
chunk = file.read(1024) # Read 1024 bytes (1KB) at a time
while chunk:
print(chunk)
chunk = file.read(1024) # Read the next 1024 bytesAdvantages
- Reduces memory load when handling large files
- Enables stream processing
Disadvantages
- Additional handling is required for real-time data processing
Example
with open("large_file.bin", "rb") as file:
while True:
chunk = file.read(4096) # Read 4KB at a time
if not chunk:
break # Stop when no more data
print(f"Read {len(chunk)} bytes")Using this method, you can process GB-sized files without straining memory.
3.3 Read Binary Data Line by Line (readline())
If the binary data contains newline characters, it can be read line by line.
Basic Syntax
with open("example.bin", "rb") as file:
line = file.readline()
while line:
print(line)
line = file.readline()Use Cases
- Binary log files and other binary data that includes newlines
Points to Note
- If the binary file has no newlines, the entire content is treated as a single line
- Not commonly used for typical binary data processing
3.4 Read Data from a Specific Position in a File (using seek())
When analyzing a binary file, there are cases where you want to read data from a specific position. In such cases, use the seek() method.
Basic Syntax
file.seek(offset, whence)| 引数 | 説明 |
|---|---|
offset | Number of bytes to move |
whence | Reference point (0: start of file, 1: current position, 2: end of file) |
Example: Reading Data from a Specific Position
with open("example.bin", "rb") as file:
file.seek(10) # Move to the 10th byte from the start
data = file.read(5) # Read 5 bytes
print(data)This method is useful for parsing file headers and processing files with specific data structures.
3.5 Get the Current File Position (tell())
Using the tell() method, you can obtain the current file position (byte offset).
Example
with open("example.bin", "rb") as file:
file.read(20) # Read 20 bytes
position = file.tell() # Get the current file position
print(f"Current position: {position} bytes")Use Cases
- Verify how much of the file has been read
- Debugging when processing occurs mid‑file
3.6 Use Memory‑Mapped Files for Fast Reading (mmap)
Using the mmap module, large binary files can be mapped into virtual memory for fast access.
Basic Syntax
import mmap
with open("example.bin", "rb") as file:
with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmapped_file:
print(mmapped_file[:100]) # Get the first 100 bytesAdvantages
- Allows the entire file to be handled in memory, enabling fast access
- Directly access specific ranges
Disadvantages
- More complex than standard Python file handling
- Mapping excessively large files may cause memory shortage
3.7 Summary
- Choosing the appropriate method based on file size and use case is essential for efficient binary file reading
- Small files can be read all at once with
read() - Large files should be processed in chunks with
read(n) - Use
seek()to retrieve data from a specific file position mmapenables fast data access, but care is needed depending on the use case

4. How to Parse Binary Data in Python
With Python, you can convert binary data into meaningful data types such as integers, floating-point numbers, and strings. Here, we focus on explaining how to use the struct module, which is useful for parsing binary data.
4.1 Parsing Using the struct Module
The struct module in Python is used to parse and convert binary data like C language structs.
Basic Syntax
import struct
value = struct.unpack(format, binary_data)| Format Character | Meaning | Byte Count |
|---|---|---|
i | Signed integer | 4 |
I | Unsigned integer | 4 |
f | Float (IEEE 754) | 4 |
d | Double (IEEE 754) | 8 |
s | Byte string | Specified size |
Example: Reading Integers
import struct
# Binary data (4 bytes)
binary_data = b'x01x00x00x00' # 1 (little-endian)
# Unpack (unsigned int, little-endian)
value = struct.unpack('<I', binary_data)[0]
print(value) # Output: 1Example: Reading Strings
import struct
# 10-byte binary data
binary_data = b'HelloWorld'
# Interpret as string data
decoded_string = struct.unpack('10s', binary_data)[0].decode('utf-8')
print(decoded_string) # Output: HelloWorldExample: Reading Floating-Point Numbers
import struct
# 4-byte float (IEEE 754 format)
binary_data = b'x00x00x80x3f' # Represents 1.0
# Unpack
value = struct.unpack('<f', binary_data)[0]
print(value) # Output: 1.04.2 Endian Differences (Big Endian and Little Endian)
The interpretation of binary data varies depending on the endianness (byte order).
| Type | Characteristics |
|---|---|
| Little Endian | Stored from the least significant byte upward (common on Intel CPUs) |
| Big Endian | Stored from the most significant byte upward (commonly used in network communications) |
Example: Verifying Endian Differences
import struct
binary_data = b'x01x00x00x00' # 1 (little-endian)
# Little-endian
little_endian = struct.unpack('<I', binary_data)[0]
print(f"Little Endian: {little_endian}") # Output: 1
# Big-endian
big_endian = struct.unpack('>I', binary_data)[0]
print(f"Big Endian: {big_endian}") # Output: 167772164.3 Summary
- Using the
structmodule, you can convert binary data into integers, floating-point numbers, strings, etc. - Be careful with endian specification (
<Little Endian,>Big Endian) - Since endian varies by file format, incorrect interpretation can lead to wrong values
5. Writing Binary Files in Python
So far, we have explained the reading method for binary files using Python. Here, we will explain the writing method for binary files. In Python, you can write data to a binary file by using the 'wb' mode.
5.1 Basic Writing Using the open() Function
When you specify the 'wb' mode with the open() function, the file is opened in binary write mode.
Basic Syntax
with open("example.bin", "wb") as file:
file.write(binary_data)Key Points
- When the
'wb'mode is specified, the existing content is overwritten - The argument of
file.write()must be of typebytes
5.2 Writing Byte Sequences
In Python, binary data is treated as the bytes type.
Example: Writing a Byte Sequence
with open("output.bin", "wb") as file:
file.write(b'x01x02x03x04') # Write 4 bytes of dataIn this case, output.bin will have the 4-byte 01 02 03 04 written sequentially.
5.3 Writing Binary Data Using struct.pack()
By using struct.pack(), you can convert Python numbers and strings into binary data and save them to a file.
Example: Writing Integers
import struct
# Unsigned 16-bit int (H) and unsigned 32-bit int (I)
binary_data = struct.pack('<HI', 512, 123456789)
with open("numbers.bin", "wb") as file:
file.write(binary_data)Example: Writing Strings
import struct
text = "Hello"
binary_data = struct.pack('10s', text.encode('utf-8')) # 10-byte fixed length
with open("text.bin", "wb") as file:
file.write(binary_data)Example: Writing Floating-Point Numbers
import struct
float_value = 3.14
binary_data = struct.pack('<f', float_value) # 4-byte float
with open("float.bin", "wb") as file:
file.write(binary_data)5.4 Binary Writing in Append Mode
If you want to add data to the end of a file, use the 'ab' mode.
Example: Append Mode
with open("output.bin", "ab") as file:
file.write(b'xffxff') # Append dataWhen this code is executed, FF FF is added to the end of the existing output.bin.
5.5 Overwriting Part of a File
If you need to overwrite part of an existing file, use the 'r+b' mode.
Example: Writing Data in the Middle of a File
with open("output.bin", "r+b") as file:
file.seek(10) # Move to the 10th byte
file.write(b'ª»') # Write 2 bytes of dataIn this case, 2 bytes starting at the 10th byte of the file are overwritten with ª».
5.6 Summary
- When writing binary files in Python, use the
'wb'mode - Using
struct.pack()allows you to convert numbers and strings to binary format for saving - The
'ab'mode enables appending - Using the
'r+b'mode lets you overwrite part of an existing file
6. Practical Examples of Handling Binary Files in Python
So far, we have learned the basic read/write methods for binary files using Python. In this section, we introduce concrete examples of analyzing and processing actual binary files.
6.1 Binary Analysis of PNG Images
What is a PNG file?
PNG (Portable Network Graphics) is a compressed image format, and header information and image data are stored as binary data.
Header Structure of PNG Files
The first 8 bytes of a PNG file are a magic number(89 50 4E 47 0D 0A 1A 0A)that indicates the file is in PNG format.
Analyzing PNG Binary Data
with open("example.png", "rb") as file:
header = file.read(8) # Get the first 8 bytes
print("PNG Header:", header)Example Output
PNG Header: b'x89PNGrnx1an'By checking this magic number, you can determine that the file is in PNG format.
6.2 Binary Analysis of WAV Audio Files
What is a WAV file?
WAV (Waveform Audio File Format) is an uncompressed audio format that includes information such as sample rate, channel count, and bit depth in its header.
Header Analysis of WAV Files
WAV files use the RIFF format, with meta information stored in the first 44 bytes.
Analyzing WAV Binary Data
import struct
with open("example.wav", "rb") as file:
header = file.read(44) # Get the first 44 bytes (WAV header)
# Parse RIFF header
riff, size, wave = struct.unpack('<4sI4s', header[:12])
# Parse format information
fmt, fmt_size, audio_format, num_channels, sample_rate = struct.unpack('<4sIHHI', header[12:24])
print(f"RIFF Header: {riff}")
print(f"Format: {wave}")
print(f"Audio Format: {audio_format}")
print(f"Channels: {num_channels}")
print(f"Sample Rate: {sample_rate} Hz")Example Output
RIFF Header: b'RIFF'
Format: b'WAVE'
Audio Format: 1
Channels: 2
Sample Rate: 44100 Hz- RIFF → Magic number indicating WAV format
- Channels: 2 → Stereo audio
- Sample Rate: 44100 Hz → CD-quality audio
6.3 Analysis of Custom Binary Formats
Some files have custom binary formats. In Python, you can analyze them using struct.
Sample Format
| Byte count | Data type | Content |
|---|---|---|
| 0–3 | I (4-byte integer) | File ID |
| 4–7 | f (4-byte float) | Version |
| 8–17 | 10s (10-byte string) | Name |
Analyzing Binary Data
import struct
with open("custom_data.bin", "rb") as file:
data = file.read()
file_id, version, name = struct.unpack('<If10s', data)
print(f"File ID: {file_id}")
print(f"Version: {version}")
print(f"Name: {name.decode().strip()}")Example Output
File ID: 12345
Version: 1.2
Name: TestFile6.4 Summary
- PNG analysis → Verify format with magic number
- WAV analysis → Check channel count and sample rate from header
- Custom format → Can extract numbers and strings with
struct.unpack()
7. Precautions and Best Practices When Handling Binary Files
When handling binary files in Python, there are several considerations such as optimizing performance, preventing data corruption, and ensuring safety. This section summarizes best practices for binary file processing.
7.1 Optimizing Processing of Large Files
Binary files can range from several hundred MB to several GB. It is safer to avoid reading the entire file and process it in chunks.
Bad Example: Reading a Large File All at Once
with open("large_file.bin", "rb") as file:
data = file.read() # Load entire file into memory (dangerous)Recommended Example: Process in Chunks
with open("large_file.bin", "rb") as file:
while chunk := file.read(4096): # Read in 4KB chunks
process(chunk)7.2 with Statement to Close Files Safely
Forgetting close() can cause resource leaks. Always use the with statement.
Bad Example
file = open("example.bin", "rb")
data = file.read()
# Forgot to close() → leakRecommended Example
with open("example.bin", "rb") as file:
data = file.read()
# File is closed automatically7.3 Correctly Specify Endianness (Byte Order)
Depending on the environment or protocol, it may be little endian or big endian. Failing to specify correctly can corrupt data.
| Endianness | Features |
|---|---|
| Little Endian | Low-order byte first (e.g., Intel CPUs) |
| Big Endian | High-order byte first (e.g., network protocols) |
Example: Differences in Endianness
import struct
binary_data = b'x01x00x00x00'
value_le = struct.unpack('<I', binary_data)[0] # Little Endian
print("Little Endian:", value_le) # 1
value_be = struct.unpack('>I', binary_data)[0] # Big Endian
print("Big Endian:", value_be) # 167772167.4 Prevent Errors with Exception Handling
File operations are prone to errors. Be sure to incorporate try-except.
Example: Safe Binary Reading
import struct
try:
with open("example.bin", "rb") as file:
binary_data = file.read(4)
value = struct.unpack('<I', binary_data)[0]
print(f"Value: {value}")
except FileNotFoundError:
print("Error: File not found.")
except struct.error:
print("Error: Invalid binary format.")
except Exception as e:
print(f"Unexpected error: {e}")7.5 Debugging Binary Data
Using binascii.hexlify() or Linux’s hexdump allows you to view binary contents.
Hexadecimal Display in Python
import binascii
with open("example.bin", "rb") as file:
binary_data = file.read(16)
print(binascii.hexlify(binary_data))Verification on Linux/macOS
hexdump -C example.bin | head7.6 Summary
- Optimize memory efficiency for large files with chunk processing
- Use the
withstatement to reliably close files - Incorrect endianness can cause data corruption
- Handle errors safely with exception handling
- Debuggable with
binascii.hexlify()andhexdump
8. Frequently Asked Questions (FAQ)
We have compiled the points that many people wonder about regarding reading, writing, and analyzing binary files in an FAQ format. We explain the common problems and solutions for handling binary data with Python.
Q1: What is the difference between a text file and a binary file?
| Item | Text file | Binary file |
|---|---|---|
| Storage format | Character encoding (UTF-8, Shift-JIS, etc.) | Byte sequence of 0s and 1s |
| Example extensions | .txt, .csv, .json | .jpg, .png, .mp3, .bin |
| Editing method | Can be edited directly with an editor | Requires specialized programs or a binary editor |
| Use cases | Source code, configuration files | Images, audio, video, executable files |
Q2: When opening a binary file in Python, what is the difference between 'rb' and 'r'?
| Mode | Description |
|---|---|
'r' | Text mode. Newline characters are automatically converted. |
'rb' | Binary mode. Newline characters are not converted and are handled as-is. |
Q3: I don’t understand how to use the struct module. How do I use it?
struct module is used to convert binary data to numbers or strings (unpack), or conversely to binary-ify (pack).
Example: Convert binary data to an integer
import struct
binary_data = b'x01x00x00x00'
value = struct.unpack('<I', binary_data)[0] # Little Endian unsigned int
print(value) # 1Q4: How to convert binary data to text?
Hexadecimal representation improves readability when converted.
import binascii
with open("example.bin", "rb") as file:
binary_data = file.read()
hex_data = binascii.hexlify(binary_data)
print(hex_data)Q5: What is endianness?
Endianness refers to the order of bytes.
| Type | Description |
|---|---|
| Little Endian | Least significant byte first (e.g., Intel CPUs) |
| Big Endian | Most significant byte first (e.g., network communication) |
Q6: How to efficiently process large binary files?
The chunked (partial block) reading method is recommended.
with open("large_file.bin", "rb") as file:
while chunk := file.read(4096): # Read in 4KB chunks
process(chunk)Q7: How to debug binary data?
Use Python’s binascii.hexlify() and Linux’s hexdump commands.
Check with Python
import binascii
with open("example.bin", "rb") as file:
binary_data = file.read(16)
print(binascii.hexlify(binary_data))Check with Linux/macOS
hexdump -C example.bin | headSummary
- Open binary with
'rb'mode - Use
structto pack/unpack for analysis or writing - Specifying the correct endianness is important
- Large files are optimized with chunk processing
- Debuggable with
binascii.hexlify()andhexdump
Conclusion
This completes the comprehensive guide to reading, writing, and analyzing binary files with Python. In the future, apply it to real projects and use it for more advanced processing 🚀



