目次
1. Introduction
Python is one of the most popular programming languages and is widely used in data analysis and automation. In particular, when working with Excel files, you can efficiently manipulate data by using Python libraries. This article explains in detail how to read Excel files with Python, starting from the basics.2. Preparing to Work with Excel Files in Python
To work with Excel in Python, you need dedicated libraries. This section explains how to install the required libraries and prepare your development environment.Python Development Environment
First, make sure Python is installed. Enter the following command in a terminal or command prompt to check the Python version.python --version
The recommended version is 3.7 or higher.Installing Required Libraries
The two main libraries for working with Excel are the following.- openpyxl: Allows reading and writing Excel files (.xlsx).
- pandas: Lets you work with data in DataFrame format and is useful for data analysis.
pip install openpyxl pandas
Once the installation succeeds, you’ll be able to work with Excel in Python using these libraries.
3. Reading Excel Files with openpyxl
openpyxl is a library for directly manipulating Excel files. This section covers the basic operations.Loading a Workbook
First, let’s see how to open an Excel file.from openpyxl import load_workbook
# Load an Excel file
workbook = load_workbook("example.xlsx")
# Check the sheet names
print(workbook.sheetnames)
The code above opens a file named example.xlsx
and gets the sheet names it contains.Selecting a Sheet and Retrieving Cell Values
This shows how to select a specific sheet and read data from it.# Select a sheet
sheet = workbook["Sheet1"]
# Get the value of a specific cell
value = sheet["A1"].value
print(f"Value of cell A1: {value}")
Cells are specified as “column+row” (e.g., A1).4. Reading Excel Files with pandas
pandas is a library specialized for data analysis and can easily convert Excel files into DataFrame format.Converting an Excel File to a DataFrame
We’ll show how to read an Excel file using pandas’read_excel
function.import pandas as pd
# Read the Excel file
df = pd.read_excel("example.xlsx")
# Display the first 5 rows of the DataFrame
print(df.head())
Running this code will display the Excel data in tabular form.Reading a Specific Sheet or Range
By specifying a particular sheet or range, you can work with data more efficiently.# Read a specific sheet
df = pd.read_excel("example.xlsx", sheet_name="Sheet1")
# Read only specific columns
df = pd.read_excel("example.xlsx", usecols="A:C")
print(df)

5. Choosing between openpyxl and pandas
openpyxl and pandas are each used for different purposes. This section compares their features and suggests appropriate use cases.Library | Features | Use cases |
---|---|---|
openpyxl | Can manipulate Excel files at a fine-grained level | Cell-level editing and formatting |
pandas | Specialized for data analysis and manipulation | Data aggregation and filtering |
Usage examples
- openpyxl: When editing while preserving an Excel file’s formatting.
- pandas: When efficiently processing large volumes of data.
6. Practical Example: Processing and Saving Excel Data
In this section, we demonstrate how to process and save Excel data.Data Processing
The following code demonstrates how to filter data and save it to a new Excel file.import pandas as pd
# Load Excel data
df = pd.read_excel("example.xlsx")
# Filter data with specific conditions
filtered_df = df[df["Sales"] > 1000]
# Save the processed data to a new file
filtered_df.to_excel("filtered.xlsx", index=False)
print("Saved to a new file")
This code extracts only records with sales greater than 1000.