How to Read Excel Files in Python with pandas and openpyxl

1 1. Introduction
2 2. Preparing to Work with Excel Files in Python
- 2.1 Python Development Environment
- 2.2 Installing Required Libraries
3 3. Reading Excel Files with openpyxl
- 3.1 Loading a Workbook
- 3.2 Selecting a Sheet and Retrieving Cell Values
4 4. Reading Excel Files with pandas
- 4.1 Converting an Excel File to a DataFrame
- 4.2 Reading a Specific Sheet or Range
5 5. Choosing between openpyxl and pandas
- 5.1 Usage examples
6 6. Practical Example: Processing and Saving Excel Data
- 6.1 Data Processing
7 7. Summary

1. Introduction

Python is one of the most popular programming languages and is widely used in data analysis and automation. In particular, when working with Excel files, you can efficiently manipulate data by using Python libraries. This article explains in detail how to read Excel files with Python, starting from the basics.

2. Preparing to Work with Excel Files in Python

To work with Excel in Python, you need dedicated libraries. This section explains how to install the required libraries and prepare your development environment.

Python Development Environment

First, make sure Python is installed. Enter the following command in a terminal or command prompt to check the Python version.

python --version

The recommended version is 3.7 or higher.

Installing Required Libraries

The two main libraries for working with Excel are the following.

openpyxl: Allows reading and writing Excel files (.xlsx).
pandas: Lets you work with data in DataFrame format and is useful for data analysis.

Installation command:

pip install openpyxl pandas

Once the installation succeeds, you’ll be able to work with Excel in Python using these libraries.

3. Reading Excel Files with openpyxl

openpyxl is a library for directly manipulating Excel files. This section covers the basic operations.

Loading a Workbook

First, let’s see how to open an Excel file.

from openpyxl import load_workbook

# Load an Excel file
workbook = load_workbook("example.xlsx")

# Check the sheet names
print(workbook.sheetnames)

The code above opens a file named example.xlsx and gets the sheet names it contains.

Selecting a Sheet and Retrieving Cell Values

This shows how to select a specific sheet and read data from it.

# Select a sheet
sheet = workbook["Sheet1"]

# Get the value of a specific cell
value = sheet["A1"].value
print(f"Value of cell A1: {value}")

Cells are specified as “column+row” (e.g., A1).

4. Reading Excel Files with pandas

pandas is a library specialized for data analysis and can easily convert Excel files into DataFrame format.

Converting an Excel File to a DataFrame

We’ll show how to read an Excel file using pandas’ read_excel function.

import pandas as pd

# Read the Excel file
df = pd.read_excel("example.xlsx")

# Display the first 5 rows of the DataFrame
print(df.head())

Running this code will display the Excel data in tabular form.

Reading a Specific Sheet or Range

By specifying a particular sheet or range, you can work with data more efficiently.

# Read a specific sheet
df = pd.read_excel("example.xlsx", sheet_name="Sheet1")

# Read only specific columns
df = pd.read_excel("example.xlsx", usecols="A:C")
print(df)

5. Choosing between openpyxl and pandas

openpyxl and pandas are each used for different purposes. This section compares their features and suggests appropriate use cases.

Library	Features	Use cases
openpyxl	Can manipulate Excel files at a fine-grained level	Cell-level editing and formatting
pandas	Specialized for data analysis and manipulation	Data aggregation and filtering

Usage examples

openpyxl: When editing while preserving an Excel file’s formatting.
pandas: When efficiently processing large volumes of data.

6. Practical Example: Processing and Saving Excel Data

In this section, we demonstrate how to process and save Excel data.

Data Processing

The following code demonstrates how to filter data and save it to a new Excel file.

import pandas as pd

# Load Excel data
df = pd.read_excel("example.xlsx")

# Filter data with specific conditions
filtered_df = df[df["Sales"] > 1000]

# Save the processed data to a new file
filtered_df.to_excel("filtered.xlsx", index=False)
print("Saved to a new file")

This code extracts only records with sales greater than 1000.

7. Summary

In this article, we explained in detail how to read Excel files using Python. You should now understand the basic operations of openpyxl and pandas and be able to use each appropriately depending on your needs. Try running the code yourself to become comfortable working with Excel in Python.