Python is a versatile programming language used for many purposes, and among them, reading XML files is a common technique in data processing. In particular, the following cases require reading XML with Python.
Want to parse XML data retrieved from a Web API
Need to process XML files exported from other systems
Want to read XML used as configuration files
Because XML’s tag structure clearly represents data hierarchy and meaning, it is used across many industries. With Python, you can easily read, transform, and analyze these XML data.
What Is XML? A Quick Review
XML (Extensible Markup Language) is a markup language that allows flexible definition of data structures. It has a structure similar to HTML, but its purpose differs. While HTML is used for visual presentation, XML is a format for describing data structure and meaning. For example, the following is a typical XML format:
To read and use data in this format with Python, you need to use a dedicated library.
Introducing Libraries for Reading XML in Python
Python provides several ways to read XML through both standard and third‑party libraries. The most common ones are:
xml.etree.ElementTree (standard library)
xml.dom.minidom (standard library)
lxml (third‑party library, supports XPath and validation)
In this article, we will clearly explain how to read XML files with Python using these libraries, complete with sample code. Rest assured, even beginners will be able to follow along as we introduce the basics step by step.
2. Basics of Reading XML with Python
Reading XML Files Using ElementTree
First, let’s see how to read using ElementTree with an actual XML file. Sample XML (sample.xml):
import xml.etree.ElementTree as ET
# Load XML file
tree = ET.parse('sample.xml')
root = tree.getroot()
# Check root element tag
print(f"Root element: {root.tag}")
# Loop through each book element
for book in root.findall('book'):
title = book.find('title').text
author = book.find('author').text
price = int(book.find('price').text)
print(f"Title: {title}, Author: {author}, Price: {price} yen")
Output:
Root element: books
Title: Python Introduction, Author: Taro Yamada, Price: 2800 yen
Title: The Future of AI, Author: Ichiro Suzuki, Price: 3500 yen
In this way, reading an XML file with ElementTree.parse(), obtaining the root element with getroot(), and extracting the needed elements with find() or findall() is the basic workflow.
How to Read XML from a String
Sometimes XML is provided as a string rather than a file. In that case, use ET.fromstring() to parse it. Example:
xml.dom.minidom is an XML parser included in the Python standard library that works with XML according to the W3C DOM (Document Object Model) specification. It may be perceived as slightly harder to use compared to ElementTree, but it is handy when you need fine-grained control over node types and structure. Example:
from xml.dom import minidom
xml_data = '''
Shota Sagawa
sagawa@example.com
'''
dom = minidom.parseString(xml_data)
name = dom.getElementsByTagName('name')[0].firstChild.nodeValue
email = dom.getElementsByTagName('email')[0].firstChild.nodeValue
print(f"Name: {name}, Email: {email}")
Features and Benefits:
Easy access to detailed node structures
Attributes and child node types are clearly categorized
Easy to pretty-print XML output
Drawbacks:
Code tends to be verbose
Not suitable for processing large XML (high memory consumption)
lxml: Fast and Powerful External Library
lxml is a fast XML parser implemented in C, supporting advanced XML features such as XPath and XSLT. It offers an API similar to ElementTree, so the learning curve is relatively low. Installation:
Fast and suitable for processing large volumes of XML
Compatible with HTML, making it useful for scraping as well
Drawbacks:
Requires installation of an external library
Some initial learning required (e.g., XPath)
Summary of How to Choose a Library
Library
Features
Suitable Cases
ElementTree
Available in the standard library, supports basic read/write
Reading small to medium-sized XML
minidom
Strong at DOM manipulation, good at pretty-printing
When you need fine-grained node manipulation
lxml
Fast, XPath support, highly flexible
Large datasets, when advanced searching is needed
4. Practical Sample Code
In this section, we practically introduce processing XML in Python in a way that resembles real-world business and data processing. Specifically, we show code examples that handle commonly used patterns such as “iterating over multiple nodes,” “filtering by condition,” and “writing out to an XML file.”
Iterating Over Multiple Nodes
When the XML contains repeated data with the same structure (for example, multiple <book> elements), you can use findall() to loop over them. Sample XML:
import xml.etree.ElementTree as ET
tree = ET.parse('books.xml')
root = tree.getroot()
for book in root.findall('book'):
title = book.find('title').text
author = book.find('author').text
price = int(book.find('price').text)
print(f"Title: {title}, Author: {author}, Price: {price} yen")
By accessing individual elements inside the loop like this, you can extract data from the XML and process it.
Filtering by Condition
Next is a conditional process for extracting only books priced at 3000 yen or more. Python code:
for book in root.findall('book'):
price = int(book.find('price').text)
if price >= 3000:
title = book.find('title').text
print(f"Expensive book: {title} ({price} yen)")
In this way, by combining if statements, you can handle only the elements that match any given condition.
Writing Out to an XML File (Saving)
It is also common to modify a loaded XML and then save it as a new file. Example of writing out:
# Create a new root element
root = ET.Element('users')
# Add child elements
user1 = ET.SubElement(root, 'user', attrib={'id': '1'})
ET.SubElement(user1, 'name').text = 'Shota Sagawa'
ET.SubElement(user1, 'email').text = 'sagawa@example.com'
# Save as a tree structure
tree = ET.ElementTree(root)
tree.write('users.xml', encoding='utf-8', xml_declaration=True)
If you are using lxml, more flexible and powerful searches are possible.
from lxml import etree
tree = etree.parse('books.xml')
titles = tree.xpath('//book[price >= 3000]/title/text()')
for title in titles:
print(f"Expensive book title: {title}")
By leveraging XPath, you can intuitively extract data even with complex conditions.
5. Common Errors and Solutions
When reading XML with Python, various errors can occur, such as syntax errors and character encoding issues. This section introduces typical errors that beginners often stumble upon and how to address them.
UnicodeDecodeError: Failure to read due to character encoding differences
Error Details:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 10
Cause: When an XML file is saved with a character encoding other than UTF-8 (such as Shift_JIS or UTF-16), Python cannot decode it correctly, resulting in an error. Solution: Specify the encoding explicitly when reading the file.
with open('sample.xml', encoding='shift_jis') as f:
data = f.read()
import xml.etree.ElementTree as ET
root = ET.fromstring(data)
Alternatively, passing the file to ElementTree.parse() in binary mode is also effective.
ParseError: Invalid XML Syntax
Error Details:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 15
Cause: It occurs when tags in the XML file are not closed, special characters (e.g., &) are not escaped, or there are syntax mistakes. Solution:
Identify the line and column numbers from the error message.
Use an editor to pretty‑print the XML and check for syntax errors.
Convert special characters (e.g., & → &).
Example: Incorrect XML
<note>
<text>5 & 3</text>
</note>
After correction:
<note>
<text>5 & 3</text>
</note>
NoneType Attribute Error: Accessing a Non‑existent Element
Error Details:
AttributeError: 'NoneType' object has no attribute 'text'
Cause: If the specified tag does not exist in the XML, find() returns None, and accessing .text directly causes an error. Solution: Check whether the element exists before retrieving its value.
title_elem = book.find('title')
if title_elem is not None:
title = title_elem.text
else:
title = 'Unknown Title'
Alternatively, if you are using Python 3.8 or later, you can write it concisely with the walrus operator (:=).
if (title_elem := book.find('title')) is not None:
print(title_elem.text)
XML File Corrupted or Empty
Symptoms:
No error is raised, but getroot() returns None.
findall() returns nothing.
Cause:
The XML file is empty (0 bytes).
The data is truncated (e.g., download failure).
Solution:
Check the file size and contents.
Use an XML validation tool to perform syntax checking.
Review the process that provides or generates the download.
6. Frequently Asked Questions (FAQ)
This section provides clear Q&A-style explanations of the common questions and concerns raised by readers who want to read XML with Python. It’s written to help you resolve the points that often cause trouble in real work or learning before you encounter them.
Q1. How should I handle the encoding (character set) of an XML file?
A. XML files typically include an encoding declaration at the very top, like this:
<?xml version="1.0" encoding="UTF-8"?>
Python’s ElementTree and lxml automatically read this declaration, but if the file is opened with a mismatched character encoding, an error will occur. Because Japanese XML files may use Shift_JIS or EUC-JP, explicitly specifying the encoding as shown below is safer:
with open('sample.xml', encoding='shift_jis') as f:
data = f.read()
Also, using lxml allows more flexible handling of encodings.
Q2. Processing large XML files runs out of memory. What can I do?
A. If you load a large XML file all at once, it expands everything into memory, which can make processing heavy or cause errors. In such cases, using an iterator-style parser that can read incrementally is effective.
import xml.etree.ElementTree as ET
for event, elem in ET.iterparse('large.xml', events=('end',)):
if elem.tag == 'book':
print(elem.find('title').text)
elem.clear() # release memory
With this method, you can process only the needed parts in order, saving memory.
Q3. What are the benefits of using XML instead of JSON?
A. While many APIs now use JSON, XML also has its own strengths.
Can define hierarchical structures rigorously (e.g., DTD/XSD)
Allows distinction between attributes and elements
Strongly document-oriented, suitable for configuration files and structural information
Still the dominant format in many corporate and governmental systems
In other words, XML excels at defining structured documents rather than being primarily for human reading. Depending on the use case, it will remain relevant for the foreseeable future.
Q4. Which should I use, lxml or ElementTree?
A. You can choose based on the following criteria:
Library
Suitable Cases
ElementTree
Small to medium-sized XML, when the standard library is sufficient
lxml
When you want to use XPath, need high performance, or handle large data sets
For beginners, starting with ElementTree is recommended, but when you need flexible extraction with XPath or require high processing speed, lxml is powerful.
7. Summary
In this article, we explained “How to read XML with Python” in a way that is easy for beginners to understand and also practical for real-world use.
Reading XML in Python is surprisingly simple
If you use the standard library xml.etree.ElementTree, you can read XML right away without any special setup. Once you learn the basic syntax and methods (such as parse(), find(), findall(), etc.), extracting and manipulating data becomes straightforward.
Choosing the right library for your use case is important
Small/simple tasks:ElementTree
Fine-grained node manipulation and pretty printing:minidom
High-performance processing and XPath queries:lxml
Each has its own pros and cons, so choose based on the size and purpose of the XML you’re working with.
Be prepared for common errors and issues
Even errors that beginners often stumble on become manageable once you understand their causes and solutions. Issues such as character‑encoding mismatches, syntax errors, and element existence checks are especially common.
XML is still a relevant technology today
Although JSON usage has grown in recent years, XML remains widely used in many business systems, government agencies, and data‑exchange scenarios. Mastering XML processing with Python becomes a valuable skill across many domains.