目次
1. Things to Know Before Reading XML with Python
When Do You Work with XML in Python?
Python is a versatile programming language used for many purposes, and among them, reading XML files is a common technique in data processing. In particular, the following cases require reading XML with Python.- Want to parse XML data retrieved from a Web API
- Need to process XML files exported from other systems
- Want to read XML used as configuration files
What Is XML? A Quick Review
XML (Extensible Markup Language) is a markup language that allows flexible definition of data structures. It has a structure similar to HTML, but its purpose differs. While HTML is used for visual presentation, XML is a format for describing data structure and meaning. For example, the following is a typical XML format:<book>
<title>Python Introduction</title>
<author>Taro Yamada</author>
<price>2800</price>
</book>To read and use data in this format with Python, you need to use a dedicated library.Introducing Libraries for Reading XML in Python
Python provides several ways to read XML through both standard and third‑party libraries. The most common ones are:xml.etree.ElementTree(standard library)xml.dom.minidom(standard library)lxml(third‑party library, supports XPath and validation)
Ad
2. Basics of Reading XML with Python
Reading XML Files Using ElementTree
First, let’s see how to read usingElementTree with an actual XML file. Sample XML (sample.xml):<books>
<book>
<title>Python Introduction</title>
<author>Taro Yamada</author>
<price>2800</price>
</book>
<book>
<title>The Future of AI</title>
<author>Ichiro Suzuki</author>
<price>3500</price>
</book>
</books>Python code:</pimport xml.etree.ElementTree as ET
# Load XML file
tree = ET.parse('sample.xml')
root = tree.getroot()
# Check root element tag
print(f"Root element: {root.tag}")
# Loop through each book element
for book in root.findall('book'):
title = book.find('title').text
author = book.find('author').text
price = int(book.find('price').text)
print(f"Title: {title}, Author: {author}, Price: {price} yen")Output:Root element: books
Title: Python Introduction, Author: Taro Yamada, Price: 2800 yen
Title: The Future of AI, Author: Ichiro Suzuki, Price: 3500 yenIn this way, reading an XML file with ElementTree.parse(), obtaining the root element with getroot(), and extracting the needed elements with find() or findall() is the basic workflow.How to Read XML from a String
Sometimes XML is provided as a string rather than a file. In that case, useET.fromstring() to parse it. Example:xml_data = '''
<user>
<name>Shota Sagawa</name>
<email>sagawa@example.com</email>
</user>
'''
root = ET.fromstring(xml_data)
name = root.find('name').text
email = root.find('email').text
print(f"Name: {name}, Email: {email}")Similarly, you can retrieve the required child elements from the root element using find() and extract their values.Accessing Attributes and Handling Text
XML elements may have attributes defined within the tags. In Python, you can access them using.attrib. Example (XML with attributes):<user id="101">
<name>Shota Sagawa</name>
</user>Python code:root = ET.fromstring('''
<user id="101">
<name>Shota Sagawa</name>
</user>
''')
print(f"User ID: {root.attrib['id']}")3. Introduction to Other XML Parsing Libraries
minidom: DOM-based Standard Library
xml.dom.minidom is an XML parser included in the Python standard library that works with XML according to the W3C DOM (Document Object Model) specification. It may be perceived as slightly harder to use compared to ElementTree, but it is handy when you need fine-grained control over node types and structure. Example:from xml.dom import minidom
xml_data = '''
Shota Sagawa
sagawa@example.com
'''
dom = minidom.parseString(xml_data)
name = dom.getElementsByTagName('name')[0].firstChild.nodeValue
email = dom.getElementsByTagName('email')[0].firstChild.nodeValue
print(f"Name: {name}, Email: {email}")Features and Benefits:- Easy access to detailed node structures
- Attributes and child node types are clearly categorized
- Easy to pretty-print XML output
- Code tends to be verbose
- Not suitable for processing large XML (high memory consumption)
lxml: Fast and Powerful External Library
lxml is a fast XML parser implemented in C, supporting advanced XML features such as XPath and XSLT. It offers an API similar to ElementTree, so the learning curve is relatively low. Installation:pip install lxmlBasic Usage:from lxml import etree
xml_data = '''
3000
'''
root = etree.fromstring(xml_data)
title = root.xpath('//book/title/text()')[0]
price = root.xpath('//book/price/text()')[0]
print(f"Title: {title}, Price: {price} yen")Features and Benefits:- XPath enables flexible searching
- Fast and suitable for processing large volumes of XML
- Compatible with HTML, making it useful for scraping as well
- Requires installation of an external library
- Some initial learning required (e.g., XPath)
Summary of How to Choose a Library
| Library | Features | Suitable Cases |
|---|---|---|
| ElementTree | Available in the standard library, supports basic read/write | Reading small to medium-sized XML |
| minidom | Strong at DOM manipulation, good at pretty-printing | When you need fine-grained node manipulation |
| lxml | Fast, XPath support, highly flexible | Large datasets, when advanced searching is needed |

Ad
4. Practical Sample Code
In this section, we practically introduce processing XML in Python in a way that resembles real-world business and data processing. Specifically, we show code examples that handle commonly used patterns such as “iterating over multiple nodes,” “filtering by condition,” and “writing out to an XML file.”Iterating Over Multiple Nodes
When the XML contains repeated data with the same structure (for example, multiple<book> elements), you can use findall() to loop over them. Sample XML:<books>
<book>
<title>Python Introduction</title>
<author>Taro Yamada</author>
<price>2800</price>
</book>
<book>
<title>Future of AI</title>
<author>Ichiro Suzuki</author>
<price>3500</price>
</book>
</books>Python code (using ElementTree):import xml.etree.ElementTree as ET
tree = ET.parse('books.xml')
root = tree.getroot()
for book in root.findall('book'):
title = book.find('title').text
author = book.find('author').text
price = int(book.find('price').text)
print(f"Title: {title}, Author: {author}, Price: {price} yen")By accessing individual elements inside the loop like this, you can extract data from the XML and process it.Filtering by Condition
Next is a conditional process for extracting only books priced at 3000 yen or more. Python code:for book in root.findall('book'):
price = int(book.find('price').text)
if price >= 3000:
title = book.find('title').text
print(f"Expensive book: {title} ({price} yen)")In this way, by combining if statements, you can handle only the elements that match any given condition.Writing Out to an XML File (Saving)
It is also common to modify a loaded XML and then save it as a new file. Example of writing out:# Create a new root element
root = ET.Element('users')
# Add child elements
user1 = ET.SubElement(root, 'user', attrib={'id': '1'})
ET.SubElement(user1, 'name').text = 'Shota Sagawa'
ET.SubElement(user1, 'email').text = 'sagawa@example.com'
# Save as a tree structure
tree = ET.ElementTree(root)
tree.write('users.xml', encoding='utf-8', xml_declaration=True)This generates an XML file like the following:<?xml version='1.0' encoding='utf-8'?>
<users>
<user id="1">
<name>Shota Sagawa</name>
<email>sagawa@example.com</email>
</user>
</users>Advanced: Extraction Using XPath (lxml)
If you are usinglxml, more flexible and powerful searches are possible.from lxml import etree
tree = etree.parse('books.xml')
titles = tree.xpath('//book[price >= 3000]/title/text()')
for title in titles:
print(f"Expensive book title: {title}")By leveraging XPath, you can intuitively extract data even with complex conditions.5. Common Errors and Solutions
When reading XML with Python, various errors can occur, such as syntax errors and character encoding issues. This section introduces typical errors that beginners often stumble upon and how to address them.UnicodeDecodeError: Failure to read due to character encoding differences
Error Details:UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 10Cause: When an XML file is saved with a character encoding other than UTF-8 (such as Shift_JIS or UTF-16), Python cannot decode it correctly, resulting in an error. Solution: Specify the encoding explicitly when reading the file.with open('sample.xml', encoding='shift_jis') as f:
data = f.read()
import xml.etree.ElementTree as ET
root = ET.fromstring(data)Alternatively, passing the file to ElementTree.parse() in binary mode is also effective.ParseError: Invalid XML Syntax
Error Details:xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 15Cause: It occurs when tags in the XML file are not closed, special characters (e.g., &) are not escaped, or there are syntax mistakes. Solution:- Identify the line and column numbers from the error message.
- Use an editor to pretty‑print the XML and check for syntax errors.
- Convert special characters (e.g.,
&→&).
<note>
<text>5 & 3</text>
</note>After correction:<note>
<text>5 & 3</text>
</note>NoneType Attribute Error: Accessing a Non‑existent Element
Error Details:AttributeError: 'NoneType' object has no attribute 'text'Cause: If the specified tag does not exist in the XML, find() returns None, and accessing .text directly causes an error. Solution: Check whether the element exists before retrieving its value.title_elem = book.find('title')
if title_elem is not None:
title = title_elem.text
else:
title = 'Unknown Title'Alternatively, if you are using Python 3.8 or later, you can write it concisely with the walrus operator (:=).if (title_elem := book.find('title')) is not None:
print(title_elem.text)XML File Corrupted or Empty
Symptoms:- No error is raised, but
getroot()returnsNone. findall()returns nothing.
- The XML file is empty (0 bytes).
- The data is truncated (e.g., download failure).
- Check the file size and contents.
- Use an XML validation tool to perform syntax checking.
- Review the process that provides or generates the download.
Ad
6. Frequently Asked Questions (FAQ)
This section provides clear Q&A-style explanations of the common questions and concerns raised by readers who want to read XML with Python. It’s written to help you resolve the points that often cause trouble in real work or learning before you encounter them.Q1. How should I handle the encoding (character set) of an XML file?
A. XML files typically include an encoding declaration at the very top, like this:<?xml version="1.0" encoding="UTF-8"?>Python’s ElementTree and lxml automatically read this declaration, but if the file is opened with a mismatched character encoding, an error will occur. Because Japanese XML files may use Shift_JIS or EUC-JP, explicitly specifying the encoding as shown below is safer:with open('sample.xml', encoding='shift_jis') as f:
data = f.read()Also, using lxml allows more flexible handling of encodings.Q2. Processing large XML files runs out of memory. What can I do?
A. If you load a large XML file all at once, it expands everything into memory, which can make processing heavy or cause errors. In such cases, using an iterator-style parser that can read incrementally is effective.import xml.etree.ElementTree as ET
for event, elem in ET.iterparse('large.xml', events=('end',)):
if elem.tag == 'book':
print(elem.find('title').text)
elem.clear() # release memoryWith this method, you can process only the needed parts in order, saving memory.Q3. What are the benefits of using XML instead of JSON?
A. While many APIs now use JSON, XML also has its own strengths.- Can define hierarchical structures rigorously (e.g., DTD/XSD)
- Allows distinction between attributes and elements
- Strongly document-oriented, suitable for configuration files and structural information
- Still the dominant format in many corporate and governmental systems
Q4. Which should I use, lxml or ElementTree?
A. You can choose based on the following criteria:| Library | Suitable Cases |
|---|---|
| ElementTree | Small to medium-sized XML, when the standard library is sufficient |
| lxml | When you want to use XPath, need high performance, or handle large data sets |
ElementTree is recommended, but when you need flexible extraction with XPath or require high processing speed, lxml is powerful.Ad
7. Summary
In this article, we explained “How to read XML with Python” in a way that is easy for beginners to understand and also practical for real-world use.Reading XML in Python is surprisingly simple
If you use the standard libraryxml.etree.ElementTree, you can read XML right away without any special setup. Once you learn the basic syntax and methods (such as parse(), find(), findall(), etc.), extracting and manipulating data becomes straightforward.Choosing the right library for your use case is important
- Small/simple tasks:
ElementTree - Fine-grained node manipulation and pretty printing:
minidom - High-performance processing and XPath queries:
lxml



