Beginner's Guide: Read XML in Python with ElementTree & lxml

1. Things to Know Before Reading XML with Python
2. Basics of Reading XML with Python
3. Introduction to Other XML Parsing Libraries
4. Practical Sample Code
5. Common Errors and Solutions
6. Frequently Asked Questions (FAQ)
7. Summary

1. Things to Know Before Reading XML with Python

When Do You Work with XML in Python?

Python is a versatile programming language, and reading XML files is one of its common uses in data processing. Typical situations include:

Parsing XML data retrieved from a Web API
Processing XML files exported from other systems
Reading XML used as configuration files

Because XML’s tag structure clearly represents the hierarchy and meaning of data, it is used across many industries. With Python, you can easily read, transform, and analyze this XML data.

What Is XML? A Quick Review

XML (Extensible Markup Language) is a markup language that allows flexible definition of data structures. It has a structure similar to HTML, but its purpose differs. While HTML is used for visual presentation, XML is a format for describing data structure and meaning. For example, the following is a typical XML format:

<book>
  <title>Python Introduction</title>
  <author>Taro Yamada</author>
  <price>2800</price>
</book>

To read and use data in this format with Python, you need to use a dedicated library.

Introducing Libraries for Reading XML in Python

Python provides several ways to read XML through both standard and third‑party libraries. The most common ones are:

xml.etree.ElementTree (standard library)
xml.dom.minidom (standard library)
lxml (third‑party library, supports XPath and validation)

In this article, we will clearly explain how to read XML files with Python using these libraries, complete with sample code. Rest assured, even beginners will be able to follow along as we introduce the basics step by step.

2. Basics of Reading XML with Python

Reading XML Files Using ElementTree

First, let’s read an actual XML file with ElementTree. Sample XML (sample.xml):

<books>
  <book>
    <title>Python Introduction</title>
    <author>Taro Yamada</author>
    <price>2800</price>
  </book>
  <book>
    <title>The Future of AI</title>
    <author>Ichiro Suzuki</author>
    <price>3500</price>
  </book>
</books>

Python code:

import xml.etree.ElementTree as ET

# Load XML file
tree = ET.parse('sample.xml')
root = tree.getroot()

# Check root element tag
print(f"Root element: {root.tag}")

# Loop through each book element
for book in root.findall('book'):
    title = book.find('title').text
    author = book.find('author').text
    price = int(book.find('price').text)

    print(f"Title: {title}, Author: {author}, Price: {price} yen")

Output:

Root element: books
Title: Python Introduction, Author: Taro Yamada, Price: 2800 yen
Title: The Future of AI, Author: Ichiro Suzuki, Price: 3500 yen

In this way, reading an XML file with ElementTree.parse(), obtaining the root element with getroot(), and extracting the needed elements with find() or findall() is the basic workflow.

How to Read XML from a String

Sometimes XML is provided as a string rather than a file. In that case, use ET.fromstring() to parse it. Example:

xml_data = '''
<user>
  <name>Shota Sagawa</name>
  <email>sagawa@example.com</email>
</user>
'''

root = ET.fromstring(xml_data)

name = root.find('name').text
email = root.find('email').text

print(f"Name: {name}, Email: {email}")

Similarly, you can retrieve the required child elements from the root element using find() and extract their values.

Accessing Attributes and Handling Text

XML elements may have attributes defined within the tags. In Python, you can access them using .attrib. Example (XML with attributes):

<user id="101">
  <name>Shota Sagawa</name>
</user>

Python code:

root = ET.fromstring('''
<user id="101">
  <name>Shota Sagawa</name>
</user>
''')

print(f"User ID: {root.attrib['id']}")
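Indexing .attrib raises KeyError when an attribute is missing. As a minimal sketch, the element's get() method can be used instead; the 'role' attribute below is hypothetical and only illustrates the default value.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<user id="101"><name>Shota Sagawa</name></user>')

# get() returns None (or the given default) when the attribute is absent,
# whereas root.attrib['role'] would raise KeyError.
user_id = root.get('id')
role = root.get('role', 'guest')  # 'role' is a hypothetical attribute

# Attribute values are always strings; convert them explicitly if needed.
print(int(user_id), role)
```

Element text works the same way: .text is a string (or None for an empty element), so numeric values need an explicit int() or float() conversion.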

3. Introduction to Other XML Parsing Libraries

minidom: DOM-based Standard Library

xml.dom.minidom is an XML parser included in the Python standard library that works with XML according to the W3C DOM (Document Object Model) specification. It may be perceived as slightly harder to use compared to ElementTree, but it is handy when you need fine-grained control over node types and structure. Example:

from xml.dom import minidom

xml_data = '''
<user>
  <name>Shota Sagawa</name>
  <email>sagawa@example.com</email>
</user>
'''

dom = minidom.parseString(xml_data)
name = dom.getElementsByTagName('name')[0].firstChild.nodeValue
email = dom.getElementsByTagName('email')[0].firstChild.nodeValue

print(f"Name: {name}, Email: {email}")

Features and Benefits:

Easy access to detailed node structures
Attributes and child node types are clearly categorized
Easy to pretty-print XML output

Drawbacks:

Code tends to be verbose
Not suitable for processing large XML (high memory consumption)
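The pretty-printing mentioned in the feature list can be sketched with toprettyxml(). This is only an illustration on a small inline document; the exact whitespace of the result can differ slightly between Python versions.

```python
from xml.dom import minidom

dom = minidom.parseString('<user><name>Shota Sagawa</name></user>')

# toprettyxml() returns the whole document as an indented string,
# including an XML declaration on the first line.
pretty = dom.toprettyxml(indent='  ')
print(pretty)
```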

lxml: Fast and Powerful External Library

lxml is a fast XML parser implemented in C, supporting advanced XML features such as XPath and XSLT. It offers an API similar to ElementTree, so the learning curve is relatively low. Installation:

pip install lxml

Basic Usage:

from lxml import etree

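# Sample data (the title text is illustrative)
xml_data = '''
<books>
  <book>
    <title>Python Introduction</title>
    <price>3000</price>
  </book>
</books>
'''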

root = etree.fromstring(xml_data)
title = root.xpath('//book/title/text()')[0]
price = root.xpath('//book/price/text()')[0]

print(f"Title: {title}, Price: {price} yen")

Features and Benefits:

XPath enables flexible searching
Fast and suitable for processing large volumes of XML
Compatible with HTML, making it useful for scraping as well

Drawbacks:

Requires installation of an external library
Some initial learning required (e.g., XPath)

Summary of How to Choose a Library

Library	Features	Suitable Cases
ElementTree	Available in the standard library, supports basic read/write	Reading small to medium-sized XML
minidom	Strong at DOM manipulation, good at pretty-printing	When you need fine-grained node manipulation
lxml	Fast, XPath support, highly flexible	Large datasets, when advanced searching is needed

4. Practical Sample Code

This section shows XML processing in Python in a form close to real-world business and data-processing tasks. The examples cover common patterns: iterating over multiple nodes, filtering by condition, and writing out to an XML file.

Iterating Over Multiple Nodes

When the XML contains repeated data with the same structure (for example, multiple <book> elements), you can use findall() to loop over them. Sample XML (books.xml):

<books>
  <book>
    <title>Python Introduction</title>
    <author>Taro Yamada</author>
    <price>2800</price>
  </book>
  <book>
    <title>Future of AI</title>
    <author>Ichiro Suzuki</author>
    <price>3500</price>
  </book>
</books>

Python code (using ElementTree):

import xml.etree.ElementTree as ET

tree = ET.parse('books.xml')
root = tree.getroot()

for book in root.findall('book'):
    title = book.find('title').text
    author = book.find('author').text
    price = int(book.find('price').text)

    print(f"Title: {title}, Author: {author}, Price: {price} yen")

By accessing individual elements inside the loop like this, you can extract data from the XML and process it.
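Once the values are extracted, it is often convenient to collect them into ordinary Python structures. The following is a minimal sketch that parses the sample from an inline string (instead of books.xml) and builds a list of dictionaries; the dictionary keys are an arbitrary choice.

```python
import xml.etree.ElementTree as ET

xml_data = """
<books>
  <book><title>Python Introduction</title><author>Taro Yamada</author><price>2800</price></book>
  <book><title>Future of AI</title><author>Ichiro Suzuki</author><price>3500</price></book>
</books>
"""
root = ET.fromstring(xml_data)

# Convert each <book> element into a plain dictionary
books = [
    {
        'title': book.findtext('title'),
        'author': book.findtext('author'),
        'price': int(book.findtext('price')),
    }
    for book in root.findall('book')
]

# Ordinary list operations now work on the extracted data
total = sum(b['price'] for b in books)
print(f"{len(books)} books, total price: {total} yen")
```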

Filtering by Condition

Next is a conditional process for extracting only books priced at 3000 yen or more. Python code:

for book in root.findall('book'):
    price = int(book.find('price').text)
    if price >= 3000:
        title = book.find('title').text
        print(f"Expensive book: {title} ({price} yen)")

In this way, by combining if statements, you can handle only the elements that match any given condition.

Writing Out to an XML File (Saving)

It is also common to build or modify an XML tree and then save it to a file. The following example builds a new tree from scratch and writes it out:

# Create a new root element
root = ET.Element('users')

# Add child elements
user1 = ET.SubElement(root, 'user', attrib={'id': '1'})
ET.SubElement(user1, 'name').text = 'Shota Sagawa'
ET.SubElement(user1, 'email').text = 'sagawa@example.com'

# Save as a tree structure
tree = ET.ElementTree(root)
tree.write('users.xml', encoding='utf-8', xml_declaration=True)

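This generates an XML file with the following content. The listing is indented here for readability; write() itself emits the elements on a single line unless you call ET.indent(tree) first (Python 3.9 or later):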

<?xml version='1.0' encoding='utf-8'?>
<users>
  <user id="1">
    <name>Shota Sagawa</name>
    <email>sagawa@example.com</email>
  </user>
</users>
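ElementTree does not add indentation on its own when serializing. As a sketch, assuming Python 3.9 or later, ET.indent() can be called before writing; ET.tostring() is used here instead of a file only so the result can be printed directly.

```python
import xml.etree.ElementTree as ET

root = ET.Element('users')
user1 = ET.SubElement(root, 'user', attrib={'id': '1'})
ET.SubElement(user1, 'name').text = 'Shota Sagawa'

# ET.indent() (Python 3.9+) inserts whitespace into the tree in place,
# so that the serialized output is indented.
ET.indent(root, space='  ')

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

The same call works before tree.write(); pass either the tree or the root element to ET.indent().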

Advanced: Extraction Using XPath (lxml)

If you are using lxml, more flexible and powerful searches are possible.

from lxml import etree

tree = etree.parse('books.xml')
titles = tree.xpath('//book[price >= 3000]/title/text()')

for title in titles:
    print(f"Expensive book title: {title}")

By leveraging XPath, you can intuitively extract data even with complex conditions.
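For comparison, the standard-library ElementTree supports only a limited subset of XPath: equality predicates on child text and attributes work, but numeric comparisons such as price >= 3000 do not. A minimal sketch of that subset follows; the lang attribute is hypothetical and added only for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<books>
  <book lang="ja"><title>Python Introduction</title><author>Taro Yamada</author></book>
  <book lang="en"><title>Future of AI</title><author>Ichiro Suzuki</author></book>
</books>
""")

# Match books whose <author> child has exactly this text
by_author = [t.text for t in root.findall(".//book[author='Ichiro Suzuki']/title")]

# Match books by attribute value (the lang attribute is hypothetical)
by_lang = [t.text for t in root.findall(".//book[@lang='ja']/title")]

print(by_author, by_lang)
```

When a condition goes beyond equality, either filter in Python with an if statement or switch to lxml.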

5. Common Errors and Solutions

When reading XML with Python, various errors can occur, such as syntax errors and character encoding issues. This section introduces typical errors that beginners often stumble upon and how to address them.

UnicodeDecodeError: Failure to read due to character encoding differences

Error Details:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 10

Cause: When an XML file is saved with a character encoding other than UTF-8 (such as Shift_JIS or UTF-16), Python cannot decode it correctly, resulting in an error. Solution: Specify the encoding explicitly when reading the file.

import xml.etree.ElementTree as ET

# Decode the file with its actual encoding, then parse the resulting string
with open('sample.xml', encoding='shift_jis') as f:
    data = f.read()

root = ET.fromstring(data)

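Alternatively, you can pass the file path, or a file opened in binary mode ('rb'), to ET.parse() so that the parser reads the encoding declaration itself. This works for encodings the parser supports, such as UTF-8, UTF-16, and ISO-8859-1; for multi-byte encodings such as Shift_JIS, the standard-library parser may reject the file, in which case decoding explicitly as shown above is the reliable option.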
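The binary-mode approach can be sketched as follows. To keep the example self-contained, the bytes are built in memory rather than read from a file, and ISO-8859-1 stands in for "some non-UTF-8 encoding"; both are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Bytes as they would come from open(path, 'rb').read().
# ISO-8859-1 is used here only as an example of a non-UTF-8 encoding.
raw = '<?xml version="1.0" encoding="iso-8859-1"?><name>José</name>'.encode('iso-8859-1')

# Given bytes, the parser decodes them using the declared encoding
root = ET.fromstring(raw)
print(root.text)
```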

ParseError: Invalid XML Syntax

Error Details:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 15

Cause: It occurs when tags in the XML file are not closed, special characters (e.g., &) are not escaped, or there are syntax mistakes. Solution:

Identify the line and column numbers from the error message.
Use an editor to pretty‑print the XML and check for syntax errors.
Escape special characters (e.g., & → &amp;).

Example: Incorrect XML

<note>
  <text>5 & 3</text>
</note>

After correction:

<note>
  <text>5 &amp; 3</text>
</note>
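The steps above can be sketched in code: catch ET.ParseError to read the position programmatically, and use xml.sax.saxutils.escape() when building XML text by hand. The sample strings are the ones from this section, written inline.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

bad_xml = "<note>\n  <text>5 & 3</text>\n</note>"

try:
    ET.fromstring(bad_xml)
except ET.ParseError as err:
    # err.position is a (line, column) tuple pointing at the problem
    line, column = err.position
    print(f"Parse failed at line {line}, column {column}: {err}")

# When generating XML text by hand, escape() converts &, < and > for you
good_xml = f"<note>\n  <text>{escape('5 & 3')}</text>\n</note>"
text = ET.fromstring(good_xml).find('text').text
print(text)
```

Note that the parser turns the entity back into a plain & when reading, so the extracted text needs no further processing.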

NoneType Attribute Error: Accessing a Non‑existent Element

Error Details:

AttributeError: 'NoneType' object has no attribute 'text'

Cause: If the specified tag does not exist in the XML, find() returns None, and accessing .text directly causes an error. Solution: Check whether the element exists before retrieving its value.

title_elem = book.find('title')
if title_elem is not None:
    title = title_elem.text
else:
    title = 'Unknown Title'

Alternatively, if you are using Python 3.8 or later, you can write it concisely with the walrus operator (:=).

if (title_elem := book.find('title')) is not None:
    print(title_elem.text)
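Another option is findtext(), which accepts a default value and so avoids the explicit None check. A minimal sketch, using an inline <book> element that deliberately lacks a <title>:

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book><author>Taro Yamada</author></book>')

# findtext() returns the default instead of raising when the element is missing
title = book.findtext('title', default='Unknown Title')
author = book.findtext('author', default='Unknown Author')

print(title, author)
```

Keep in mind that findtext() returns an empty string, not the default, when the element exists but has no text.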

XML File Corrupted or Empty

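Symptoms:

ET.parse() raises ParseError: no element found (an empty file has no root element).
ET.parse() raises ParseError partway through the file (truncated data).
findall() returns an empty list because the expected elements are missing.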

Cause:

The XML file is empty (0 bytes).
The data is truncated (e.g., download failure).

Solution:

Check the file size and contents.
Use an XML validation tool to perform syntax checking.
Review the process that provides or generates the download.
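A defensive loader for these cases might look like the sketch below. The function name load_root is hypothetical, and in-memory byte streams stand in for real files so the example runs on its own.

```python
import io
import xml.etree.ElementTree as ET

def load_root(source):
    """Return the root element, or None if the XML is empty or malformed."""
    try:
        return ET.parse(source).getroot()
    except ET.ParseError as err:
        print(f"Could not parse XML: {err}")
        return None

# In-memory stand-ins for an empty file, a truncated file, and a valid file
empty = load_root(io.BytesIO(b""))
truncated = load_root(io.BytesIO(b"<books><book><title>Python"))
valid = load_root(io.BytesIO(b"<books><book/></books>"))

print(empty, truncated, valid.tag)
```

Returning None is one possible design; re-raising with a clearer message or logging the file name may suit a larger program better.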

6. Frequently Asked Questions (FAQ)

This section provides clear Q&A-style explanations of the common questions and concerns raised by readers who want to read XML with Python. It’s written to help you resolve the points that often cause trouble in real work or learning before you encounter them.

Q1. How should I handle the encoding (character set) of an XML file?

A. XML files typically include an encoding declaration at the very top, like this:

<?xml version="1.0" encoding="UTF-8"?>

Python’s ElementTree and lxml automatically read this declaration, but if the file is opened with a mismatched character encoding, an error will occur. Because Japanese XML files may use Shift_JIS or EUC-JP, explicitly specifying the encoding as shown below is safer:

with open('sample.xml', encoding='shift_jis') as f:
    data = f.read()

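lxml can also be convenient here: when given a file path or raw bytes, it relies on the declared encoding. Note that lxml raises a ValueError if you pass it a str that still contains an encoding declaration, so pass bytes in that case.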

Q2. Processing large XML files runs out of memory. What can I do?

A. If you load a large XML file all at once, it expands everything into memory, which can make processing heavy or cause errors. In such cases, using an iterator-style parser that can read incrementally is effective.

import xml.etree.ElementTree as ET

for event, elem in ET.iterparse('large.xml', events=('end',)):
    if elem.tag == 'book':
        print(elem.find('title').text)
        elem.clear()  # release memory

With this method, you can process only the needed parts in order, saving memory.
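The same pattern can be wrapped in a generator so that callers simply loop over the results. This is a sketch: iter_titles is a hypothetical helper name, and a small in-memory document stands in for large.xml.

```python
import io
import xml.etree.ElementTree as ET

def iter_titles(source):
    """Yield each book title, clearing every <book> once it has been handled."""
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'book':
            yield elem.findtext('title')
            elem.clear()  # drop the children and text of the processed element

# Small in-memory stand-in for a large file
data = io.BytesIO(
    b"<books><book><title>A</title></book><book><title>B</title></book></books>"
)
titles = list(iter_titles(data))
print(titles)
```

Collecting the titles into a list is done here only to show the result; with a genuinely large file you would consume the generator one item at a time.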

Q3. What are the benefits of using XML instead of JSON?

A. While many APIs now use JSON, XML also has its own strengths.

Can define hierarchical structures rigorously (e.g., DTD/XSD)
Allows distinction between attributes and elements
Strongly document-oriented, suitable for configuration files and structural information
Still the dominant format in many corporate and governmental systems

In other words, XML's strength lies in rigorously defining structured documents rather than in being easy for people to read. Depending on the use case, it will remain relevant for the foreseeable future.

Q4. Which should I use, `lxml` or `ElementTree`?

A. You can choose based on the following criteria:

Library	Suitable Cases
ElementTree	Small to medium-sized XML, when the standard library is sufficient
lxml	When you want to use XPath, need high performance, or handle large data sets

For beginners, starting with ElementTree is recommended, but when you need flexible extraction with XPath or require high processing speed, lxml is powerful.

7. Summary

In this article, we explained “How to read XML with Python” in a way that is easy for beginners to understand and also practical for real-world use.

Reading XML in Python is surprisingly simple

If you use the standard library xml.etree.ElementTree, you can read XML right away without any special setup. Once you learn the basic syntax and methods (such as parse(), find(), findall(), etc.), extracting and manipulating data becomes straightforward.

Choosing the right library for your use case is important

Small/simple tasks: ElementTree
Fine-grained node manipulation and pretty printing: minidom
High-performance processing and XPath queries: lxml

Each has its own pros and cons, so choose based on the size and purpose of the XML you’re working with.

Be prepared for common errors and issues

Even errors that beginners often stumble on become manageable once you understand their causes and solutions. Issues such as character‑encoding mismatches, syntax errors, and element existence checks are especially common.

XML is still a relevant technology today

Although JSON usage has grown in recent years, XML remains widely used in many business systems, government agencies, and data‑exchange scenarios. Being able to process XML with Python is a valuable skill across many domains.
