1. Introduction
In data processing with Python, duplicate elements in a list are often a problem. When duplicate data exists, processing can slow down and analysis results may become inaccurate. This article explains how to remove duplicates from a list using Python, covering methods ranging from beginner-friendly approaches to advanced techniques.
2. Basic Method for Removing Duplicate Elements
First, we’ll introduce a simple method using Python’s built-in set type.
Using set() to remove duplicates
In Python, you can easily remove duplicates from a list by converting it to the set type.
Example
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = list(set(original_list))
print(unique_list) # Example output: [1, 2, 3, 4, 5]
Explanation
set is a collection type that does not allow duplicate elements. By leveraging this property, you can remove duplicates from a list.
Cautions
- Using set() causes the original list’s order to be lost.
- If you need to preserve order, refer to the method introduced in the next section.
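If a sorted result is sufficient rather than the original order, one simple option is to sort the result of set(). This is a minimal sketch; the sample list is only illustrative:
Example
original_list = [3, 1, 2, 2, 1]
unique_sorted = sorted(set(original_list))  # remove duplicates, then sort ascending
print(unique_sorted)  # Output: [1, 2, 3]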
3. How to Remove Duplicates While Preserving Order
If you want to remove duplicates while keeping the list order unchanged, the following methods are helpful.
Using dict.fromkeys()
Example
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = list(dict.fromkeys(original_list))
print(unique_list) # Example output: [1, 2, 3, 4, 5]
Explanation
- dict.fromkeys() creates a dictionary with each element of the specified list as a key.
- Since dictionary keys are unique, duplicates are removed.
- From Python 3.7 onward, the insertion order of dictionary keys is preserved.
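The same approach works for strings as well, and each value keeps the position of its first occurrence. The list of fruit names below is just an assumed sample for illustration:
Example
fruits = ['apple', 'banana', 'apple', 'cherry', 'banana']
unique_fruits = list(dict.fromkeys(fruits))  # first occurrences kept, in order
print(unique_fruits)  # Output: ['apple', 'banana', 'cherry']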
Method using list comprehensions
Example
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = []
[unique_list.append(item) for item in original_list if item not in unique_list]
print(unique_list) # Example output: [1, 2, 3, 4, 5]
Explanation
- Using a list comprehension, only unique elements are added to a new list.
- This method is also effective for small lists.
Caution
Because the item not in unique_list check rescans unique_list for every element, this method may become slow as the list grows larger.
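A common way to keep the order while avoiding this slowdown is to track already-seen elements in a set, so each membership check is a fast lookup. This is a rough sketch that assumes the elements are hashable; the variable names are only illustrative:
Example
original_list = [1, 2, 2, 3, 4, 4, 5]
seen = set()
unique_list = []
for item in original_list:
    if item not in seen:  # set lookup instead of scanning unique_list
        seen.add(item)
        unique_list.append(item)
print(unique_list)  # Output: [1, 2, 3, 4, 5]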
4. Removing Duplicates from a Two-Dimensional List
In a two-dimensional list, the inner lists are unhashable, so set() and dict.fromkeys() cannot be used directly. This section explains how to remove duplicates within a two-dimensional list.
Using List Comprehensions
Example
original_list = [[1, 2], [3, 4], [1, 2], [5, 6]]
unique_list = []
[unique_list.append(item) for item in original_list if item not in unique_list]
print(unique_list) # Example output: [[1, 2], [3, 4], [5, 6]]
Explanation
- Use list comprehensions to eliminate duplicate items.
- Because the in check compares elements by value, this method works even when the elements are nested lists.
Caution
For large datasets, performance may degrade, so it’s necessary to choose an appropriate method.
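One option that scales better while still preserving order is to convert each inner list to a tuple, which is hashable, so that dict.fromkeys() can be applied. This is a sketch under the assumption that the inner lists should be compared by value and contain only hashable items:
Example
original_list = [[1, 2], [3, 4], [1, 2], [5, 6]]
# Convert rows to tuples so dict.fromkeys() can use them as keys
unique_list = [list(row) for row in dict.fromkeys(tuple(row) for row in original_list)]
print(unique_list)  # Output: [[1, 2], [3, 4], [5, 6]]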
5. Removing Duplicates Using Pandas
The Pandas library provides convenient methods for removing duplicates within a DataFrame.
Using the drop_duplicates() method
Example
import pandas as pd
data = {'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df = df.drop_duplicates()
print(df)
Explanation
- The drop_duplicates() method removes duplicate rows based on the entire DataFrame or on specific columns.
- By using the subset parameter, you can base the comparison on particular columns.
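For instance, the subset parameter restricts the comparison to particular columns, and the keep parameter controls which of the duplicate rows is retained. The sketch below reuses the sample data from above; keep='first' is just one possible choice:
Example
import pandas as pd
data = {'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df_unique = df.drop_duplicates(subset=['A'], keep='first')  # compare rows only by column 'A'
print(df_unique)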
6. Detecting and Counting Duplicate Elements
In Python, detecting duplicate elements and counting their occurrences is also very important. This section introduces methods using collections.Counter and standard Python techniques.
How to Use collections.Counter
Example
from collections import Counter
original_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
count = Counter(original_list)
print(count) # Example output: Counter({5: 3, 2: 2, 4: 2, 1: 1, 3: 1})
Explanation
- Counter returns each element in the list and its occurrence count as a dictionary.
- You can easily identify the elements with the highest occurrence counts.
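For example, Counter.most_common() returns (element, count) pairs ordered from most to least frequent, so the most common values can be read off directly. A short, self-contained sketch using the same sample list:
Example
from collections import Counter
count = Counter([1, 2, 2, 3, 4, 4, 5, 5, 5])
print(count.most_common(2))  # Output: [(5, 3), (2, 2)]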
How to Extract Duplicate Elements
Example
duplicates = [item for item, freq in count.items() if freq > 1]
print(duplicates) # Example output: [2, 4, 5]
Explanation
- Using Counter, elements whose occurrence count exceeds one are added to a list.
- With this method, you can easily list the duplicate elements.
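If you prefer plain built-in types instead of collections.Counter, duplicates can also be detected with sets. This is a minimal sketch assuming the elements are hashable; it records each duplicated value once, in the order the duplicates are first found:
Example
original_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
seen = set()
duplicates = []
for item in original_list:
    if item in seen and item not in duplicates:
        duplicates.append(item)  # record each duplicated value only once
    seen.add(item)
print(duplicates)  # Output: [2, 4, 5]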

7. Summary
The methods introduced so far are summarized below.
Advantages and applicable scenarios for each method
Method | Advantages | Considerations |
---|---|---|
Using set() | Simple and fast | Order is not preserved |
Using dict.fromkeys() | Can remove duplicates while preserving order | Order is guaranteed only in Python 3.7 and later |
List comprehension | Flexible and can preserve order | Processing speed decreases with large data sets |
Pandas drop_duplicates() | Ideal for DataFrame operations | Requires Pandas installation |
Using collections.Counter | Easily obtain occurrence counts | Performance considerations for large data sets |