Remove Duplicates in Python – Complete Guide for Beginners

1. Introduction

When processing data with Python, duplicate elements in a list are a common problem. They waste processing time and can make analysis results inaccurate, for example by counting the same record twice. This article explains how to remove duplicates from a list in Python, covering methods that range from beginner-friendly approaches to more advanced techniques.

2. Basic Method for Removing Duplicate Elements

First, we’ll introduce a simple method using Python’s built‑in functions.

Using `set()` to remove duplicates

In Python, you can easily remove duplicates from a list by using the set type.

Example

original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = list(set(original_list))  # set() drops repeats; list() converts back
print(unique_list)  # Example output: [1, 2, 3, 4, 5] (element order is not guaranteed)

Explanation

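A set is an unordered collection that cannot contain duplicate elements. Converting the list to a set keeps only one copy of each value, and list() turns the result back into a list. All elements must be hashable (numbers, strings, tuples, and so on).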
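The same property can also tell you whether a list contains duplicates at all: a set built from the list is shorter than the list exactly when some value repeats. The sketch below assumes hashable elements and a sequence that supports len(); has_duplicates is a hypothetical helper name used only for illustration.

```python
def has_duplicates(items):
    """Return True if any element of items appears more than once.

    Hypothetical helper; assumes every element is hashable, as set() requires.
    """
    # A set keeps one copy of each value, so it is shorter than the
    # original sequence exactly when the sequence contains a repeat.
    return len(set(items)) != len(items)


print(has_duplicates([1, 2, 2, 3, 4, 4, 5]))  # True
print(has_duplicates([1, 2, 3]))              # False
```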

Cautions

A set does not preserve the original order of the list, so the elements of the result may appear in a different order.
If you need to preserve order, use one of the methods introduced in the next section.
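If the original order does not matter but you want a predictable result, one option is to sort the set. This is a small sketch with made-up string data; it assumes the elements can be compared with each other, which sorted() requires.

```python
original_list = ["banana", "apple", "cherry", "apple", "banana"]

# set() removes the repeats but gives no ordering guarantee;
# sorted() then returns a new list in ascending order.
unique_sorted = sorted(set(original_list))
print(unique_sorted)  # ['apple', 'banana', 'cherry']
```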

3. How to Remove Duplicates While Preserving Order

If you want to remove duplicates while keeping the list order unchanged, the following methods are helpful.

Using `dict.fromkeys()`

Example

original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = list(dict.fromkeys(original_list))
print(unique_list)  # Example output: [1, 2, 3, 4, 5]

Explanation

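dict.fromkeys() creates a dictionary whose keys are the elements of the given list (each value is None).
Since dictionary keys are unique, duplicates are removed, and list() returns the remaining keys.
From Python 3.7 onward, dictionaries are guaranteed to preserve insertion order, so the keys appear in the order of their first occurrence.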
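To see what happens in the intermediate step, the sketch below prints the dictionary itself before converting it to a list. The string data is made up for illustration; the behavior shown assumes Python 3.7 or later.

```python
original_list = ["banana", "apple", "cherry", "apple", "banana"]

# Each element becomes a key; the value defaults to None.
# A repeated key is stored once, at the position of its first occurrence.
keys = dict.fromkeys(original_list)
print(keys)  # {'banana': None, 'apple': None, 'cherry': None}

# Converting the dict to a list yields its keys in insertion order.
unique_list = list(keys)
print(unique_list)  # ['banana', 'apple', 'cherry']
```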

Method using list comprehensions

Example

original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = []

# The comprehension is used only for its side effect (append); its own result is discarded.
[unique_list.append(item) for item in original_list if item not in unique_list]
print(unique_list)  # Example output: [1, 2, 3, 4, 5]

Explanation

The comprehension appends each element to unique_list only if it is not already there, so the first occurrence of every value is kept in its original order.
This approach is simple and works well for small lists.

Caution

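Because the check item not in unique_list scans the whole list on every step, the running time grows quadratically with the list length, so this method becomes slow for large lists.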
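A common alternative, not one of the methods above, keeps the same order-preserving behavior but records already-seen values in a set, so each membership check takes constant time on average instead of scanning a list. The sketch assumes hashable elements; unique_in_order is a hypothetical helper name.

```python
def unique_in_order(items):
    """Return a new list with duplicates removed, keeping first occurrences.

    Hypothetical helper; assumes the elements are hashable so they can be stored in a set.
    """
    seen = set()      # values already added; set lookups are O(1) on average
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order([1, 2, 2, 3, 4, 4, 5]))  # [1, 2, 3, 4, 5]
```

A plain for loop is used here rather than a comprehension, since the work is done through side effects (adding to seen and result).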

4. Removing Duplicates from a Two-Dimensional List

In a two-dimensional list, set() and dict.fromkeys() cannot be used directly, because the inner lists are unhashable and raise a TypeError. This section explains how to remove duplicate rows from a two-dimensional list.

Using List Comprehensions

Example

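original_list = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]
unique_list = []
# Comprehension used only for its side effect; "in" compares inner lists by value.
[unique_list.append(item) for item in original_list if item not in unique_list]
print(unique_list)  # Example output: [[1, 2], [3, 4], [5, 6]]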

Explanation

The comprehension keeps an inner list only if an equal list is not already in unique_list.
This works for nested lists because the in operator compares lists by value and does not require hashing.

Caution

Each membership check scans unique_list, so performance degrades for large datasets. In that case, converting the inner lists to tuples allows a hash-based method to be used instead.
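One way to do that, sketched here as an assumption rather than as the method above, is to turn each inner list into a tuple (which is hashable), deduplicate with dict.fromkeys() to keep the original order, and convert the tuples back to lists. It assumes the values inside each row are themselves hashable.

```python
original_list = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]

# Lists are unhashable, so convert each inner list to a tuple first.
# dict.fromkeys() then keeps the first occurrence of each tuple in order,
# and the comprehension turns the tuples back into lists.
unique_list = [list(row) for row in dict.fromkeys(tuple(row) for row in original_list)]
print(unique_list)  # [[1, 2], [3, 4], [5, 6]]
```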

5. Removing Duplicates Using Pandas

The Pandas library provides convenient methods for removing duplicates within a DataFrame.

Using the `drop_duplicates()` method

Example

import pandas as pd

data = {'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df = df.drop_duplicates()
print(df)

Explanation

The drop_duplicates() method removes duplicate rows, comparing all columns by default and keeping the first occurrence of each duplicate.
By using the subset parameter, you can compare only particular columns.
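The sketch below uses hypothetical data in which two rows share the same value in column A but differ in column B, to show how subset and the keep parameter change the result. It also uses duplicated(), which returns a boolean Series marking the rows that drop_duplicates() would remove. It assumes pandas is installed.

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 share column 'A' but differ in column 'B'.
df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 5, 7, 6]})

# duplicated() flags rows that repeat an earlier row (all columns compared).
print(df.duplicated().tolist())  # [False, False, False, False]

# Compare only column 'A'; by default the first occurrence is kept.
first = df.drop_duplicates(subset=['A'])
print(first.index.tolist())  # [0, 1, 3]

# keep='last' keeps the final occurrence of each duplicate instead.
last = df.drop_duplicates(subset=['A'], keep='last')
print(last.index.tolist())  # [0, 2, 3]
```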

6. Detecting and Counting Duplicate Elements

Besides removing duplicates, it is often useful to find out which elements are duplicated and how many times each one occurs. This section introduces methods using collections.Counter together with standard Python techniques.

How to Use `collections.Counter`

Example

from collections import Counter

original_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
count = Counter(original_list)
print(count)  # Example output: Counter({5: 3, 2: 2, 4: 2, 1: 1, 3: 1})

Explanation

Counter, a subclass of dict, maps each element of the list to the number of times it occurs.
Its most_common() method makes it easy to find the elements with the highest counts.
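A short sketch of looking up counts, using the same list as above: most_common(n) returns the n elements with the highest counts as (element, count) pairs, and elements with equal counts appear in the order they were first encountered.

```python
from collections import Counter

original_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
count = Counter(original_list)

# most_common(n) returns the n (element, count) pairs with the highest counts.
print(count.most_common(1))  # [(5, 3)]
print(count.most_common(2))  # [(5, 3), (2, 2)]

# Indexing a Counter returns 0 for elements that never appeared.
print(count[2], count[9])  # 2 0
```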

How to Extract Duplicate Elements

Example

duplicates = [item for item, freq in count.items() if freq > 1]
print(duplicates)  # Example output: [2, 4, 5]

Explanation

The list comprehension iterates over the Counter and keeps each element whose count is greater than one.
This gives a simple list of the duplicated elements.
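If you only need the duplicated elements and not their counts, a version using built-in sets alone also works. This is a sketch under the assumption that the elements are hashable; find_duplicates is a hypothetical helper name, and it reports each duplicate in the order in which it first repeats.

```python
def find_duplicates(items):
    """Return each element that occurs more than once, in order of its first repeat.

    Hypothetical helper; uses only built-in sets and assumes hashable elements.
    """
    seen = set()        # elements encountered at least once
    reported = set()    # duplicates already added to the result
    duplicates = []
    for item in items:
        if item in seen and item not in reported:
            reported.add(item)
            duplicates.append(item)
        seen.add(item)
    return duplicates


print(find_duplicates([1, 2, 2, 3, 4, 4, 5, 5, 5]))  # [2, 4, 5]
```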

7. Summary

The methods introduced so far are summarized below.

Advantages and applicable scenarios for each method

Method	Advantages	Considerations
Using `set()`	Simple and fast	Order is not preserved; elements must be hashable
Using `dict.fromkeys()`	Can remove duplicates while preserving order	Order is guaranteed only in Python 3.7 and later; elements must be hashable
List comprehension	Flexible and can preserve order	Slow for large data sets (quadratic time)
Pandas `drop_duplicates()`	Ideal for DataFrame operations	Requires Pandas installation
Using `collections.Counter`	Easily obtain occurrence counts	Counts duplicates rather than removing them; elements must be hashable

The best way to remove duplicates in Python depends on your use case and data structure. Use this article to choose the appropriate method and make your workflow more efficient.
