Remove Duplicates in Python – Complete Guide for Beginners

1. Introduction

In data processing with Python, duplicate elements in a list are often problematic. When duplicate data exists, processing speed can slow down and analysis results may become inaccurate. This article explains how to remove duplicates from a list using Python. It comprehensively covers methods ranging from beginner-friendly approaches to advanced techniques.

2. Basic Method for Removing Duplicate Elements

First, we’ll introduce a simple method that uses only Python’s built‑in set type.

Using set() to remove duplicates

In Python, you can easily remove duplicates from a list by using the set type.

Example

original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = list(set(original_list))
print(unique_list)  # Example output: [1, 2, 3, 4, 5]

Explanation

A set is an unordered collection of unique, hashable elements. Converting a list to a set discards the repeated values, and converting the set back with list() gives a list that contains each value once.

Cautions

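  • Sets are unordered, so the result is not guaranteed to keep the original list’s order. The sorted-looking output above is an implementation detail for small integers and should not be relied on.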
  • If you need to preserve order, refer to the method introduced in the next section.
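
When order does not matter but you still want reproducible output, one option is to sort the deduplicated values. This is a minimal sketch, assuming the elements are hashable and can be compared with each other; the colors list is made up for illustration.

```python
# Hypothetical sample data for illustration.
colors = ["red", "blue", "red", "green", "blue"]

# set() drops the repeats but makes no promise about iteration order,
# so sorting the result gives a stable, predictable list.
unique_colors = sorted(set(colors))
print(unique_colors)  # ['blue', 'green', 'red']
```

Sorting replaces the original order with alphabetical order, so this only fits cases where the original positions carry no meaning.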

3. How to Remove Duplicates While Preserving Order

If you want to remove duplicates while keeping the list order unchanged, the following methods are helpful.

Using dict.fromkeys()

Example

original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = list(dict.fromkeys(original_list))
print(unique_list)  # Example output: [1, 2, 3, 4, 5]

Explanation

  • dict.fromkeys() creates a dictionary with each element of the specified list as a key.
  • Since dictionary keys are unique, duplicates are removed.
  • From Python 3.7 onward, dictionaries are guaranteed to keep their keys in insertion order, so each element stays at the position of its first occurrence.
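
To make the intermediate step visible, the sketch below prints the dictionary that dict.fromkeys() builds before it is converted back to a list. The names list is made-up sample data, and the elements are assumed to be hashable.

```python
# Hypothetical sample data for illustration.
names = ["bob", "alice", "bob", "carol", "alice"]

# Each name becomes a key; every value defaults to None.
as_dict = dict.fromkeys(names)
print(as_dict)  # {'bob': None, 'alice': None, 'carol': None}

# Iterating a dict yields its keys in insertion order (Python 3.7+).
unique_names = list(as_dict)
print(unique_names)  # ['bob', 'alice', 'carol']
```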

Method using list comprehensions

Example

original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = []
[unique_list.append(item) for item in original_list if item not in unique_list]
print(unique_list)  # Example output: [1, 2, 3, 4, 5]

Explanation

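  • The comprehension is used for its side effect: each element is appended to unique_list only if it is not already there. The list the comprehension itself produces (a list of None values) is discarded.
  • This method preserves order and is adequate for small lists.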

Caution

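The check item not in unique_list scans the result list for every element, so the running time grows roughly with the square of the list length. Expect this method to become noticeably slow on large lists.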
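
One common way around the repeated scan is to track already-seen values in a set, where membership tests take roughly constant time. The sketch below uses a plain loop rather than a list comprehension; dedupe_keep_order is a hypothetical helper name, and the elements are assumed to be hashable.

```python
def dedupe_keep_order(items):
    """Return the unique items in first-seen order (items must be hashable)."""
    seen = set()      # values already emitted, for fast membership tests
    result = []       # unique values in their original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(dedupe_keep_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

The extra set costs some memory, but each element is handled once, so the work grows linearly with the length of the list.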

4. Removing Duplicates from a Two-Dimensional List

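In two-dimensional lists, set() and dict.fromkeys() cannot be used directly, because the inner lists are mutable and therefore not hashable (Python raises TypeError: unhashable type: 'list'). This section explains how to remove duplicate rows from a two-dimensional list.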

Using List Comprehensions

Example

original_list = [[1, 2], [3, 4], [1, 2], [5, 6]]
unique_list = []
[unique_list.append(item) for item in original_list if item not in unique_list]
print(unique_list)  # Example output: [[1, 2], [3, 4], [5, 6]]

Explanation

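  • The list comprehension appends each inner list to unique_list only if an equal inner list is not already there.
  • The membership test compares items with == instead of hashing them, so this method works even when the list is nested.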

Caution

Every membership test scans the whole result list, so performance degrades quickly on large datasets. For large inputs, choose a method that relies on hashing instead.
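
For larger nested lists, one alternative is to convert each inner list to a tuple, which is hashable, so that dict.fromkeys() can be used. This is a sketch under the assumption that the inner elements are themselves hashable; the rows data is made up for illustration.

```python
# Hypothetical sample data for illustration.
rows = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]

# Tuples are hashable, so they can serve as dictionary keys.
# dict.fromkeys() keeps the first occurrence of each tuple, in order.
unique_tuples = dict.fromkeys(tuple(row) for row in rows)

# Convert the tuples back to lists to restore the original shape.
unique_rows = [list(t) for t in unique_tuples]
print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]
```

Note that [1, 2] and [2, 1] are treated as different rows, because element order is part of the comparison.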

5. Removing Duplicates Using Pandas

The Pandas library provides convenient methods for removing duplicates within a DataFrame.

Using the drop_duplicates() method

Example

import pandas as pd

data = {'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df = df.drop_duplicates()
print(df)

Explanation

  • The drop_duplicates() method returns a new DataFrame without the rows that duplicate an earlier row. By default it compares all columns and keeps the first occurrence.
  • By using the subset parameter, you can restrict the comparison to particular columns.
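
The sketch below shows subset together with the keep and ignore_index parameters of drop_duplicates(). The column names and values are made up for illustration, and ignore_index assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical sample data: "Ann" appears twice with different scores.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "score": [90, 85, 95, 70],
})

# Compare rows on the "name" column only, keep the last occurrence
# of each name, and renumber the resulting index from 0.
latest = df.drop_duplicates(subset=["name"], keep="last", ignore_index=True)
print(latest)
```

With keep="last", the later row for "Ann" (score 95) survives and the earlier one is dropped.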

6. Detecting and Counting Duplicate Elements

Sometimes you need to know which elements are duplicated and how often they occur, rather than simply removing them. This section introduces methods using collections.Counter and standard Python techniques.

How to Use collections.Counter

Example

from collections import Counter

original_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
count = Counter(original_list)
print(count)  # Example output: Counter({5: 3, 2: 2, 4: 2, 1: 1, 3: 1})

Explanation

  • Counter is a dict subclass that maps each element of the list to the number of times it occurs.
  • Its most_common() method lists elements from the most frequent to the least frequent, which makes the highest counts easy to find.
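
As a small illustration of reading counts back out, the sketch below uses Counter.most_common() and a direct key lookup. The votes list is made-up sample data.

```python
from collections import Counter

# Hypothetical sample data for illustration.
votes = ["tea", "coffee", "tea", "water", "coffee", "tea"]
counts = Counter(votes)

# most_common(n) returns the n most frequent (element, count) pairs.
top_two = counts.most_common(2)
print(top_two)  # [('tea', 3), ('coffee', 2)]

# Looking up an element that never occurred returns 0 instead of raising KeyError.
print(counts["juice"])  # 0
```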

How to Extract Duplicate Elements

Example

duplicates = [item for item, freq in count.items() if freq > 1]
print(duplicates)  # Example output: [2, 4, 5]

Explanation

  • The comprehension iterates over the Counter created in the previous example (count) and keeps every element whose occurrence count is greater than one.
  • With this method, you can easily list the duplicate elements.
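
If you only need to know whether a list contains any duplicates at all, comparing lengths is a quick check. This is a minimal sketch; has_duplicates is a hypothetical helper name, and the elements are assumed to be hashable.

```python
def has_duplicates(items):
    """Return True if any value occurs more than once (items must be hashable)."""
    # A set keeps one copy of each value, so a shorter set means repeats existed.
    return len(set(items)) != len(items)


print(has_duplicates([1, 2, 2, 3]))  # True
print(has_duplicates([1, 2, 3]))     # False
```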

7. Summary

The methods introduced so far are summarized below.

Advantages and applicable scenarios for each method

| Method | Advantages | Considerations |
|---|---|---|
| Using set() | Simple and fast | Order is not preserved |
| Using dict.fromkeys() | Can remove duplicates while preserving order | Order is guaranteed only in Python 3.7 and later |
| List comprehension | Flexible and can preserve order | Processing speed decreases with large data sets |
| Pandas drop_duplicates() | Ideal for DataFrame operations | Requires Pandas installation |
| Using collections.Counter | Easily obtain occurrence counts | Performance considerations for large data sets |

The best way to remove duplicates from a list in Python depends on the use case and the data structure. Use the comparison above to choose the method that fits your data and improve your workflow efficiency.