Python NaN Detection & Handling: Missing Float Values

1. How to Detect NaN in Python

What Is NaN?

NaN (Not a Number) is a special floating‑point value that represents an invalid or undefined numeric result. It typically comes from operations that have no meaningful answer, such as subtracting infinity from infinity, or from missing entries in loaded data. Note that in plain Python, dividing by zero raises ZeroDivisionError rather than producing NaN, although NumPy returns NaN (with a warning) for 0.0 / 0.0 on arrays. If NaN isn’t handled correctly, results can become inaccurate or the program may not behave as expected.
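As a minimal sketch of where NaN comes from in plain Python, the following uses only built-in floats. The variable names are illustrative and not part of any library.

```python
import math

inf = float("inf")

# Operations with no well-defined numeric result produce NaN.
nan_from_sub = inf - inf
nan_from_mul = 0.0 * inf
print(math.isnan(nan_from_sub), math.isnan(nan_from_mul))

# Plain Python float division by zero raises an exception
# instead of returning NaN.
try:
    1.0 / 0.0
except ZeroDivisionError:
    raised = True
else:
    raised = False
print(raised)
```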

The Importance of Detecting NaN

If NaN appears in a dataset, it can compromise data reliability and affect calculation results. Therefore, it’s essential to first detect NaN and then handle it appropriately (e.g., removal, replacement).

2. How to Generate NaN

In Python, you can generate NaN with float('nan'); math.nan and numpy.nan provide the same value as constants. It is used to explicitly mark an invalid or missing result in numerical calculations.
num = float('nan')
print(num)  # Output: nan

Difference from None

NaN is a float that marks a numerically invalid result, whereas None is a separate object that represents “nothing”. None is normally tested with is None (== also works), but NaN is not equal to itself, so using == to detect it is inappropriate.
num = float('nan')
print(num == num)  # Output: False

none_value = None
print(none_value is None)  # Output: True
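Because the two values need different tests, code that may receive either one usually checks them separately. The sketch below assumes a hypothetical helper named classify(); it is an illustration, not a standard function.

```python
import math


def classify(value):
    """Hypothetical helper: label a scalar as 'none', 'nan', or 'value'."""
    if value is None:
        return "none"
    # Check the type first so math.isnan() is only called on floats.
    if isinstance(value, float) and math.isnan(value):
        return "nan"
    return "value"


labels = [classify(v) for v in [None, float("nan"), 0.0, "text"]]
print(labels)  # ['none', 'nan', 'value', 'value']
```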

3. How to Check for NaN

3.1. Checking with the Standard Library (math.isnan())

To determine NaN with Python’s standard library, use math.isnan(). This function returns True if the given value is NaN.
import math

num = float('nan')
print(math.isnan(num))  # Result: True
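One thing to watch for: math.isnan() accepts only real numbers and raises TypeError for values such as strings or None. When inputs may be mixed, a small wrapper can help. The name is_nan() below is a hypothetical helper introduced for illustration.

```python
import math


def is_nan(value):
    """Hypothetical helper: True only if value is a real number that is NaN."""
    try:
        return math.isnan(value)
    except TypeError:
        # Strings, None, lists, etc. are not real numbers, so they are not NaN.
        return False


print(is_nan(float("nan")))  # True
print(is_nan(1))             # False
print(is_nan("abc"))         # False
print(is_nan(None))          # False
```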

3.2. Checking with NumPy (numpy.isnan())

NumPy is a library specialized for array and matrix calculations, providing the numpy.isnan() function to efficiently determine NaNs within arrays. It is commonly used in numerical analysis and scientific data processing.
import numpy as np

num_list = [1, 2, np.nan, 4]
print(np.isnan(num_list))  # Result: [False False  True False]
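The boolean array returned by np.isnan() can also answer common follow-up questions, such as whether any NaN exists, how many there are, and where they are. The following is a small sketch with illustrative variable names.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan])

mask = np.isnan(arr)                           # True where the element is NaN
has_nan = bool(mask.any())                     # is there at least one NaN?
nan_count = int(mask.sum())                    # how many NaNs are there?
nan_positions = np.flatnonzero(mask).tolist()  # indices of the NaNs

print(has_nan, nan_count, nan_positions)  # True 2 [1, 3]
```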

3.3. Checking with pandas (pandas.isna())

When working with a Series or DataFrame, use pandas’ isna() (isnull() is an alias) to detect NaN. These functions also treat None as missing, which makes them useful for data cleaning and handling missing values.
import pandas as pd
import numpy as np

data = pd.Series([1, 2, np.nan, 4])
print(pd.isna(data))
# Result:
# 0    False
# 1    False
# 2     True
# 3    False
# dtype: bool
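On a DataFrame, the same check is often combined with sum() or any() to summarize where values are missing. A minimal sketch, using made-up sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, np.nan]})

# Count of missing values in each column.
missing_per_column = df.isna().sum()

# Rows that contain at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```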

4. How to Remove or Replace NaN

4.1. Removing NaN from a List

To remove NaN values in a list, you can combine math.isnan() with a list comprehension.
import math

num_list = [1, 2, float('nan'), 4]
clean_list = [num for num in num_list if not math.isnan(num)]
print(clean_list)  # Result: [1, 2, 4]
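If the values are already in a NumPy array, the same filtering can be done with a boolean mask instead of a list comprehension. This is a brief sketch of that alternative:

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

# ~np.isnan(values) is True for every element that is not NaN.
clean_values = values[~np.isnan(values)]

print(clean_values.tolist())  # [1.0, 2.0, 4.0]
```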

4.2. Removing NaN with pandas (dropna())

To remove NaN from a DataFrame, use the dropna() method. By default it removes every row that contains at least one NaN; pass axis=1 to remove columns instead.
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})
clean_df = df.dropna()
print(clean_df)
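dropna() takes a few parameters that control what gets removed. The sketch below compares the default behavior with axis=1 and subset, using a sample DataFrame with an extra column C added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 6], "C": [7, 8, 9]})

rows_dropped = df.dropna()                # drop rows with any NaN (default)
cols_dropped = df.dropna(axis=1)          # drop columns with any NaN
subset_dropped = df.dropna(subset=["A"])  # only look at column A

print(rows_dropped)
print(cols_dropped)
print(subset_dropped)
```

Dropping rows is the safest default, but on wide tables it can discard a lot of data, so restricting the check with subset is often worth considering.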

4.3. Replacing NaN with pandas (fillna())

If you want to replace NaN with a specific value instead of removing it, use the fillna() method. By default it returns a new DataFrame; inplace=True modifies the existing one instead.
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})
df.fillna(0, inplace=True)
print(df)
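Filling with 0 is not always appropriate, because it can shift averages. One common alternative is to fill each column with its own mean. The following is a sketch of that approach, not the only reasonable choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 6]})

# df.mean() skips NaN by default and returns one mean per column;
# fillna() then matches those means to columns by name.
filled = df.fillna(df.mean())

print(filled)
```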

5. Calculations Involving NaN

Most arithmetic involving NaN also produces NaN, so a single NaN can propagate through an entire calculation. To obtain meaningful results, remove or replace NaN beforehand.
import numpy as np

result = 10 + np.nan
print(result)  # Result: nan
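Propagation also affects built-in functions. The sketch below shows sum() turning into NaN, and max() giving a result that depends on where the NaN sits, because every ordering comparison with NaN is False.

```python
import math

nan = float("nan")

# A single NaN makes the whole sum NaN.
total = sum([1.0, nan, 3.0])
print(math.isnan(total))  # True

# max() keeps its current candidate unless a later item compares greater,
# and comparisons with NaN are always False.
first = max([nan, 1.0, 2.0])  # NaN comes first, so it is never replaced
last = max([1.0, 2.0, nan])   # NaN comes last, so it is never selected
print(first, last)            # nan 2.0
```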

Example of Statistical Calculations with NaN

To ignore NaN when computing statistics on a dataset, use NumPy’s nanmean() function, which calculates the mean while excluding NaN.
import numpy as np

data = [1, 2, np.nan, 4]
mean = np.nanmean(data)  # Calculate the mean while ignoring NaN
print(mean)  # Result: 2.3333...
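NumPy offers similar NaN-aware variants for other statistics, and pandas skips NaN by default in its own aggregations. A short sketch comparing them:

```python
import numpy as np
import pandas as pd

data = np.array([1.0, 2.0, np.nan, 4.0])

plain_mean = np.mean(data)  # the regular mean propagates NaN
nan_sum = np.nansum(data)   # sum that ignores NaN
nan_max = np.nanmax(data)   # max that ignores NaN

series = pd.Series(data)
series_mean = series.mean()                     # pandas skips NaN by default
series_mean_strict = series.mean(skipna=False)  # opt in to propagation

print(plain_mean, nan_sum, nan_max)
print(series_mean, series_mean_strict)
```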

6. Precautions Regarding NaN Detection

6.1. Behavior of Comparison Operators

NaN has the special property that it is not equal to any number, including itself. Therefore num == num returns False, and you cannot find NaN with ==. The expression num != num does return True for NaN, but it is easy to misread, so dedicated functions (isnan() or isna()) are the clearer choice.
num = float('nan')
print(num == num)  # Result: False
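The same property makes the in operator unreliable for NaN. Python's list membership test checks identity before equality, so the result depends on whether the very same object is in the list. A small sketch:

```python
import math

a = float("nan")
b = float("nan")  # a second, separate NaN object

equal_to_self = a == a      # False
not_equal_to_self = a != a  # True
same_object_in = a in [a]   # True: identity is checked before equality
other_object_in = b in [a]  # False: different object and == is False

print(equal_to_self, not_equal_to_self, same_object_in, other_object_in)

# A reliable membership test uses math.isnan() on each element.
contains_nan = any(math.isnan(x) for x in [1.0, a])
print(contains_nan)  # True
```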

6.2. Points for Data Cleaning

In data analysis, leaving NaN values in place can distort results or turn entire calculations into NaN, so cleaning beforehand is necessary. Removing or appropriately replacing NaN values improves the reliability of the data.

7. Summary

In Python, the math, numpy, and pandas libraries let you efficiently detect and handle NaN values. Knowing how to work with NaN properly helps maintain the reliability of data analysis and numerical computation.