Understanding Missing Data
Checking for Missing Data
import pandas as pd
import numpy as np
# Count missing values per column (df is assumed to be an existing pandas DataFrame)
print(df.isnull().sum())
# Percentage of missing values per column
missing_pct = df.isnull().mean() * 100
print(missing_pct)
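The checks above assume a DataFrame named df already exists. As a minimal, self-contained sketch, the following builds a small toy dataset (the column names and values are invented for illustration) and combines the counts and percentages into one summary table:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 51, np.nan],
    "income": [50000, 62000, np.nan, 58000, 61000],
    "city": ["Oslo", "Bergen", None, "Oslo", "Bergen"],
})

# Number of missing values per column
missing_count = df.isnull().sum()

# Share of missing values per column, as a percentage
missing_pct = df.isnull().mean() * 100

# Combine both into one table, most incomplete columns first
summary = pd.DataFrame({"n_missing": missing_count, "pct_missing": missing_pct})
summary = summary.sort_values("pct_missing", ascending=False)
print(summary)
```

Sorting by percentage puts the most incomplete variables at the top, which is usually where to start looking.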
Visualizing Missing Patterns
from plotting.utils import md_pattern_like, plot_missing_data_pattern
# Create pattern summary
pattern = md_pattern_like(df)
print(pattern)
# Visualize
plot_missing_data_pattern(pattern, save_path='missing_pattern.png')
The pattern shows:
- Which variables have missing values
- Which combinations of missing variables occur together
- How many cases have each pattern
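To make the idea of a pattern summary concrete, here is a rough sketch that counts missingness patterns using plain pandas. It is not the implementation of md_pattern_like, whose output format is not shown here; the toy data and the n_rows column name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "score" is fully observed.
df = pd.DataFrame({
    "age": [25, np.nan, 47, np.nan, 33, 41],
    "income": [50000, np.nan, 52000, np.nan, np.nan, 58000],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# 1 = missing, 0 = observed
indicator = df.isnull().astype(int)

# Count how many rows share each distinct combination of missing columns.
pattern_counts = (
    indicator.groupby(list(indicator.columns))
    .size()
    .reset_index(name="n_rows")
    .sort_values("n_rows", ascending=False)
)
print(pattern_counts)
```

Each row of the result is one pattern: the 0/1 columns say which variables are missing together, and n_rows says how many cases share that pattern.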
Types of Missing Data
- MCAR (Missing Completely at Random)
Missingness is unrelated to any variable, observed or unobserved. Complete-case analysis is unbiased, although it discards information.
- MAR (Missing at Random)
Missingness depends only on observed variables, not on the missing values themselves. MICE assumes MAR.
- MNAR (Missing Not at Random)
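Missingness depends on the unobserved values themselves, even after accounting for observed variables. Requires specialized methods, such as selection models or pattern-mixture models.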
See Missing Data Mechanisms for technical definitions.
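The three mechanisms can be illustrated with a small simulation. This is only a sketch under invented assumptions: the variables x and y, the 30% MCAR rate, and the logistic missingness models are all chosen for illustration. It compares the complete-case mean of y, whose true mean is 0, under each mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)        # always observed
y = x + rng.normal(size=n)    # true mean of y is 0; y will be partly missing

# MCAR: every value of y has the same 30% chance of being missing.
mcar_missing = rng.random(n) < 0.3

# MAR: the chance that y is missing depends only on the observed x.
mar_missing = rng.random(n) < 1 / (1 + np.exp(-2 * x))

# MNAR: the chance that y is missing depends on y itself.
mnar_missing = rng.random(n) < 1 / (1 + np.exp(-2 * y))

# Complete-case estimate of the mean of y under each mechanism
cc_means = {
    "MCAR": y[~mcar_missing].mean(),
    "MAR": y[~mar_missing].mean(),
    "MNAR": y[~mnar_missing].mean(),
}
for name, value in cc_means.items():
    print(f"{name}: complete-case mean of y = {value:.3f}")
```

In this setup the MCAR estimate should land close to 0, while the MAR and MNAR estimates are pulled below 0 because larger values are more likely to be missing. Under MAR the bias can in principle be corrected using x, which is what imputation exploits; under MNAR the observed data alone are not enough.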
When to Use MICE
Use MICE when:
- Multiple variables have missing data
- Data are likely MAR
- You want valid statistical inference
Because MICE assumes MAR, include variables that predict missingness in the imputation model to make the assumption more plausible.
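One rough way to look for predictors of missingness is to correlate a 0/1 missingness indicator with the fully observed columns. The sketch below uses simulated data in which income is more likely to be missing for older people; the column names and the missingness model are assumptions for illustration, and a simple correlation is only a screening heuristic, not a formal test.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: "age" and "noise" are fully observed.
df = pd.DataFrame({
    "age": rng.normal(50, 10, size=n),
    "noise": rng.normal(size=n),
    "income": rng.normal(60, 15, size=n),
})

# Make income more likely to be missing for older people (MAR given age).
p_missing = 1 / (1 + np.exp(-(df["age"] - 50) / 5))
df.loc[rng.random(n) < p_missing, "income"] = np.nan

# 1 where income is missing, 0 where it is observed
indicator = df["income"].isnull().astype(int)

# Absolute correlation of each observed column with the indicator
assoc = df[["age", "noise"]].corrwith(indicator).abs().sort_values(ascending=False)
print(assoc)
```

Columns with a clearly nonzero association (here age) are candidates to include in the imputation model, while columns such as noise add little to the plausibility of MAR.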
Next Steps
MICE Overview - Learn how MICE works
Imputation Methods - Choose an imputation method