mice-py: Multiple Imputation by Chained Equations in Python
A comprehensive Python implementation of Multiple Imputation by Chained Equations (MICE) for handling missing data in statistical analysis and machine learning workflows.
Key Features
- Multiple Imputation Methods
  Choose from five imputation strategies:
  - PMM (Predictive Mean Matching) - Imputes values taken from observed donors, so imputations stay within the observed distribution
  - CART (Classification and Regression Trees) - Handles non-linear relationships
  - Random Forest - Captures complex interactions
  - MIDAS (Multiple Imputation using Distance-Aided Selection of donors) - Efficient for small samples
  - Sample - Simple random sampling from observed values
- Flexible Configuration
  - Automatic predictor matrix estimation
  - Custom visit sequences for imputation order
  - Method-specific parameter control
  - Mixed data types (numeric and categorical)
- Statistical Pooling
  - Rubin’s rules for combining estimates
  - Fraction of missing information (FMI)
  - Confidence intervals and standard errors
  - Formula-based model fitting with statsmodels integration
- Diagnostic Tools
  - Convergence diagnostics (chain statistics)
  - Stripplots, box plots, and density plots
  - Missing data pattern visualization
  - XY plots for bivariate relationships
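
To give a feel for the first method in the list, here is a minimal sketch of predictive mean matching for a single variable, written with NumPy only. It is an illustration under simplifying assumptions, not the code used by this package: the function name `pmm_impute` and its parameters are hypothetical, and the sketch skips the step in which a full PMM implementation draws the regression coefficients from their posterior distribution before matching.

```python
import numpy as np

def pmm_impute(x_obs, y_obs, x_mis, n_donors=5, rng=None):
    """Simplified predictive mean matching (hypothetical helper, not the package API)."""
    rng = np.random.default_rng(rng)
    y_obs = np.asarray(y_obs, dtype=float)

    # Fit ordinary least squares with an intercept on the rows where y is observed.
    X_obs = np.column_stack([np.ones(len(x_obs)), x_obs])
    X_mis = np.column_stack([np.ones(len(x_mis)), x_mis])
    beta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)

    pred_obs = X_obs @ beta  # predicted means for observed rows
    pred_mis = X_mis @ beta  # predicted means for rows to impute

    imputed = np.empty(len(x_mis))
    for i, p in enumerate(pred_mis):
        # Donor pool: the observed rows whose predicted mean is closest to p.
        donors = np.argsort(np.abs(pred_obs - p))[:n_donors]
        # Impute the observed value of one randomly chosen donor.
        imputed[i] = y_obs[rng.choice(donors)]
    return imputed

# Toy data: y depends linearly on x; the last two y values are treated as missing.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
imputed = pmm_impute(x[:8], y[:8], x[8:], n_donors=3, rng=0)
```

Because every imputed value is copied from an observed row, the result can never fall outside the range of the observed data, which is the property the feature list refers to.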
Quick Example
```python
import pandas as pd
from imputation import MICE

# Load your data with missing values
df = pd.read_csv("your_data.csv")

# Initialize and run MICE
mice = MICE(df)
mice.impute(n_imputations=5, maxit=10, method='pmm')

# Fit a model and pool results
mice.fit('outcome ~ predictor1 + predictor2')
results = mice.pool(summ=True)
print(results)
```
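
The pooling step combines the per-imputation estimates with Rubin's rules. The sketch below shows the arithmetic for a single coefficient using NumPy. The function `pool_rubin` and its return keys are hypothetical names chosen for illustration, and this is not necessarily how `mice.pool` computes or labels its output. It assumes the estimates differ across imputations, so the between-imputation variance is positive.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputations (hypothetical helper)."""
    q = np.asarray(estimates, dtype=float)  # point estimate from each imputed dataset
    u = np.asarray(variances, dtype=float)  # squared standard error from each dataset
    m = len(q)

    q_bar = q.mean()       # pooled point estimate
    u_bar = u.mean()       # within-imputation variance
    b = q.var(ddof=1)      # between-imputation variance (assumed > 0)
    t = u_bar + (1 + 1 / m) * b  # total variance

    r = (1 + 1 / m) * b / u_bar          # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2      # Rubin's (1987) degrees of freedom
    fmi = (r + 2 / (df + 3)) / (1 + r)   # fraction of missing information

    return {
        "estimate": q_bar,
        "std_error": np.sqrt(t),
        "total_var": t,
        "df": df,
        "fmi": fmi,
    }

# Three imputations of the same coefficient, each with variance 0.5.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

A confidence interval then follows from the pooled estimate, the standard error, and a t distribution with `df` degrees of freedom.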
Documentation Contents
- Getting Started
- User Guide
- Theory & Background
- Examples
- API Reference
- Development