mice-py: Multiple Imputation by Chained Equations in Python

Python 3.9+ | License: MIT

A comprehensive Python implementation of Multiple Imputation by Chained Equations (MICE) for handling missing data in statistical analysis and machine learning workflows.

Key Features

Multiple Imputation Methods

Choose from five robust imputation strategies:

  • PMM (Predictive Mean Matching) - Maintains distributional properties

  • CART (Classification and Regression Trees) - Handles non-linear relationships

  • Random Forest - Captures complex interactions

  • MIDAS (Multiple Imputation using Distance-Aided Selection of donors) - Efficient for small samples

  • Sample - Simple random sampling from observed values
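To make the idea behind PMM concrete, here is a minimal NumPy sketch. It is not mice-py's implementation: the function name pmm_impute and its parameters are hypothetical, and the sketch uses a plain least-squares fit instead of the random draw of regression coefficients that a full PMM performs.

```python
import numpy as np


def pmm_impute(x_obs, y_obs, x_mis, n_donors=5, rng=None):
    """Simplified predictive mean matching (illustrative only).

    x_obs, y_obs: predictors and target for rows where the target is observed.
    x_mis: predictors for rows where the target is missing.
    """
    rng = np.random.default_rng(rng)
    y_obs = np.asarray(y_obs, dtype=float)

    # Design matrices with an intercept column.
    X_obs = np.column_stack([np.ones(len(x_obs)), x_obs])
    X_mis = np.column_stack([np.ones(len(x_mis)), x_mis])

    # Ordinary least squares fit on the observed rows.
    beta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    pred_obs = X_obs @ beta
    pred_mis = X_mis @ beta

    imputed = np.empty(len(pred_mis))
    for i, p in enumerate(pred_mis):
        # Donors: observed rows whose predicted value is closest to p.
        donors = np.argsort(np.abs(pred_obs - p))[:n_donors]
        # Impute with the *observed* value of one randomly chosen donor.
        imputed[i] = y_obs[rng.choice(donors)]
    return imputed
```

Every imputed value is copied from an observed donor, so imputations never fall outside the observed range of the variable.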

Flexible Configuration

  • Automatic predictor matrix estimation

  • Custom visit sequences for imputation order

  • Method-specific parameter control

  • Mixed data types (numeric and categorical)
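As an illustration of what a predictor matrix encodes, the sketch below builds one from pairwise correlations. This is an assumption for illustration, not mice-py's estimation rule: the function name quick_predictor_matrix, the min_cor threshold, and the convention that rows are the variables being imputed and columns are their predictors are all hypothetical.

```python
import numpy as np
import pandas as pd


def quick_predictor_matrix(df, min_cor=0.1):
    """Return a 0/1 matrix: entry [target, predictor] is 1 if predictor is used."""
    num = df.select_dtypes(include="number")

    # Absolute pairwise correlations (pandas uses pairwise-complete rows).
    cor = num.corr().abs()

    # Keep predictors whose correlation reaches the threshold; NaN compares False.
    mat = (cor.to_numpy() >= min_cor).astype(int)

    # A variable never predicts itself.
    np.fill_diagonal(mat, 0)
    pred = pd.DataFrame(mat, index=cor.index, columns=cor.columns)

    # Fully observed variables need no imputation model, so zero their rows.
    complete = [c for c in num.columns if num[c].notna().all()]
    pred.loc[complete, :] = 0
    return pred
```

A custom visit sequence would then simply be an ordering of the row labels that still contain non-zero entries.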

Statistical Pooling

  • Rubin’s rules for combining estimates

  • Fraction of missing information (FMI)

  • Confidence intervals and standard errors

  • Formula-based model fitting with statsmodels integration
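Rubin's rules themselves are short enough to write out. The following standard-library sketch pools one scalar estimate across m imputed datasets. The function name pool_rubin is hypothetical, and the degrees of freedom use Rubin's original large-sample formula; whether mice-py applies a small-sample adjustment is not stated here.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b    # total variance

    if b == 0:
        # No between-imputation variation: nothing is attributed to missingness.
        return {"estimate": q_bar, "se": sqrt(t), "df": float("inf"), "fmi": 0.0}

    r = (1 + 1 / m) * b / u_bar    # relative increase in variance
    lam = (1 + 1 / m) * b / t      # share of total variance due to missingness
    df = (m - 1) / lam ** 2        # Rubin's large-sample degrees of freedom
    fmi = (r + 2 / (df + 3)) / (1 + r)  # fraction of missing information
    return {"estimate": q_bar, "se": sqrt(t), "df": df, "fmi": fmi}
```

A confidence interval follows as the estimate plus or minus the appropriate t quantile (with df degrees of freedom) times the pooled standard error.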

Diagnostic Tools

  • Convergence diagnostics (chain statistics)

  • Stripplots, box plots, and density plots

  • Missing data pattern visualization

  • XY plots for bivariate relationships
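The table behind a missing data pattern plot can be approximated with plain pandas. The helper below is a sketch under that assumption; missing_patterns is a hypothetical name and not part of the mice-py API.

```python
import numpy as np
import pandas as pd


def missing_patterns(df):
    """Tabulate distinct observed/missing patterns and how many rows share each."""
    cols = list(df.columns)

    # 1 = observed, 0 = missing.
    observed = df.notna().astype(int)

    # Count the rows that share each distinct pattern.
    patterns = observed.value_counts().rename("n_rows").reset_index()

    # Number of missing variables in each pattern.
    patterns["n_missing"] = (patterns[cols] == 0).sum(axis=1)

    # Most complete patterns first.
    return patterns.sort_values("n_missing").reset_index(drop=True)
```

Each output row is one pattern: the original columns hold the 0/1 indicators, n_rows counts the matching records, and n_missing counts the variables missing in that pattern.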

Quick Example

import pandas as pd
from imputation import MICE

# Load your data with missing values
df = pd.read_csv("your_data.csv")

# Initialize and run MICE
mice = MICE(df)
mice.impute(n_imputations=5, maxit=10, method='pmm')

# Fit a model and pool results
mice.fit('outcome ~ predictor1 + predictor2')
results = mice.pool(summ=True)
print(results)
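If no dataset is at hand, a synthetic one with the column names used in the example can be generated as follows. This is only a sketch: make_demo_data is a hypothetical helper, and the coefficients, sample size, and the 20% missing-completely-at-random rate are arbitrary choices.

```python
import numpy as np
import pandas as pd


def make_demo_data(n=200, missing_rate=0.2, seed=42):
    """Simulate a small regression dataset and blank out cells at random."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "predictor1": rng.normal(size=n),
        "predictor2": rng.normal(size=n),
    })
    # Linear outcome with Gaussian noise.
    df["outcome"] = (
        1.0 + 2.0 * df["predictor1"] - 0.5 * df["predictor2"]
        + rng.normal(scale=0.5, size=n)
    )
    # Set each cell to NaN independently with probability missing_rate (MCAR).
    mask = rng.random(df.shape) < missing_rate
    return df.mask(mask)


demo = make_demo_data()
# demo.to_csv("your_data.csv", index=False)  # uncomment to write the file read by the example
```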
