Multiple Imputation Theory
Why Multiple Imputation?
- Problem with complete case analysis:
  - Wastes data by discarding every row that has a missing value
  - Can introduce bias when data are missing at random (MAR) or missing not at random (MNAR)
- Problem with single imputation:
  - Treats imputed values as if they were observed
  - Underestimates uncertainty
  - Standard errors are too small and p-values too optimistic
- Multiple imputation solution:
  - Creates multiple plausible versions of the complete data
  - Properly accounts for imputation uncertainty
  - Produces valid inference under MAR
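The single-imputation problem above can be seen on a toy example. The sketch below uses made-up numbers and mean imputation as the single-imputation method (both are assumptions for illustration). It compares the standard error of the mean before and after the gaps are filled.

```python
import math
import statistics

# Hypothetical toy data: None marks a missing value.
values = [2.0, 4.0, None, 8.0, None, 6.0, 10.0, None]

observed = [v for v in values if v is not None]
mean_obs = statistics.mean(observed)

# Single (mean) imputation: every gap receives the same value.
imputed = [mean_obs if v is None else v for v in values]


def standard_error(xs):
    """Standard error of the mean, treating every entry as a real observation."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


se_observed = standard_error(observed)  # uses only the values actually seen
se_imputed = standard_error(imputed)    # counts the filled-in values as data

print(f"SE from observed values only: {se_observed:.3f}")
print(f"SE after mean imputation:     {se_imputed:.3f}")
```

The filled-in values add no new information, yet they shrink the spread and inflate the sample size, so the second standard error comes out smaller than the first.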
The Three-Step Process
1. Imputation: Create m complete datasets, each with different imputed values
2. Analysis: Fit the same analysis model to each dataset separately
3. Pooling: Combine the m estimates and their standard errors using Rubin’s rules
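For a single scalar estimate, the pooling step can be sketched as follows. Rubin's rules average the m point estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name `pool_rubin` and the input numbers are hypothetical, chosen only for illustration.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and one variance per estimate")
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between    # total variance of the pooled estimate
    return q_bar, math.sqrt(total)


# Hypothetical results from analyzing m = 5 imputed datasets.
estimates = [1.9, 2.1, 2.0, 2.2, 1.8]
std_errors = [0.30, 0.32, 0.29, 0.31, 0.30]

pooled_estimate, pooled_se = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is larger than any typical per-dataset standard error because the between-imputation term carries the uncertainty due to the missing values.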
What is MICE?
MICE (Multiple Imputation by Chained Equations) imputes one variable at a time:
1. Initialize: Fill missing values with a simple method (e.g., the mean or a random sample of the observed values)
2. Iterate: For each incomplete variable:
   - Set its imputed values back to missing
   - Fit a model for it using the other variables as predictors
   - Predict the missing values from the fitted model and fill them in
   - Repeat for all variables
3. Converge: Continue iterating until the chains stabilize
4. Repeat: Run the whole procedure m times to create m complete datasets
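The loop above can be sketched in plain Python for two numeric variables. This is a simplified illustration, not a reference implementation. It assumes exactly two columns and simple linear regression with Gaussian noise as the imputation model, and it runs a fixed number of iterations in place of a real convergence check. It also skips drawing the regression parameters from their posterior, which a full implementation would do. The names `fit_line` and `mice_chain` and the data are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (intercept, slope, residual sd)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def mice_chain(x, y, n_iter, rng):
    """Run one chain and return one completed copy of (x, y); None marks a gap."""
    cols = {"x": list(x), "y": list(y)}  # copies, so the inputs stay untouched
    missing = {name: [i for i, v in enumerate(vals) if v is None]
               for name, vals in cols.items()}

    # Initialize: fill each gap with a random draw from the observed values.
    for name, vals in cols.items():
        observed = [v for v in vals if v is not None]
        for i in missing[name]:
            vals[i] = rng.choice(observed)

    # Iterate: re-impute each incomplete variable from the other one.
    for _ in range(n_iter):
        for target, predictor in (("x", "y"), ("y", "x")):
            gaps = set(missing[target])
            if not gaps:
                continue
            # Fit only on rows where the target was actually observed.
            rows = [i for i in range(len(cols[target])) if i not in gaps]
            a, b, sd = fit_line([cols[predictor][i] for i in rows],
                                [cols[target][i] for i in rows])
            # Replace the old imputations with noisy predictions.
            for i in missing[target]:
                cols[target][i] = a + b * cols[predictor][i] + rng.gauss(0, sd)
    return cols["x"], cols["y"]


# Hypothetical toy data with gaps in both variables.
x = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0]
y = [2.1, None, 6.2, 8.1, None, 12.3, 13.8, 16.1]

rng = random.Random(0)
m = 5
completed = [mice_chain(x, y, n_iter=10, rng=rng) for _ in range(m)]
```

Each element of `completed` is one complete dataset. The observed values are identical across all m datasets and only the imputed entries differ, which is what lets the pooling step measure imputation uncertainty.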
Key Points
- Assumptions:
  - Data are MAR
  - Imputation models are correctly specified
  - Enough iterations are run for the chains to converge
- Advantages:
  - Flexible (different methods for different variables)
  - Handles mixed data types
  - No joint distribution has to be specified
- Limitations:
  - Assumes MAR (results can be biased if data are MNAR)
  - Requires convergence checking
  - The conditional models may not correspond to any valid joint distribution
When MICE Works
✓ Data is MAR
✓ Clear relationships between variables
✓ Moderate missingness (<30-40%)
✓ Adequate sample size
See MICE Overview for usage details.