Rubin’s Rules
How to combine results from multiple imputed datasets.
The Problem
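After imputation, fitting the analysis model to each of the m completed datasets gives m estimates, \(\hat{\theta}_1, \ldots, \hat{\theta}_m\), each with its own standard error.
Averaging the point estimates is fine, but reporting the standard error from a single dataset, or a plain average of the standard errors, ignores the uncertainty introduced by imputation. Rubin’s rules combine both sources of variance.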
Basic Formulas
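Pooled Estimate
\[\bar{\theta} = \frac{1}{m} \sum_{i=1}^{m} \hat{\theta}_i\]
Within-Imputation Variance (average sampling variance)
\[\bar{U} = \frac{1}{m} \sum_{i=1}^{m} U_i\]
where \(U_i\) is the estimated sampling variance (squared standard error) of \(\hat{\theta}_i\).
Between-Imputation Variance (variance due to missing data)
\[B = \frac{1}{m-1} \sum_{i=1}^{m} \left(\hat{\theta}_i - \bar{\theta}\right)^2\]
Total Variance
\[T = \bar{U} + \left(1 + \frac{1}{m}\right) B\]
Standard Error
\[\mathrm{SE} = \sqrt{T}\]
Confidence Interval
\[\bar{\theta} \pm t_{df} \cdot \mathrm{SE}\]
where \(t_{df}\) is the critical value of the t-distribution with adjusted degrees of freedom.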
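The pooling steps listed above can be sketched in plain Python. This is a minimal illustration, not mice-py's implementation: `pool_rubin` is a hypothetical helper, the input numbers are made up, and the degrees of freedom use Rubin's classic large-sample formula, which may differ from the adjustment mice-py applies.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their sampling variances (hypothetical helper)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")

    pooled = statistics.fmean(estimates)      # pooled estimate
    within = statistics.fmean(variances)      # within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between    # total variance

    # Rubin's large-sample degrees of freedom (an assumption about the
    # adjustment); infinite when the estimates do not vary across imputations.
    if between > 0:
        lam = (1 + 1 / m) * between / total
        df = (m - 1) / lam ** 2
    else:
        df = math.inf

    return {
        "estimate": pooled,
        "within": within,
        "between": between,
        "total": total,
        "se": math.sqrt(total),
        "df": df,
    }


# Illustrative inputs: three imputed-data estimates with equal sampling variance.
result = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(result)
```

The confidence interval is then `estimate ± t_crit * se`, where `t_crit` could be obtained from, for example, `scipy.stats.t.ppf(0.975, result["df"])` for a 95% interval.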
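Fraction of Missing Information (FMI)
\[\mathrm{FMI} \approx \frac{(1 + 1/m)\, B}{T}\]
where \(B\) is the between-imputation variance and \(T\) is the total variance. This is the large-sample form, without the degrees-of-freedom correction.
Interpretation:
- FMI = 0: no impact from missing data
- FMI = 0.3: 30% of the uncertainty is due to missingness
- FMI > 0.3: consider more imputations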
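As a rough illustration, FMI can be computed directly from the within- and between-imputation variances. The sketch below returns both the large-sample ratio (between-imputation share of the total variance) and Rubin's version with a degrees-of-freedom correction. The function name and inputs are hypothetical, and which variant mice-py reports is not confirmed here.

```python
def fraction_missing_information(within, between, m):
    """Return (lambda, fmi) for m imputations (hypothetical helper).

    lambda: share of total variance due to missing data, (1 + 1/m) * B / T
    fmi:    Rubin's FMI with the degrees-of-freedom correction
    """
    if within <= 0:
        raise ValueError("within-imputation variance must be positive")
    if between == 0:
        return 0.0, 0.0  # estimates identical across imputations

    r = (1 + 1 / m) * between / within   # relative increase in variance
    lam = r / (1 + r)                    # equals (1 + 1/m) * B / T
    df = (m - 1) * (1 + 1 / r) ** 2      # Rubin's large-sample degrees of freedom
    fmi = (r + 2 / (df + 3)) / (1 + r)   # df-corrected fraction of missing information
    return lam, fmi


# Illustrative values only.
lam, fmi = fraction_missing_information(within=0.5, between=1.0, m=3)
print(round(lam, 3), round(fmi, 3))
```

With few imputations the corrected value is noticeably larger than the simple ratio; the two converge as m grows.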
How Many Imputations?
- Old rule: m = 5
- Modern recommendation: m = 20+
- High missingness: m = 50-100
Rule of thumb: m ≈ percentage of incomplete cases
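The rule of thumb can be turned into a small helper that counts incomplete rows. This is only a sketch: `suggest_m` is a hypothetical function, rows are assumed to be sequences in which `None` or NaN marks a missing value, and the floor of 5 imputations is an assumption borrowed from the old rule.

```python
import math


def suggest_m(rows, min_m=5):
    """Suggest m as the percentage of rows with at least one missing value."""
    if not rows:
        raise ValueError("rows must not be empty")

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    incomplete = sum(any(is_missing(v) for v in row) for row in rows)
    pct_incomplete = 100 * incomplete / len(rows)
    # Never go below min_m (assumed floor), round the percentage up.
    return max(min_m, math.ceil(pct_incomplete))


# Illustrative data: two of four rows contain a missing value.
rows = [(1.0, 2.0), (None, 3.0), (4.0, float("nan")), (5.0, 6.0)]
print(suggest_m(rows))
```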
Usage in mice-py
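# Fit the analysis model (formula syntax) to the imputed data
mice.fit('outcome ~ predictor')
# Pool the fitted results with Rubin’s rules
pooled = mice.pool(summ=True)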
Output includes:
- Estimate: pooled coefficient
- Std.Error: pooled SE
- P>|t|: p-value
- FMI: fraction of missing information
See Pooling Analysis for practical usage.