Convergence Diagnostics
After running MICE, it’s crucial to check whether the algorithm has converged. This guide explains what convergence means and how to assess it.
What is Convergence?
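Convergence means the MICE algorithm has stabilized: the distribution of the imputed values is no longer changing systematically from one iteration to the next. Individual imputed values still vary randomly between iterations; what should stop is the drift.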
Why It Matters
- If MICE hasn’t converged:
Imputed values may be unreliable
Statistical inferences may be biased
More iterations are needed
How MICE Converges
During each iteration, MICE updates the imputed values for each variable. Initially, these updates cause substantial changes, but as the algorithm proceeds, changes should become smaller and stabilize.
Chain Statistics
MICE tracks two key statistics across iterations for each variable:
- Chain Mean
The mean of the imputed values at each iteration
- Chain Variance
The variance of the imputed values at each iteration
- These “chains” should:
Start from initial values
Potentially drift in early iterations
Stabilize after some iterations (convergence!)
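As a rough illustration of what these two chains contain, the sketch below computes a mean and a variance per iteration for a single variable. The function name `chain_stats`, the list-of-arrays input format, and the use of the sample variance (`ddof=1`) are assumptions made for this example; the library may store and compute its chains differently.

```python
import numpy as np

def chain_stats(imputed_by_iteration):
    """Return (means, variances) with one entry per iteration.

    imputed_by_iteration: list of 1-D arrays, each holding the values
    imputed for one variable at one iteration (hypothetical format).
    """
    means = np.array([np.mean(v) for v in imputed_by_iteration])
    # Sample variance (ddof=1) is an assumption, not a documented choice
    variances = np.array([np.var(v, ddof=1) for v in imputed_by_iteration])
    return means, variances

# Toy data: three iterations of imputed values for four missing cells
toy = [
    np.array([10.0, 12.0, 14.0, 16.0]),
    np.array([11.0, 12.0, 13.0, 14.0]),
    np.array([11.5, 12.5, 12.5, 13.5]),
]
means, variances = chain_stats(toy)
print(means)      # chain mean per iteration
print(variances)  # chain variance per iteration
```

In this toy example the mean settles after the first iteration while the variance keeps shrinking, which is why both statistics are worth plotting.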
Visualizing Convergence
The primary tool for checking convergence is plotting the chain statistics:
from plotting.diagnostics import plot_chain_stats
# After running MICE (assumes the MICE class is already imported
# and df is the DataFrame with missing values)
mice = MICE(df)
mice.impute(n_imputations=5, maxit=20)
# Plot convergence
plot_chain_stats(
    chain_mean=mice.chain_mean,
    chain_var=mice.chain_var,
    save_path='convergence.png'
)
Interpreting the Plots
What to look for:
✓ Stable horizontal lines: Means and variances have stabilized
✓ No systematic trends: Values aren’t consistently increasing or decreasing
✓ Mixing of chains: If multiple imputations are shown, they should overlap
✗ Trending lines: Values still changing systematically
✗ Unstable oscillations: Large swings even in later iterations
✗ Separated chains: Different imputations have very different patterns
Numerical Assessment
While visual inspection is primary, you can also check numerically:
import numpy as np
# Get chain means for a specific variable
var_name = 'income'
chain = mice.chain_mean[var_name]
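# Check if last few iterations are stable
last_5 = np.asarray(chain[-5:], dtype=float)
# Coefficient of variation; abs() keeps the ratio positive for negative means.
# This ratio is unreliable when the mean is close to zero.
variation = np.std(last_5) / abs(np.mean(last_5))
if variation < 0.01:  # Less than 1% variation
    print(f"{var_name}: Converged")
else:
    print(f"{var_name}: May need more iterations")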
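The check above looks at one chain at a time. To put a number on how well several chains mix, one common option is the Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variance. The sketch below is a minimal version under stated assumptions: the function `gelman_rubin` is written for this example and is not part of the library, and it assumes you can arrange the chain means for one variable as an array of shape (number of imputations, number of iterations). It runs on synthetic data.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m chains, n iterations) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least 2 chains and 2 iterations")
    within = chains.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)   # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n     # pooled variance estimate
    return float(np.sqrt(pooled / within))

# Synthetic chains: 5 imputations, 20 iterations each
rng = np.random.default_rng(0)
mixed = rng.normal(50.0, 1.0, size=(5, 20))            # chains that overlap
separated = mixed + np.arange(5)[:, None] * 10.0       # chains at different levels

r_mixed = gelman_rubin(mixed)
r_separated = gelman_rubin(separated)
print(f"mixed: {r_mixed:.2f}, separated: {r_separated:.2f}")
```

Values close to 1 suggest the chains overlap; values well above 1 suggest they are still separated. Any cut-off is a heuristic, so treat the number as a supplement to the plots, not a replacement.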
What to Do If Not Converged
Increase Iterations
The simplest solution:
# Try more iterations
mice.impute(n_imputations=5, maxit=50) # Increased from 20 to 50
# Check again
plot_chain_stats(mice.chain_mean, mice.chain_var)
Most convergence issues are resolved by running more iterations.
Adjust Initial Values
Try different initial imputation:
# Use mean instead of sample for initialization
mice.impute(n_imputations=5, maxit=20, initial='mean')
Simplify Predictor Matrix
Too many predictors or multicollinearity can slow convergence:
from imputation.utils import quickpred
# Use automatic selection with higher threshold
predictor_matrix = quickpred(df, mincor=0.3)
mice.impute(predictor_matrix=predictor_matrix, maxit=20)
Change Method
Some methods converge faster than others:
# Try a different method
mice.impute(method='cart', maxit=20) # Instead of PMM
How Many Iterations?
- Default: 10 iterations
Sufficient for many datasets
- Recommendation: 15-20 iterations
Safer choice, check convergence diagnostics
- Complex data: 30-50+ iterations
High missingness (>30%)
Many variables
Complex relationships
Rule of thumb: Run until chains are flat for at least 5 iterations
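The rule of thumb can be turned into a small helper. The sketch below is one possible reading of "flat": the relative range of the chain (max minus min, divided by the absolute mean) stays below a tolerance from some iteration through the end, with at least `window` iterations remaining. The function name, the 1% tolerance, and the definition of flatness are assumptions for illustration only.

```python
import numpy as np

def iterations_until_flat(chain, window=5, tol=0.01):
    """Return the first index from which the rest of the chain is flat.

    "Flat" here means the relative range of the remaining values is
    below `tol`. At least `window` values must remain. Returns None if
    the chain never meets this criterion.
    """
    chain = np.asarray(chain, dtype=float)
    for start in range(len(chain) - window + 1):
        tail = chain[start:]
        scale = max(abs(tail.mean()), 1e-12)  # guard against division by zero
        if (tail.max() - tail.min()) / scale < tol:
            return start
    return None

# A chain that drifts early and then settles around 50
settled = [40.0, 46.0, 49.0, 50.2, 50.0, 50.1, 49.9, 50.0, 50.1, 50.0]
# A chain that is still trending upward at the end
trending = np.linspace(40.0, 50.0, 10)

flat_from = iterations_until_flat(settled)
still_moving = iterations_until_flat(trending)
print(flat_from, still_moving)
```

If the helper returns None for any variable, that is a signal to raise `maxit` and look at the plots again.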
Convergence by Variable
Different variables may converge at different rates:
import matplotlib.pyplot as plt
# Check each variable separately
for var in mice.chain_mean.keys():
    chain = mice.chain_mean[var]
    plt.figure()
    plt.plot(chain)
    plt.title(f'Convergence: {var}')
    plt.xlabel('Iteration')
    plt.ylabel('Mean')
    plt.savefig(f'convergence_{var}.png')
    plt.close()
Variables with more missingness or weaker predictive relationships typically need more iterations.
Other Diagnostic Checks
Compare Observed vs Imputed
Even if chains converge, check that imputed values are reasonable:
from plotting.diagnostics import stripplot, densityplot
# 1 = observed, 0 = missing in the original data
missing_pattern = df.notna().astype(int)
# Stripplot: visual check
stripplot(mice.imputed_datasets, missing_pattern)
# Density plot: distributional check
densityplot(mice.imputed_datasets, missing_pattern)
- Look for:
Imputed values (red) within range of observed values (blue)
Similar distributions between observed and imputed
No impossible values
Check Variability Between Imputations
Multiple imputations should differ from each other:
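# For a specific variable
var = 'income'
# Only originally missing cells vary; observed cells would contribute an SD of 0
was_missing = df[var].isna().to_numpy()
# One row per imputed dataset (assumes the same row order as df)
imputed_values = np.array([np.asarray(dataset[var])[was_missing] for dataset in mice.imputed_datasets])
# Check standard deviation across imputations
sd_across = np.std(imputed_values, axis=0)
print(f"Mean SD across imputations: {sd_across.mean()}")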
If imputations are nearly identical, you may need more iterations or a less deterministic method.
Common Convergence Issues
Slow Convergence
Symptoms: Chains still changing after many iterations
- Causes:
High dimensionality
Weak predictor relationships
High missingness
Multicollinearity
- Solutions:
Use quickpred to select predictors
Increase ridge parameter in PMM
Try different method (CART/RF)
More iterations
Non-Convergence
Symptoms: Chains never stabilize, even after 50+ iterations
- Causes:
Perfect multicollinearity
Circular dependencies
Insufficient data
Model misspecification
- Solutions:
Check for perfectly correlated variables
Remove redundant predictors
Simplify predictor matrix
Consider different imputation strategy
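One way to start on the first two solutions is to list pairs of variables whose correlation is close to ±1. The sketch below does this with NumPy on synthetic data. The helper `highly_correlated_pairs` and the 0.99 threshold are assumptions for this example, and it expects a numeric array of complete cases, because `np.corrcoef` returns NaN for any column that contains missing values.

```python
import numpy as np

def highly_correlated_pairs(data, names, threshold=0.99):
    """List (name_a, name_b, r) for column pairs with |r| >= threshold.

    data: 2-D numeric array, rows = complete cases, columns = variables.
    """
    corr = np.corrcoef(data, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic example: income_k is an exact rescaling of income
rng = np.random.default_rng(1)
age = rng.normal(40, 10, 200)
income = rng.normal(50_000, 8_000, 200)
income_k = income / 1000.0
data = np.column_stack([age, income, income_k])

pairs = highly_correlated_pairs(data, ["age", "income", "income_k"])
for a, b, r in pairs:
    print(f"{a} and {b}: r = {r:.3f}")
```

For each flagged pair, consider dropping one of the two variables from the predictor matrix. Pairwise correlation will not catch a variable that is a linear combination of several others, so it is only a first check.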
Oscillating Chains
Symptoms: Chains oscillate rather than stabilize
- Causes:
Conflicting information from different predictors
Overfitting with complex methods
- Solutions:
Use simpler method (PMM instead of RF)
Regularize more strongly
Reduce predictor complexity
Separated Chains
Symptoms: Different imputation chains don’t mix
- Causes:
Insufficient iterations
Bimodal or complex distributions
Categorical variables with many levels
- Solutions:
More iterations
Check if true multimodality exists
Use method appropriate for data type
Best Practices
Always check convergence: Never skip this step
Visual inspection first: Plots are more informative than statistics
Be conservative: If unsure, run more iterations
Check all variables: Don’t just look at your outcome variable
Look at early iterations: They can reveal problems with initialization
Compare multiple runs: Rerun with different seeds to check stability
Quick Convergence Checklist
Before finalizing your imputation:
☐ Chain plots show stable horizontal lines for all variables
☐ No systematic trends in the last 5-10 iterations
☐ Imputed values are in reasonable range
☐ Distributions of observed and imputed values are similar
☐ Multiple imputations show appropriate variability
☐ Convergence achieved with acceptable number of iterations (<50)
If every box is checked, your imputation is ready for analysis!
Next Steps
Learn about Pooling Analysis to analyze your imputed data
Review Best Practices for overall guidance
See examples of complete workflows in Examples