Convergence Diagnostics ======================= After running MICE, it's crucial to check whether the algorithm has converged. This guide explains what convergence means and how to assess it. What is Convergence? -------------------- **Convergence** means the MICE algorithm has stabilized—the imputed values are no longer systematically changing from one iteration to the next. Why It Matters ~~~~~~~~~~~~~~ If MICE hasn't converged: - Imputed values may be unreliable - Statistical inferences may be biased - More iterations are needed How MICE Converges ~~~~~~~~~~~~~~~~~~ During each iteration, MICE updates the imputed values for each variable. Initially, these updates cause substantial changes, but as the algorithm proceeds, changes should become smaller and stabilize. Chain Statistics ---------------- MICE tracks two key statistics across iterations for each variable: **Chain Mean** The mean of the imputed values at each iteration **Chain Variance** The variance of the imputed values at each iteration These "chains" should: 1. Start from initial values 2. Potentially drift in early iterations 3. Stabilize after some iterations (convergence!) Visualizing Convergence ----------------------- The primary tool for checking convergence is plotting the chain statistics: .. code-block:: python from plotting.diagnostics import plot_chain_stats # After running MICE mice = MICE(df) mice.impute(n_imputations=5, maxit=20) # Plot convergence plot_chain_stats( chain_mean=mice.chain_mean, chain_var=mice.chain_var, save_path='convergence.png' ) Interpreting the Plots ~~~~~~~~~~~~~~~~~~~~~~~ **What to look for**: ✓ **Stable horizontal lines**: Means and variances have stabilized ✓ **No systematic trends**: Values aren't consistently increasing or decreasing ✓ **Mixing of chains**: If multiple imputations are shown, they should overlap ✗ **Trending lines**: Values still changing systematically ✗ **Unstable oscillations**: Large swings even in later iterations ✗ **Separated chains**: Different imputations have very different patterns Numerical Assessment -------------------- While visual inspection is primary, you can also check numerically: .. code-block:: python import numpy as np # Get chain means for a specific variable var_name = 'income' chain = mice.chain_mean[var_name] # Check if last few iterations are stable last_5 = chain[-5:] variation = np.std(last_5) / np.mean(last_5) # Coefficient of variation if variation < 0.01: # Less than 1% variation print(f"{var_name}: Converged") else: print(f"{var_name}: May need more iterations") What to Do If Not Converged ---------------------------- Increase Iterations ~~~~~~~~~~~~~~~~~~~ The simplest solution: .. code-block:: python # Try more iterations mice.impute(n_imputations=5, maxit=50) # Increased from 10 to 50 # Check again plot_chain_stats(mice.chain_mean, mice.chain_var) Most convergence issues are resolved by running more iterations. Adjust Initial Values ~~~~~~~~~~~~~~~~~~~~~ Try different initial imputation: .. code-block:: python # Use mean instead of sample for initialization mice.impute(n_imputations=5, maxit=20, initial='mean') Simplify Predictor Matrix ~~~~~~~~~~~~~~~~~~~~~~~~~~ Too many predictors or multicollinearity can slow convergence: .. code-block:: python from imputation.utils import quickpred # Use automatic selection with higher threshold predictor_matrix = quickpred(df, mincor=0.3) mice.impute(predictor_matrix=predictor_matrix, maxit=20) Change Method ~~~~~~~~~~~~~ Some methods converge faster than others: .. code-block:: python # Try a different method mice.impute(method='cart', maxit=20) # Instead of PMM How Many Iterations? -------------------- **Default**: 10 iterations Sufficient for many datasets **Recommendation**: 15-20 iterations Safer choice, check convergence diagnostics **Complex data**: 30-50+ iterations - High missingness (>30%) - Many variables - Complex relationships **Rule of thumb**: Run until chains are flat for at least 5 iterations Convergence by Variable ------------------------ Different variables may converge at different rates: .. code-block:: python # Check each variable separately for var in mice.chain_mean.keys(): chain = mice.chain_mean[var] plt.figure() plt.plot(chain) plt.title(f'Convergence: {var}') plt.xlabel('Iteration') plt.ylabel('Mean') plt.savefig(f'convergence_{var}.png') plt.close() Variables with more missingness or weaker predictive relationships typically need more iterations. Other Diagnostic Checks ----------------------- Compare Observed vs Imputed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Even if chains converge, check that imputed values are reasonable: .. code-block:: python from plotting.diagnostics import stripplot, densityplot missing_pattern = df.notna().astype(int) # Stripplot: visual check stripplot(mice.imputed_datasets, missing_pattern) # Density plot: distributional check densityplot(mice.imputed_datasets, missing_pattern) Look for: - Imputed values (red) within range of observed values (blue) - Similar distributions between observed and imputed - No impossible values Check Variability Between Imputations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Multiple imputations should differ from each other: .. code-block:: python # For a specific variable var = 'income' imputed_values = [dataset[var] for dataset in mice.imputed_datasets] # Check standard deviation across imputations sd_across = np.std(imputed_values, axis=0) print(f"Mean SD across imputations: {sd_across.mean()}") If imputations are nearly identical, you may need more iterations or a less deterministic method. Common Convergence Issues -------------------------- Slow Convergence ~~~~~~~~~~~~~~~~ **Symptoms**: Chains still changing after many iterations **Causes**: - High dimensionality - Weak predictor relationships - High missingness - Multicollinearity **Solutions**: - Use quickpred to select predictors - Increase ridge parameter in PMM - Try different method (CART/RF) - More iterations Non-Convergence ~~~~~~~~~~~~~~~ **Symptoms**: Chains never stabilize, even after 50+ iterations **Causes**: - Perfect multicollinearity - Circular dependencies - Insufficient data - Model misspecification **Solutions**: - Check for perfectly correlated variables - Remove redundant predictors - Simplify predictor matrix - Consider different imputation strategy Oscillating Chains ~~~~~~~~~~~~~~~~~~ **Symptoms**: Chains oscillate rather than stabilize **Causes**: - Conflicting information from different predictors - Overfitting with complex methods **Solutions**: - Use simpler method (PMM instead of RF) - Regularize more strongly - Reduce predictor complexity Separated Chains ~~~~~~~~~~~~~~~~ **Symptoms**: Different imputation chains don't mix **Causes**: - Insufficient iterations - Bimodal or complex distributions - Categorical variables with many levels **Solutions**: - More iterations - Check if true multimodality exists - Use method appropriate for data type Best Practices -------------- 1. **Always check convergence**: Never skip this step 2. **Visual inspection first**: Plots are more informative than statistics 3. **Be conservative**: If unsure, run more iterations 4. **Check all variables**: Don't just look at your outcome variable 5. **Look at early iterations**: They can reveal problems with initialization 6. **Compare multiple runs**: Rerun with different seeds to check stability Quick Convergence Checklist ---------------------------- Before finalizing your imputation: ☐ Chain plots show stable horizontal lines for all variables ☐ No systematic trends in the last 5-10 iterations ☐ Imputed values are in reasonable range ☐ Distributions of observed and imputed values are similar ☐ Multiple imputations show appropriate variability ☐ Convergence achieved with acceptable number of iterations (<50) If all checked, your imputation is ready for analysis! Next Steps ---------- - Learn about :doc:`pooling_analysis` to analyze your imputed data - Review :doc:`best_practices` for overall guidance - See examples of complete workflows in :doc:`../examples/index`