Method Details
==============

Brief technical overview of the five imputation methods in mice-py.

PMM: Predictive Mean Matching
------------------------------

**Algorithm**:

1. Fit Bayesian linear regression on observed cases
2. Draw parameters from posterior
3. Generate predictions for observed and missing cases
4. For each missing value, find k nearest observed cases (by predicted value)
5. Randomly select one donor and use its observed value

**Key feature**: Imputed values come from observed data (prevents impossible values).

**Best for**: Numeric variables, preserving distributions, data with outliers.

**Parameters**: ``pmm_donors`` (default: 5), ``pmm_matchtype``, ``pmm_ridge``

CART: Classification and Regression Trees
------------------------------------------

**Algorithm**:

1. Build decision tree on complete observations
2. Use tree to predict missing values
3. Add random variation to predictions

**Key feature**: Automatically captures interactions and non-linear patterns.

**Best for**: Non-linear relationships, interactions, categorical variables.

**Parameters**: ``cart_max_depth``, ``cart_min_samples_split``, ``cart_min_samples_leaf``

Random Forest
-------------

**Algorithm**:

1. Build multiple decision trees on bootstrap samples
2. Average predictions across trees
3. Add random variation to predictions

**Key feature**: More stable than single tree, handles complexity well.

**Best for**: Complex patterns, high-dimensional data, many interactions.

**Parameters**: ``rf_n_estimators`` (default: 100), ``rf_max_depth``, ``rf_max_features``

MIDAS: Distance Aided Substitution
-----------------------------------

**Algorithm**:

1. Calculate distances between cases in predictor space
2. Weight observed cases by inverse distance
3. Select k donors with highest weights
4. Use weighted average plus noise

**Key feature**: Uses local structure of data, good for skewed distributions.

**Best for**: Small samples, skewed distributions, when PMM struggles.

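
The four MIDAS steps listed above can be sketched in plain Python. This is only an illustration of the steps as this page describes them, not the mice-py implementation: the function name ``midas_style_impute``, its arguments, the Euclidean distance metric, the small epsilon in the weights, and the Gaussian noise model are all assumptions made for the example.

```python
import math
import random


def midas_style_impute(x_obs, y_obs, x_mis, k=5, noise_sd=0.0, seed=None):
    """Hypothetical helper (not part of mice-py) following the four steps above.

    x_obs: predictor rows for cases where the target is observed
    y_obs: observed target values, aligned with x_obs
    x_mis: predictor rows for cases where the target is missing
    """
    rng = random.Random(seed)
    k = min(k, len(y_obs))
    imputed = []
    for target in x_mis:
        # Step 1: distance from the incomplete case to every observed case
        # (Euclidean distance is an assumption of this sketch).
        dists = [math.dist(target, row) for row in x_obs]
        # Step 2: inverse-distance weights; the epsilon avoids division by zero.
        weights = [1.0 / (d + 1e-8) for d in dists]
        # Step 3: keep the k observed cases with the highest weights.
        donors = sorted(range(len(y_obs)), key=lambda i: weights[i], reverse=True)[:k]
        # Step 4: weighted average of the donor values, plus Gaussian noise.
        total = sum(weights[i] for i in donors)
        mean = sum(weights[i] * y_obs[i] for i in donors) / total
        imputed.append(mean + rng.gauss(0.0, noise_sd))
    return imputed


# Example call with made-up data: three nearby cases and one distant case.
x_obs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
y_obs = [10.0, 12.0, 11.0, 40.0]
values = midas_style_impute(x_obs, y_obs, [[0.2, 0.1]], k=3, noise_sd=0.5, seed=0)
```

With ``noise_sd=0.0`` the result is a pure weighted average and therefore stays inside the range of the selected donors; the noise term is what lets repeated imputations differ.
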

**Parameters**: ``midas_donors`` (default: 5), ``midas_ridge``

Sample: Random Sampling
-----------------------

**Algorithm**:

1. Pool all observed values of the variable
2. Randomly sample one value for each missing case

**Key feature**: Simplest method, preserves marginal distribution exactly.

**Best for**: Initial imputation, categorical variables with many levels, quick exploration.

**Parameters**: None

Comparison Summary
------------------

.. list-table::
   :header-rows: 1

   * - Method
     - Best For
     - Speed
   * - PMM
     - General purpose numeric
     - Fast
   * - CART
     - Non-linear, interactions
     - Fast
   * - RF
     - Complex patterns
     - Slow
   * - MIDAS
     - Skewed, small samples
     - Fast
   * - Sample
     - Quick/simple
     - Very fast

See :doc:`../user_guide/imputation_methods` for practical selection guidance.
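
As a small illustration of the Sample method described earlier on this page, the two steps (pool the observed values, then draw one per missing case) fit in a few lines of standard-library Python. The function name ``sample_impute``, the use of ``None`` to mark missing entries, and sampling with replacement are assumptions of this sketch, not the mice-py API.

```python
import random


def sample_impute(values, seed=None):
    """Hypothetical helper (not part of mice-py): fill None entries by sampling."""
    rng = random.Random(seed)
    # Step 1: pool all observed (non-missing) values of the variable.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: the variable has no observed values")
    # Step 2: draw one observed value, with replacement, for each missing entry.
    return [v if v is not None else rng.choice(observed) for v in values]


# Example call with made-up categorical data.
column = ["a", None, "b", "a", None]
filled = sample_impute(column, seed=42)
```

Every imputed entry is a value that already occurs in the column, which is why the method works for categorical variables without any encoding step.
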