MICE Class ========== The main class for performing multiple imputation by chained equations. .. currentmodule:: imputation .. autoclass:: MICE :members: :undoc-members: :show-inheritance: :special-members: __init__ Overview -------- The ``MICE`` class is the primary interface for multiple imputation in mice-py. It handles the entire imputation process, from initialization through to analysis and pooling. Basic Usage ----------- .. code-block:: python from imputation import MICE import pandas as pd # Load data with missing values df = pd.read_csv('data.csv') # Initialize MICE object mice = MICE(df) # Perform imputation mice.impute( n_imputations=5, maxit=10, method='pmm' ) # Access imputed datasets imputed_datasets = mice.imputed_datasets # Fit a statistical model mice.fit('outcome ~ predictor1 + predictor2') # Pool results pooled = mice.pool(summ=True) print(pooled) Main Methods ------------ __init__(data) ~~~~~~~~~~~~~~ Initialize a MICE object with your data. **Parameters**: - **data** (pandas.DataFrame): Input data with missing values **Raises**: - ValueError: If data is not a DataFrame or has duplicate column names impute() ~~~~~~~~ Perform multiple imputation. **Parameters**: - **n_imputations** (int): Number of imputed datasets (default: 5) - **maxit** (int): Number of iterations (default: 10) - **method** (str or dict): Imputation method(s) (default: 'pmm') - **initial** (str): Initial imputation method (default: 'sample') - **predictor_matrix** (DataFrame, optional): Custom predictor matrix - **visit_sequence** (str or list): Variable visit order (default: 'monotone') - **seed** (int, optional): Random seed for reproducibility - Additional method-specific parameters (see below) **Method-specific parameters**: - PMM: ``pmm_donors``, ``pmm_matchtype``, ``pmm_ridge`` - CART: ``cart_max_depth``, ``cart_min_samples_split``, ``cart_min_samples_leaf`` - RF: ``rf_n_estimators``, ``rf_max_depth``, ``rf_max_features`` - MIDAS: ``midas_donors``, ``midas_ridge`` **Returns**: - None (modifies object in-place) **Raises**: - ValueError: If parameters are invalid fit(formula) ~~~~~~~~~~~~ Fit a statistical model on all imputed datasets. **Parameters**: - **formula** (str): Model formula in Patsy syntax (e.g., 'y ~ x1 + x2') **Returns**: - None (stores results internally) **Example**: .. code-block:: python # Simple regression mice.fit('income ~ age + education') # With interaction mice.fit('income ~ age * education') # Multiple predictors mice.fit('outcome ~ x1 + x2 + x3 + C(categorical_var)') pool(summ=True) ~~~~~~~~~~~~~~~ Pool results from multiple imputed datasets using Rubin's rules. **Parameters**: - **summ** (bool): Return summary (True) or detailed results (False) **Returns**: - pandas.DataFrame: Pooled results with columns: - Estimate: Pooled coefficient - Std.Error: Pooled standard error - t-statistic: Test statistic - df: Degrees of freedom - P>|t|: p-value - [0.025]: Lower 95% CI bound - 0.975]: Upper 95% CI bound - FMI: Fraction of missing information **Example**: .. code-block:: python results = mice.pool(summ=True) print(results) # Access specific values coef = results.loc['age', 'Estimate'] pval = results.loc['age', 'P>|t|'] fmi = results.loc['age', 'FMI'] Attributes ---------- data ~~~~ The original input data (pandas.DataFrame). imputed_datasets ~~~~~~~~~~~~~~~~ List of imputed datasets (list of pandas.DataFrames). Available after calling ``impute()``. chain_mean ~~~~~~~~~~ Dictionary mapping variable names to mean chains across iterations. Used for convergence diagnostics. chain_var ~~~~~~~~~ Dictionary mapping variable names to variance chains across iterations. Used for convergence diagnostics. id_obs ~~~~~~ Dictionary mapping variable names to boolean arrays indicating observed values. id_mis ~~~~~~ Dictionary mapping variable names to boolean arrays indicating missing values. Examples -------- Basic Imputation ~~~~~~~~~~~~~~~~ .. code-block:: python from imputation import MICE import pandas as pd import numpy as np # Create sample data df = pd.DataFrame({ 'age': [25, 30, np.nan, 45, 50], 'income': [50000, np.nan, 60000, 75000, np.nan], 'education': ['HS', 'BS', 'MS', np.nan, 'PhD'] }) # Impute mice = MICE(df) mice.impute(n_imputations=5, maxit=10, method='pmm') # Check results print(f"Created {len(mice.imputed_datasets)} complete datasets") Custom Methods Per Variable ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python method_dict = { 'age': 'pmm', 'income': 'cart', 'education': 'sample' } mice.impute(n_imputations=10, method=method_dict) Custom Predictor Matrix ~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python import numpy as np # Create predictor matrix pred_matrix = pd.DataFrame(1, index=df.columns, columns=df.columns) np.fill_diagonal(pred_matrix.values, 0) # Don't use education to predict income pred_matrix.loc['income', 'education'] = 0 mice.impute(predictor_matrix=pred_matrix) With Method-Specific Parameters ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # PMM with more donors mice.impute(method='pmm', pmm_donors=10) # CART with depth limit mice.impute(method='cart', cart_max_depth=15) # Random Forest with more trees mice.impute(method='rf', rf_n_estimators=200) Complete Analysis Workflow ~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from imputation import MICE, configure_logging from plotting.diagnostics import plot_chain_stats # Enable logging configure_logging(level='INFO') # Load data df = pd.read_csv('data.csv') # Impute mice = MICE(df) mice.impute(n_imputations=20, maxit=20, method='pmm') # Check convergence plot_chain_stats(mice.chain_mean, mice.chain_var, save_path='convergence.png') # Fit model mice.fit('outcome ~ age + gender + treatment') # Pool results results = mice.pool(summ=True) print(results) # Check FMI print(f"\nMax FMI: {results['FMI'].max():.3f}") See Also -------- - :doc:`methods` for imputation method details - :doc:`pooling` for pooling functions - :doc:`../user_guide/mice_overview` for conceptual overview - :doc:`../examples/index` for more examples