Pooling Functions
Functions for combining results from multiple imputed datasets using Rubin’s rules.
MICEresult Class
Pooled results container for MICE following Rubin’s rules.
It lives in its own module so that it can be reused and so that MICE.py stays lighter.
Pooling Module
Standalone pooling module for multiple imputation results.
This module provides functions to pool descriptive statistics and model estimates from multiple imputed datasets using Rubin’s rules, without requiring coupling to any specific imputation framework.
- class imputation.pooling.PoolingResult(estimates, variances, within_variance, between_variance, frac_miss_info, param_names, n_imputations, sample_size)
Bases: object
Container for pooled multiple imputation results.
- estimates
Pooled parameter estimates (q_bar)
- Type:
np.ndarray
- variances
Total variances for each parameter (t)
- Type:
np.ndarray
- within_variance
Average within-imputation variance (u_bar)
- Type:
np.ndarray
- between_variance
Between-imputation variance (b)
- Type:
np.ndarray
- frac_miss_info
Fraction of missing information for each parameter
- Type:
np.ndarray
- summary()
Return a summary DataFrame with pooled statistics.
- Returns:
Summary table with estimates, standard errors, and diagnostics
- Return type:
pd.DataFrame
- __init__(estimates, variances, within_variance, between_variance, frac_miss_info, param_names, n_imputations, sample_size)
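The attribute descriptions above use the symbols q_bar, u_bar, b, and t. As a reading aid, the block below writes out the standard form of Rubin's rules that relates them for a single parameter, with m imputations, per-imputation estimates \hat{q}_i, and per-imputation variances u_i. The last line is one common large-sample approximation to the fraction of missing information; the symbol \lambda is introduced here for illustration only, and this page does not state which formula frac_miss_info actually uses.

```latex
\bar{q} = \frac{1}{m} \sum_{i=1}^{m} \hat{q}_i
\qquad
\bar{u} = \frac{1}{m} \sum_{i=1}^{m} u_i
\qquad
b = \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{q}_i - \bar{q} \right)^2

t = \bar{u} + \left( 1 + \frac{1}{m} \right) b

% Assumed approximation, not confirmed by this page:
\lambda \approx \frac{\left( 1 + \frac{1}{m} \right) b}{t}
```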
- imputation.pooling.validate_imputed_datasets(datasets)
Validate that the input datasets are suitable for pooling.
- Parameters:
datasets (List[pd.DataFrame]) – List of imputed datasets to validate
- Raises:
ValueError – If datasets are invalid for pooling
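This page does not list the exact checks that validate_imputed_datasets performs. The sketch below shows the kind of checks such a validator might plausibly run. The function name validate_sketch and every individual check are assumptions made for illustration, not the library's implementation.

```python
import pandas as pd


def validate_sketch(datasets):
    """Hypothetical validator; the real checks may differ."""
    # Pooling needs several imputations to estimate between-imputation variance.
    if not isinstance(datasets, (list, tuple)) or len(datasets) < 2:
        raise ValueError("expected a list of at least two imputed datasets")

    first = datasets[0]
    for i, df in enumerate(datasets):
        if not isinstance(df, pd.DataFrame):
            raise ValueError(f"dataset {i} is not a pandas DataFrame")
        # Every imputed dataset should describe the same rows and columns.
        if df.shape != first.shape:
            raise ValueError(f"dataset {i} has shape {df.shape}, expected {first.shape}")
        if list(df.columns) != list(first.columns):
            raise ValueError(f"dataset {i} has different column names")
        # Imputed datasets are expected to be complete.
        if df.isna().any().any():
            raise ValueError(f"dataset {i} still contains missing values")


ok = [pd.DataFrame({"x": [1.0, 2.0]}), pd.DataFrame({"x": [1.5, 2.0]})]
validate_sketch(ok)  # passes silently
```

Raising ValueError for every failure mirrors the single exception type documented above.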
- imputation.pooling.apply_rubins_rules(estimates, variances)
Apply Rubin’s rules to combine estimates and variances across imputations.
- Parameters:
estimates (np.ndarray) – Array of shape (n_imputations, n_parameters) with parameter estimates
variances (np.ndarray) – Array of shape (n_imputations, n_parameters) with within-imputation variances
- Returns:
(pooled_estimates, total_variances, within_variance, between_variance)
- Return type:
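To make the input shapes and the return order concrete, here is a minimal NumPy sketch of Rubin's rules. It is an illustrative re-implementation under the name rubins_rules_sketch (hypothetical), not the source of apply_rubins_rules, and its error handling is an assumption.

```python
import numpy as np


def rubins_rules_sketch(estimates, variances):
    """Illustrative version of Rubin's rules; not the library's code."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if estimates.ndim != 2 or estimates.shape != variances.shape:
        raise ValueError("expected two arrays of shape (n_imputations, n_parameters)")

    m = estimates.shape[0]
    if m < 2:
        raise ValueError("at least two imputations are needed")

    q_bar = estimates.mean(axis=0)         # pooled estimates
    u_bar = variances.mean(axis=0)         # average within-imputation variance
    b = estimates.var(axis=0, ddof=1)      # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b        # total variance
    return q_bar, t, u_bar, b


# Three imputations, two parameters.
est = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
var = np.array([[0.5, 2.0], [0.5, 2.0], [0.5, 2.0]])
q_bar, t, u_bar, b = rubins_rules_sketch(est, var)
```

In this example the second parameter has identical estimates in every imputation, so its between-imputation variance is zero and its total variance equals the within-imputation variance.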
- imputation.pooling.pool_descriptive_statistics(datasets, include_numeric=True, include_categorical=True)
Pool descriptive statistics across multiple imputed datasets using Rubin’s rules.
For numeric columns, pools the sample mean and its variance. For categorical columns, pools the per-level proportions and their variances.
- Parameters:
datasets (List[pd.DataFrame]) – List of complete imputed datasets. All datasets must have the same shape and column names.
include_numeric (bool, default=True) – Whether to include numeric columns in pooling
include_categorical (bool, default=True) – Whether to include categorical columns in pooling
- Returns:
Object containing pooled estimates, variances, and diagnostic statistics
- Return type:
- Raises:
ValueError – If datasets are invalid or no columns are available for pooling
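The description above says that means and per-level proportions are pooled together with their variances. The sketch below shows one plausible way to compute those per-dataset quantities before pooling. The helper names are hypothetical, and the variance formulas (s²/n for a mean, p(1 − p)/n for a proportion) are standard choices assumed here, not confirmed by this page.

```python
import numpy as np
import pandas as pd


def mean_and_variance(series):
    """Sample mean and the estimated variance of that mean (assumed: s^2 / n)."""
    n = series.shape[0]
    return series.mean(), series.var(ddof=1) / n


def proportion_and_variance(series, level):
    """Proportion of one level and its binomial variance (assumed: p(1-p) / n)."""
    n = series.shape[0]
    p = (series == level).mean()
    return p, p * (1.0 - p) / n


# Two toy "imputed" datasets with the same shape and column names.
datasets = [
    pd.DataFrame({"age": [30.0, 40.0, 50.0, 60.0], "sex": ["f", "m", "f", "f"]}),
    pd.DataFrame({"age": [30.0, 42.0, 50.0, 62.0], "sex": ["f", "m", "m", "f"]}),
]

# One row per imputation: column 0 is the estimate, column 1 its variance.
age_stats = np.array([mean_and_variance(df["age"]) for df in datasets])
sex_f_stats = np.array([proportion_and_variance(df["sex"], "f") for df in datasets])
```

Collecting the first column of each such array side by side, and likewise the second, gives arrays of shape (n_imputations, n_parameters), the layout that apply_rubins_rules is documented to accept.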
- imputation.pooling.pool_from_files(file_paths, read_kwargs=None, **pooling_kwargs)
Pool descriptive statistics from datasets stored in files.
- Parameters:
- Returns:
Pooled results from the datasets
- Return type:
- imputation.pooling.pool_subset(datasets, columns=None, **pooling_kwargs)
Pool descriptive statistics for a subset of columns.
- Parameters:
datasets (List[pd.DataFrame]) – List of complete imputed datasets
columns (List[str], optional) – List of column names to include in pooling. If None, uses all columns.
**pooling_kwargs – Additional arguments to pass to pool_descriptive_statistics()
- Returns:
Pooled results for the specified columns
- Return type:
See Also
Pooling Analysis for practical guidance
Rubin’s Rules for theoretical background
MICE Class for the main MICE.pool() method