Imputations

This package implements Multiple Imputation by Chained Equations (MICE) for handling missing data in statistical analysis, with PMM, MIDAS, CART, Random Forest, and random-sample imputation methods.

Main Class

MICE (Multiple Imputation by Chained Equations)

class imputation.MICE.MICE(data)

Bases: object

Multiple Imputation by Chained Equations (MICE) class.

This class implements the MICE algorithm for handling missing data through multiple imputations using chained equations.

Parameters:

data (pd.DataFrame) – Input data with missing values. Must be a pandas DataFrame.

data

The validated and cleaned input data

Type:

pd.DataFrame

id_obs

Dictionary mapping column names to indices of observed values

Type:

Dict[str, np.ndarray]

id_mis

Dictionary mapping column names to indices of missing values

Type:

Dict[str, np.ndarray]

__init__(data)

Initialize the MICE object.

Parameters:

data (pd.DataFrame) – Input data with missing values. Must be a pandas DataFrame.

Raises:

ValueError – If data is not a pandas DataFrame or contains duplicate column names

impute(n_imputations=5, maxit=10, predictor_matrix=None, initial='sample', method=None, visit_sequence='monotone', **kwargs)

Perform multiple imputation by chained equations.

Parameters:
  • n_imputations (int, default=5) – Number of imputations to perform

  • maxit (int, default=10) – Maximum number of iterations for each imputation cycle. Must be a positive integer.

  • predictor_matrix (pd.DataFrame, optional) – Binary matrix indicating which variables should be used as predictors for each target variable. Should have column names as both index and columns. A 1 indicates that the column variable is used as predictor for the index variable. If None, a predictor matrix is estimated using _quickpred.

  • initial (str, default='sample') – Initial imputation method. Must be one of SUPPORTED_INITIAL_METHODS. The default is the value of DEFAULT_INITIAL_METHOD, shown as 'sample' in the signature.

  • method (Union[str, Dict[str, str]], optional) – Imputation method(s) to use. Each method must be one of SUPPORTED_METHODS.

    - str: use the same method for all columns
    - Dict[str, str]: dictionary mapping column names to their methods
    - None: use the default method for all columns

  • visit_sequence (Union[str, List[str]], default="monotone") – Sequence in which variables should be visited during imputation:

    - str: “monotone” for monotone missing data pattern
    - List[str]: list of column names specifying the order to visit variables

  • **kwargs (dict) – Additional keyword arguments.

    - output_dir (str, optional): Directory to save outputs for this run. If not provided, a timestamped folder is created in output_figures.

    Parameters for specific imputation methods can also be passed. These should be prefixed with the method name and an underscore, e.g., pmm_donors=5 to pass donors=5 to the pmm imputer.

    When predictor_matrix is not specified, the following can be passed for _quickpred:

    - min_cor (float, default=0.1): Minimum correlation for a predictor.
    - min_puc (float, default=0.0): Minimum proportion of usable cases.
    - include (list, optional): Columns to always include as predictors.
    - exclude (list, optional): Columns to always exclude as predictors.
    - correlation_method (str, default="pearson"): Correlation method used to compute the correlation matrix inside _quickpred.
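The prefix convention for method-specific parameters can be illustrated with a small helper. This is a minimal sketch, not the package's internal code: split_method_kwargs is a hypothetical name, and it assumes that the text before the first underscore is treated as the method prefix whenever it matches a known method name.

```python
def split_method_kwargs(kwargs, methods):
    """Group prefixed keyword arguments by method name (hypothetical helper)."""
    routed = {name: {} for name in methods}
    leftover = {}
    for key, value in kwargs.items():
        prefix, sep, rest = key.partition("_")
        if sep and rest and prefix in routed:
            # e.g. "pmm_donors" -> routed["pmm"]["donors"]
            routed[prefix][rest] = value
        else:
            # not a method-prefixed argument (e.g. output_dir, min_cor)
            leftover[key] = value
    return routed, leftover


routed, leftover = split_method_kwargs(
    {"pmm_donors": 5, "cart_min_samples_leaf": 10,
     "output_dir": "runs/demo", "min_cor": 0.2},
    methods=["pmm", "cart", "rf"],
)
print(routed["pmm"])   # arguments destined for the pmm imputer
print(leftover)        # arguments that are not method-specific
```

Under this assumption, arguments such as output_dir and min_cor fall through untouched because their prefixes are not method names.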

fit(formula)

Fit a statistical model to each imputed dataset using the specified formula.

This method fits the specified statistical model to each dataset in self.imputed_datasets and stores the results in self.model_results.

Parameters:

formula (str) – A formula string in patsy syntax for statsmodels (e.g., ‘y ~ x1 + x2’)

Raises:

ValueError – If no imputed datasets are available or if variables in formula are not in data

Examples

>>> mice_obj = MICE(data)
>>> mice_obj.impute(n_imputations=5)
>>> mice_obj.fit('outcome ~ predictor1 + predictor2')

pool(summ=False)

Pool parameter estimates from fitted models using Rubin’s rules.

This method combines parameter estimates and their uncertainties from multiple imputed datasets according to Rubin’s (1987) rules for multiple imputation inference.

Parameters:

summ (bool, default=False) – If True, returns a summary of the pooled results

Returns:

If summ=False, returns a MICEresult object containing pooled estimates. If summ=True, returns a summary table of the pooled results.

Return type:

MICEresult or summary

Raises:

ValueError – If no model results are available from analysis

Notes

Rubin’s pooling rules combine:

  • Point estimates: average across imputations
  • Within-imputation variance: average of individual model variances
  • Between-imputation variance: variance of point estimates across imputations
  • Total variance: within + (1 + 1/m) * between
  • Fraction of missing information (FMI): proportion of uncertainty due to missingness
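For a single parameter with estimates \hat{Q}_i and variances U_i from m imputed datasets, these rules can be written out as follows. The symbols correspond to the q_bar, u_bar, b, and t quantities named in the PoolingResult attributes. The last ratio is the simple large-sample proportion of variance attributable to missingness; the FMI reported by the package may additionally apply a degrees-of-freedom correction, which is not shown here.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2}

T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B, \qquad
\lambda = \frac{(1+1/m)\,B}{T}
```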

References

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.

Imputation Methods

CART (Classification and Regression Trees)

imputation.cart.cart(y, id_obs, x, id_mis=None, min_samples_leaf=5, ccp_alpha=0.0001, rng=None, **kwargs)

Impute missing values using Classification and Regression Trees (CART).

This function is designed to be compatible with the MICE framework.

Parameters:
  • y (Union[pd.Series, np.ndarray]) – Target variable with missing values

  • id_obs (np.ndarray) – Boolean mask of observed values in y (True for observed, False for missing)

  • x (Union[pd.DataFrame, np.ndarray]) – Predictor variables (must be fully observed)

  • id_mis (np.ndarray, optional) – Boolean mask of missing values to impute. If None, uses ~id_obs

  • min_samples_leaf (int, default=5) – Minimum number of samples required to be at a leaf node

  • ccp_alpha (float, default=1e-4) – Complexity parameter for pruning

  • rng (np.random.Generator, optional) – Random number generator for reproducibility. If None, a fresh generator is used.

  • **kwargs (dict) – Additional parameters passed to the tree model

Returns:

Imputed values for missing positions only (matching R implementation).

Return type:

np.ndarray

Notes

The procedure follows R’s mice CART implementation:

  1. Bootstrap the observed cases (sample with replacement)
  2. Fit a classification or regression tree on the bootstrap sample
  3. For each missing value, find the terminal node it would end up in
  4. Make a random draw from the ORIGINAL observed values in that node

This adds stochasticity through both bootstrapping and donor sampling.
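The four steps can be traced in a dependency-free sketch. It is deliberately simplified and is not the package's implementation: it handles one numeric predictor only, grows a plain regression tree by minimising the sum of squared errors, and omits the ccp_alpha pruning. The names fit_tree, leaf_id, and cart_impute are hypothetical.

```python
import random


def fit_tree(xs, ys, min_samples_leaf=5):
    """Grow a one-predictor regression tree; None marks a leaf (illustrative only)."""
    def sse(vals):
        centre = sum(vals) / len(vals)
        return sum((v - centre) ** 2 for v in vals)

    best = None
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        if min(len(left), len(right)) < min_samples_leaf:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, cut)
    if best is None:
        return None
    cut = best[1]
    lo = [(x, y) for x, y in zip(xs, ys) if x < cut]
    hi = [(x, y) for x, y in zip(xs, ys) if x >= cut]
    return {
        "cut": cut,
        "left": fit_tree([x for x, _ in lo], [y for _, y in lo], min_samples_leaf),
        "right": fit_tree([x for x, _ in hi], [y for _, y in hi], min_samples_leaf),
    }


def leaf_id(tree, x):
    """Return the path of left/right decisions that leads x to its terminal node."""
    path = []
    while tree is not None:
        go_left = x < tree["cut"]
        path.append("L" if go_left else "R")
        tree = tree["left"] if go_left else tree["right"]
    return tuple(path)


def cart_impute(x_obs, y_obs, x_mis, min_samples_leaf=5, rng=None):
    rng = rng or random.Random()
    n = len(y_obs)
    # Step 1: bootstrap the observed cases.
    boot = [rng.randrange(n) for _ in range(n)]
    # Step 2: fit the tree on the bootstrap sample.
    tree = fit_tree([x_obs[i] for i in boot], [y_obs[i] for i in boot],
                    min_samples_leaf)
    # Donor pools are built from the ORIGINAL observed cases, grouped by leaf.
    donors = {}
    for x, y in zip(x_obs, y_obs):
        donors.setdefault(leaf_id(tree, x), []).append(y)
    # Steps 3 and 4: locate each missing case's leaf and draw one donor value.
    return [rng.choice(donors[leaf_id(tree, x)]) for x in x_mis]


x_obs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y_obs = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 5.0, 5.2, 4.9, 5.1, 5.3, 4.8]
imputed = cart_impute(x_obs, y_obs, [2.5, 10.5], min_samples_leaf=3,
                      rng=random.Random(0))
print(imputed)
```

Because donors are drawn from observed values, every imputed value is one that actually occurs in the data.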

Random Forest

imputation.rf.rf(y, id_obs, x, id_mis=None, n_estimators=10, rng=None, **kwargs)

Impute missing values using Random Forests with donor sampling.

This function is designed to be compatible with the MICE framework, following the same interface as PMM, midas, CART, and sample methods.

Parameters:
  • y (Union[pd.Series, np.ndarray]) – Target variable with missing values

  • id_obs (np.ndarray) – Boolean mask of observed values in y (True = observed, False = missing)

  • x (Union[pd.DataFrame, np.ndarray]) – Predictor variables (should be the current completed columns)

  • id_mis (np.ndarray, optional) – Boolean mask of missing values. If None, uses ~id_obs.

  • n_estimators (int, default=10) – Number of trees in the forest

  • rng (np.random.Generator, optional) – Random number generator for reproducibility. If None, a fresh generator is used.

  • **kwargs (dict) – Additional parameters passed to the random forest model.

Returns:

Imputed values for missing positions only.

Return type:

np.ndarray

Notes

Algorithm (Doove et al., 2014; mirrors R mice):

  1. Fit a random forest on observed data.
  2. For each missing case, find terminal nodes across all trees.
  3. For each tree, collect donors (observed cases in same node).
  4. Randomly sample one donor per tree.
  5. Take final imputation as a random draw from those donor predictions.

Bootstrapping is inherent to Random Forest (bagging), so no additional bootstrap is applied (matching R mice behavior). Each tree is already built on a bootstrap sample of the data.

Sample Method

imputation.sample.sample(y, id_obs, x, id_mis=None, rng=None, **kwargs)

Impute missing values by random sampling from observed values.

This function is designed to be compatible with the MICE framework, following the same interface as PMM, midas, and CART imputation methods.

Parameters:
  • y (Union[pd.Series, np.ndarray]) – Target variable with missing values

  • id_obs (np.ndarray) – Boolean mask of observed values in y (True for observed, False for missing)

  • x (Union[pd.DataFrame, np.ndarray]) – Predictor variables (not used in this method, but kept for consistency)

  • id_mis (np.ndarray, optional) – Boolean mask of missing values to impute. If None, uses ~id_obs

  • rng (np.random.Generator, optional) – Random number generator for reproducibility. If None, a fresh generator is used.

  • **kwargs (dict) – Additional arguments (not used in this method)

Returns:

Imputed values for missing positions only (matching R implementation).

Return type:

np.ndarray

Notes

This is the simplest imputation method:

  1. Takes all observed values in the target variable
  2. Randomly samples from them to fill in missing values
  3. No modeling is involved, just random sampling with replacement

This method ignores the predictor variables (x) and only uses the observed values of the target variable for imputation.

Edge cases handled (matching R implementation):

  • If no observed values: returns random normal values for numeric data, None values for categorical data
  • If only one observed value: duplicates it to allow sampling
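The sampling rule and its edge cases can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's code: sample_impute is a hypothetical name, the target is assumed numeric, and the duplication step is kept only to mirror the description (R's sample() treats a length-one numeric vector specially, while random.choice does not need the workaround).

```python
import random


def sample_impute(y_obs, n_missing, rng=None):
    """Fill n_missing positions by sampling observed values with replacement."""
    rng = rng or random.Random()
    pool = list(y_obs)
    if not pool:
        # No observed values: fall back to standard normal draws (numeric case).
        return [rng.gauss(0.0, 1.0) for _ in range(n_missing)]
    if len(pool) == 1:
        # Single observed value: duplicate it, as in the description above.
        pool = pool * 2
    return [rng.choice(pool) for _ in range(n_missing)]


draws = sample_impute([7.0, 9.0, 10.0, 11.0], n_missing=3, rng=random.Random(42))
print(draws)
```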

PMM (Predictive Mean Matching)

imputation.PMM.pmm(y, id_obs, x, id_mis=None, donors=5, matchtype=1, quantify=True, ridge=1e-05, matcher='NN', rng=None, **kwargs)

Predictive Mean Matching (PMM) imputation.

This function imputes missing values in a variable y using predictive mean matching. The method is based on Rubin’s (1987) Bayesian linear regression and mimics the behavior of the R mice package’s PMM imputation method.

Parameters:
  • y (array-like (1D), shape (n_samples,)) – Target variable to be imputed. Can be numeric or categorical.

  • id_obs (array-like of bool, shape (n_samples,)) – Logical array indicating which elements of y are observed (True) or missing (False).

  • x (array-like (2D), shape (n_samples, n_features)) – Numeric design matrix of predictors. Must have no missing values.

  • id_mis (array-like of bool, shape (n_samples,), optional) – Logical array indicating which values should be imputed. If None, id_mis is set to the complement of id_obs.

  • donors (int, default=5) – Number of donors to draw from the observed cases when imputing missing values.

  • matchtype (int, default=1) – Type of matching:

    - 0: Predicted value of y_obs vs predicted value of y_mis
    - 1: Predicted value of y_obs vs drawn value of y_mis (default)
    - 2: Drawn value of y_obs vs drawn value of y_mis

  • quantify (bool, default=True) – If True and y is categorical, factor levels are replaced by the first canonical variate (via CCA). If False, categorical values are replaced by integer codes (less accurate).

  • ridge (float, default=1e-5) – Ridge regularization parameter used in norm_draw() to stabilize estimation. Increase for multicollinear data, decrease to reduce bias.

  • matcher (str, default="NN") – Matching method. Currently only “NN” (nearest neighbor) is supported.

  • **kwargs (dict) – Additional arguments passed to norm_draw(), such as ls_meth.

Returns:

y_imp – Imputed values for missing positions only (matching R implementation). Returns object array if y was categorical, else float array.

Return type:

np.ndarray

Notes

Based on:

  • Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys.
  • Van Buuren, S. & Groothuis-Oudshoorn, K. (2011). mice R package.

Examples

>>> y = np.array([7, np.nan, 9, 10, 11])
>>> id_obs = ~np.isnan(y)
>>> x = np.array([[1, 2], [3, 4], [5, 7], [7, 8], [9, 10]])
>>> pmm(y=y, id_obs=id_obs, x=x, donors=3)

imputation.PMM.quantify_cca(y, id_obs, x)

Factorize a categorical variable y into numeric values via optimal scaling using Canonical Correlation Analysis (CCA) with predictors x.

Parameters:
  • y (array-like, categorical variable with missing values)

  • id_obs (boolean array-like, mask indicating observed (True) and missing (False) in y)

  • x (array-like or DataFrame, predictors without missing values corresponding to y)

Returns:

  • ynum (numpy.ndarray) – Numeric transformation of y with missing positions as np.nan.

  • id (pandas.DataFrame) – DataFrame representing the canonical components for the observed y.

Notes

This method encodes y as one-hot vectors, then applies CCA to find numeric representations that maximize correlation with predictors x.

imputation.PMM.matcherid(d, t, matcher='NN', k=10, radius=3, rng=None)

Find donor indices matching missing values based on specified matching method.

Parameters:
  • d (np.array) – Numeric vector of observed values (donor pool).

  • t (np.array) – Numeric vector of missing values to be matched.

  • matcher (str, optional) – Matching method to use:

    - “NN”: Randomly selects one from the k nearest neighbors (default).
    - “fixedNN”: Randomly selects one donor within a fixed radius.

  • k (int, optional) – Number of nearest neighbors to consider (only for “NN” matcher).

  • radius (float, optional) – Radius threshold for fixedNN matcher (only for “fixedNN” matcher).

  • rng (np.random.Generator, optional) – Random number generator for reproducibility. If None, a fresh generator is used.

Returns:

List of indices corresponding to chosen donors in d for each element in t.

Return type:

list of int

Raises:

ValueError – If an unknown matcher method is specified.

Examples

>>> d = np.array([-5, 6, 0, 10, 12])
>>> t = np.array([-6])
>>> matcherid(d, t, matcher="NN", k=3)  # random pick among the 3 nearest donors (indices 0, 2, 1), so the result can vary
[0]
>>> matcherid(d, t, matcher="fixedNN", radius=5)
[0]
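The two matching rules can be sketched in plain Python. This is a minimal illustration, not the package's implementation: match_donors is a hypothetical name, distances are absolute differences, and the fallback to the single nearest donor when no donor lies within the radius is an assumption, since the documentation does not state what happens in that case.

```python
import random


def match_donors(d, t, matcher="NN", k=10, radius=3, rng=None):
    """Pick one donor index in d for each target value in t."""
    rng = rng or random.Random()
    chosen = []
    for target in t:
        dist = [abs(value - target) for value in d]
        nearest_first = sorted(range(len(d)), key=lambda i: dist[i])
        if matcher == "NN":
            candidates = nearest_first[:k]
        elif matcher == "fixedNN":
            candidates = [i for i in nearest_first if dist[i] <= radius]
            if not candidates:
                # Assumption: fall back to the closest donor if the radius is empty.
                candidates = nearest_first[:1]
        else:
            raise ValueError(f"Unknown matcher: {matcher!r}")
        chosen.append(rng.choice(candidates))
    return chosen


d = [-5, 6, 0, 10, 12]
t = [-6]
print(match_donors(d, t, matcher="NN", k=3, rng=random.Random(0)))
print(match_donors(d, t, matcher="fixedNN", radius=5))
```

With k=3 the candidate set for the target -6 is the donors -5, 0, and 6, so repeated calls can return different indices; with radius=5 only the donor -5 qualifies.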

MIDAS (Multiple Imputation by Distance Aided Donor Selection)

imputation.midas.bootfunc_plain(n)

Generates bootstrap weights for n observations using simple random sampling with replacement.

This function simulates a nonparametric bootstrap by randomly drawing n integers from the range 1 to n (inclusive), with replacement. It returns the count of how many times each index (1-based) is selected, producing a frequency table that can be used as weights in e.g. MIDAS imputation.

Parameters:

n (int) – The number of observations to sample and also the length of the resulting weight vector.

Returns:

weights – An array of integers indicating how often each index (1-based) was selected in the bootstrap sample.

Return type:

ndarray of shape (n,)
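The relationship between the bootstrap draws and the returned weights can be sketched as follows. bootstrap_weights is a hypothetical stand-in written with the standard library; it uses 0-based positions, so weights[i] corresponds to the 1-based index i + 1 in the description above.

```python
import random
from collections import Counter


def bootstrap_weights(n, rng=None):
    """Count how often each of n positions is drawn in n draws with replacement."""
    rng = rng or random.Random()
    draws = Counter(rng.randrange(n) for _ in range(n))
    # Positions that were never drawn get weight 0.
    return [draws[i] for i in range(n)]


weights = bootstrap_weights(6, rng=random.Random(3))
print(weights, sum(weights))
```

The weights always sum to n, and a weight of zero means the corresponding observation is left out of that bootstrap replicate.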

imputation.midas.minmax(x, domin=True, domax=True)

imputation.midas.compute_beta(x, m)

imputation.midas.midas(y, id_obs, x, id_mis=None, ridge=1e-05, midas_kappa=None, outout=True, **kwargs)

MIDAS imputation: Multiple Imputation by Distance Aided Donor Selection.

This function implements the MIDAS imputation algorithm for continuous variables, as introduced by Gaffert et al. (2018).

It operates by weighting observed donors based on the similarity between predicted values, with optional leave-one-out model estimation for increased fidelity.

Parameters:
  • y (array-like of shape (n_samples,)) – The target variable with missing values to be imputed. Must be numeric.

  • id_obs (array-like of bool of shape (n_samples,)) – Logical array indicating observed values in y. True where y is observed, False where missing.

  • x (array-like of shape (n_samples, n_features)) – Design matrix of predictor variables. Must be fully observed.

  • id_mis (np.ndarray, optional) – Boolean mask of missing values to impute. If None, uses ~id_obs.

  • ridge (float, default=1e-5) – Ridge penalty used in regularized regression to stabilize the solution in the presence of multicollinearity.

    - Set lower (e.g. 1e-6) to reduce bias in noisy data.
    - Set higher (e.g. 1e-4) if collinearity is suspected.

  • midas_kappa (float or None, default=None) – Controls the sharpness of donor weighting. If None, the optimal value is estimated based on R² as described by Siddique and Belin (2008). A common fallback is 3.

  • outout (bool, default=True) – If True, uses leave-one-out regression for each donor (slow but MI-proper). If False, a single model is estimated for all donors and recipients. WARNING: Setting outout=False may produce biased estimates and is not fully supported.

  • **kwargs (dict) – Additional arguments (not used in this method).

Returns:

y_imp – Imputed values for missing positions only (matching R implementation).

Return type:

np.ndarray

Notes

  • Based on: Gaffert, P., Meinfelder, F., & van den Bosch, V. (2018). “Towards an MI-proper Predictive Mean Matching.”

  • Related: Siddique, J. & Belin, T. R. (2008). “Multiple Imputation Using an Iterative Hot-Deck with Distance-Based Donor Selection.”

Examples

>>> y = np.array([7, np.nan, 9, 10, 11])
>>> id_obs = ~np.isnan(y)
>>> x = np.array([[1, 2], [3, 4], [5, 6], [7, 13], [11, 10]])
>>> midas(y, id_obs, x)
array([9.0])

Utilities and Support

Result Classes

Pooled results container for MICE following Rubin’s rules.

Separated into its own module so that it can be reused and MICE.py stays lighter.

class imputation.mice_result.MICEresult(model, params, normalized_cov_params)

Bases: LikelihoodModelResults

Holds pooled parameter estimates after multiple imputations.

__init__(model, params, normalized_cov_params)

summary(title=None, alpha=0.05)

Return a statsmodels summary object with an FMI column.

Pooling Functions

Standalone pooling module for multiple imputation results.

This module provides functions to pool descriptive statistics and model estimates from multiple imputed datasets using Rubin’s rules, without requiring coupling to any specific imputation framework.

class imputation.pooling.PoolingResult(estimates, variances, within_variance, between_variance, frac_miss_info, param_names, n_imputations, sample_size)

Bases: object

Container for pooled multiple imputation results.

estimates

Pooled parameter estimates (q_bar)

Type:

np.ndarray

variances

Total variances for each parameter (t)

Type:

np.ndarray

within_variance

Average within-imputation variance (u_bar)

Type:

np.ndarray

between_variance

Between-imputation variance (b)

Type:

np.ndarray

frac_miss_info

Fraction of missing information for each parameter

Type:

np.ndarray

param_names

Names of the pooled parameters

Type:

List[str]

n_imputations

Number of imputations used

Type:

int

sample_size

Sample size of each imputed dataset

Type:

int

summary()

Return a summary DataFrame with pooled statistics.

Returns:

Summary table with estimates, standard errors, and diagnostics

Return type:

pd.DataFrame

__init__(estimates, variances, within_variance, between_variance, frac_miss_info, param_names, n_imputations, sample_size)

imputation.pooling.validate_imputed_datasets(datasets)

Validate that the input datasets are suitable for pooling.

Parameters:

datasets (List[pd.DataFrame]) – List of imputed datasets to validate

Raises:

ValueError – If datasets are invalid for pooling

imputation.pooling.apply_rubins_rules(estimates, variances)

Apply Rubin’s rules to combine estimates and variances across imputations.

Parameters:
  • estimates (np.ndarray) – Array of shape (n_imputations, n_parameters) with parameter estimates

  • variances (np.ndarray) – Array of shape (n_imputations, n_parameters) with within-imputation variances

Returns:

(pooled_estimates, total_variances, within_variance, between_variance)

Return type:

tuple
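The shape convention and the returned tuple can be illustrated with a standard-library sketch. rubins_rules is a hypothetical stand-in that takes nested lists in place of arrays of shape (n_imputations, n_parameters); it is not the package's function, and it assumes at least two imputations so that the between-imputation variance is defined.

```python
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool per-imputation estimates and variances, one column per parameter."""
    m = len(estimates)
    if m < 2:
        raise ValueError("at least two imputations are needed")
    est_cols = list(zip(*estimates))   # one tuple per parameter
    var_cols = list(zip(*variances))
    pooled = [mean(col) for col in est_cols]
    within = [mean(col) for col in var_cols]
    between = [variance(col) for col in est_cols]   # sample variance, divisor m - 1
    total = [w + (1 + 1 / m) * b for w, b in zip(within, between)]
    return pooled, total, within, between


# Three imputations, two parameters.
est = [[1.0, 2.0], [1.2, 2.0], [0.8, 2.0]]
var = [[0.04, 0.01], [0.04, 0.01], [0.04, 0.01]]
pooled, total, within, between = rubins_rules(est, var)
print(pooled, total)
```

In this example the second parameter has identical estimates in every imputation, so its between-imputation variance is zero and its total variance equals the within-imputation variance.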

imputation.pooling.pool_descriptive_statistics(datasets, include_numeric=True, include_categorical=True)

Pool descriptive statistics across multiple imputed datasets using Rubin’s rules.

For numeric columns, pools the sample mean and its variance. For categorical columns, pools the per-level proportions and their variances.

Parameters:
  • datasets (List[pd.DataFrame]) – List of complete imputed datasets. All datasets must have the same shape and column names.

  • include_numeric (bool, default=True) – Whether to include numeric columns in pooling

  • include_categorical (bool, default=True) – Whether to include categorical columns in pooling

Returns:

Object containing pooled estimates, variances, and diagnostic statistics

Return type:

PoolingResult

Raises:

ValueError – If datasets are invalid or no columns are available for pooling
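The per-dataset quantities that feed the pooling step can be sketched for one numeric column and one categorical level. The variance formulas used here (s²/n for a mean, p(1 - p)/n for a proportion) are the textbook sampling variances and are an assumption about what the package computes; mean_and_variance and proportion_and_variance are hypothetical names.

```python
from statistics import mean, variance


def mean_and_variance(values):
    """Sample mean of a numeric column and the variance of that mean."""
    n = len(values)
    return mean(values), variance(values) / n


def proportion_and_variance(values, level):
    """Share of rows equal to `level` and the variance of that share."""
    n = len(values)
    p = sum(v == level for v in values) / n
    return p, p * (1 - p) / n


# One (estimate, variance) pair per imputed dataset is what gets pooled.
m_hat, m_var = mean_and_variance([1.0, 2.0, 3.0, 4.0])
p_hat, p_var = proportion_and_variance(["a", "b", "a", "a"], "a")
print(m_hat, m_var, p_hat, p_var)
```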

imputation.pooling.pool_from_files(file_paths, read_kwargs=None, **pooling_kwargs)

Pool descriptive statistics from datasets stored in files.

Parameters:
  • file_paths (List[str]) – List of file paths to imputed datasets

  • read_kwargs (dict, optional) – Keyword arguments to pass to pd.read_csv()

  • **pooling_kwargs – Additional arguments to pass to pool_descriptive_statistics()

Returns:

Pooled results from the datasets

Return type:

PoolingResult

imputation.pooling.pool_subset(datasets, columns=None, **pooling_kwargs)

Pool descriptive statistics for a subset of columns.

Parameters:
  • datasets (List[pd.DataFrame]) – List of complete imputed datasets

  • columns (List[str], optional) – List of column names to include in pooling. If None, uses all columns.

  • **pooling_kwargs – Additional arguments to pass to pool_descriptive_statistics()

Returns:

Pooled results for the specified columns

Return type:

PoolingResult

Validation Functions

imputation.validators.check_n_imputations(n_imputations)

Check if the number of imputations is valid and provide a warning if it’s high.

Parameters:

n_imputations (int) – Number of imputations to perform

Raises:

ValueError – If n_imputations is not a positive integer

imputation.validators.check_maxit(maxit)

Check if the maximum iterations parameter is valid and provide a warning if it’s high.

Parameters:

maxit (int) – Maximum number of iterations for each imputation cycle

Raises:

ValueError – If maxit is not a positive integer

imputation.validators.check_method(method, columns)

Check and process the method parameter for MICE imputation.

Parameters:
  • method (Union[str, Dict[str, str]]) – Method specification. Can be:

    - str: use the same method for all columns
    - Dict[str, str]: dictionary mapping column names to their methods

  • columns (List[str]) – List of column names in the data

Returns:

Dictionary mapping each column to its imputation method

Return type:

Dict[str, str]

Raises:

ValueError – If method is invalid or references non-existent columns

imputation.validators.check_initial_method(initial_method)

Check if the initial imputation method is valid.

Parameters:

initial_method (str) – Initial imputation method to validate

Raises:

ValueError – If initial_method is not a valid initial imputation method

imputation.validators.check_visit_sequence(visit_sequence, columns, columns_with_missing=None)

Check and process the visit sequence parameter for MICE imputation.

Parameters:
  • visit_sequence (Union[str, List[str]]) – Visit sequence specification. Can be:

    - str: “monotone” or “random” for predefined sequences
    - List[str]: list of column names specifying the order to visit variables

  • columns (List[str]) – List of all column names in the data

  • columns_with_missing (List[str], optional) – List of columns that have missing values. If provided, will validate that all these columns are included in a custom visit sequence.

Returns:

(validated_sequence, columns_without_missing) where:

  • validated_sequence: the processed visit sequence (only for list input, None for string)
  • columns_without_missing: list of columns in sequence that don’t have missing values

Return type:

tuple

Raises:

ValueError – If visit_sequence is invalid or references non-existent columns

Notes

For string visit sequences (“monotone”, “random”), the actual sequence will be generated in MICE._set_visit_sequence() based on the data.

For list visit sequences, this function validates that:

  1. All columns exist in the data
  2. No duplicate columns
  3. All columns with missing values are included (if columns_with_missing provided)
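The three checks for list input can be sketched as below. check_visit_list is a hypothetical helper that mirrors the documented behaviour and return value for list input; the error messages are illustrative and are not taken from the package.

```python
def check_visit_list(visit_sequence, columns, columns_with_missing=None):
    """Validate a custom visit order and report the fully observed columns in it."""
    unknown = [c for c in visit_sequence if c not in columns]
    if unknown:
        raise ValueError(f"Columns not found in data: {unknown}")
    if len(set(visit_sequence)) != len(visit_sequence):
        raise ValueError("Visit sequence contains duplicate columns")
    without_missing = []
    if columns_with_missing is not None:
        skipped = [c for c in columns_with_missing if c not in visit_sequence]
        if skipped:
            raise ValueError(f"Columns with missing values not visited: {skipped}")
        without_missing = [c for c in visit_sequence
                           if c not in columns_with_missing]
    return list(visit_sequence), without_missing


sequence, fully_observed = check_visit_list(
    ["age", "bmi", "chl"],
    columns=["age", "bmi", "hyp", "chl"],
    columns_with_missing=["bmi", "chl"],
)
print(sequence, fully_observed)
```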

imputation.validators.validate_predictor_matrix(predictor_matrix, data_columns, data)

Validate predictor matrix for MICE imputation.

Parameters:
  • predictor_matrix (pd.DataFrame) – Binary matrix indicating which variables should be used as predictors for each target variable. Rows represent target variables, columns represent predictors. A 1 indicates that the column variable is used as predictor for the index variable.

  • data_columns (List[str]) – List of column names in the data to validate against

  • data (pd.DataFrame) – The data to check for missing values

Returns:

Validated predictor matrix

Return type:

pd.DataFrame

Raises:

ValueError – If predictor_matrix has invalid structure or column names don’t match data

imputation.validators.validate_columns(data)

Validate and clean columns in the DataFrame.

Checks for columns with only NaN values and drops them with appropriate warnings.

Parameters:

data (pd.DataFrame) – Input DataFrame to validate

Returns:

DataFrame with invalid columns removed

Return type:

pd.DataFrame

Warns:

UserWarning – If columns with only NaN values are found and dropped

Notes

Values treated as missing: pandas NaN (numpy.nan).

imputation.validators.validate_dataframe(data)

Check and validate input data for MICE imputation.

Parameters:

data (Any) – Input data to be checked and converted to DataFrame

Returns:

Validated and cleaned DataFrame

Return type:

pd.DataFrame

Raises:

ValueError – If data cannot be converted to DataFrame or has duplicate column names

Notes

Values treated as missing: pandas NaN (numpy.nan).

imputation.validators.validate_formula(formula, columns)

Validate that all variables in the formula exist in the dataset columns.

Parameters:
  • formula (str) – The formula string to validate

  • columns (List[str]) – List of column names in the dataset

Raises:

ValueError – If any variables in the formula are not found in the columns
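A rough version of this check can be written with a regular expression. This is only a sketch: formula_variables and check_formula are hypothetical names, and the pattern treats every identifier in the formula as a variable, so it suits plain formulas such as 'y ~ x1 + x2' but would misread function calls or other patsy syntax as column names.

```python
import re


def formula_variables(formula):
    """Collect identifier-like tokens from a simple formula string."""
    return set(re.findall(r"[A-Za-z_]\w*", formula))


def check_formula(formula, columns):
    """Raise ValueError if the formula mentions names that are not columns."""
    missing = sorted(formula_variables(formula) - set(columns))
    if missing:
        raise ValueError(f"Variables not found in data: {missing}")


check_formula("y ~ x1 + x2", ["y", "x1", "x2", "z"])   # passes silently
print(sorted(formula_variables("y ~ x1 + x2")))
```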

Configuration

Logging Configuration

Logging configuration module for the imputation package.

This module provides proper logging setup following Python best practices, allowing users to configure logging behavior without affecting the global logging state.

imputation.logging_config.setup_logging(level='INFO', log_dir=None, console=True, file_logging=True, format_string=None, console_level=None, file_level=None, max_bytes=5242880, backup_count=5)

Configure logging for the imputation package.

This function sets up a package-specific logger without affecting the root logger or other packages. It’s safe to call multiple times.

Parameters:
  • level (str or int, default="INFO") – Base logging level for the package (‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’) or logging level constant (e.g., logging.DEBUG)

  • log_dir (str, optional) – Directory for log files. If None, uses ‘./logs’

  • console (bool, default=True) – Whether to enable console logging

  • file_logging (bool, default=True) – Whether to enable file logging

  • format_string (str, optional) – Custom format string for log messages. If None, uses default format

  • console_level (str or int, optional) – Logging level for console handler. If None, uses ‘INFO’

  • file_level (str or int, optional) – Logging level for file handler. If None, uses ‘DEBUG’

  • max_bytes (int, default=5242880) – Maximum size of the log file in bytes before rotation (5 MB by default)

  • backup_count (int, default=5) – Number of backup log files to keep

Returns:

The configured package logger

Return type:

logging.Logger

Examples

Basic usage:

>>> import imputation
>>> logger = imputation.setup_logging()

Custom configuration:

>>> logger = imputation.setup_logging(
...     level='DEBUG',
...     log_dir='my_logs',
...     console_level='WARNING'
... )

Disable file logging:

>>> logger = imputation.setup_logging(file_logging=False)
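How a package-specific logger can be configured without touching the root logger is sketched below using only the standard logging module. This is not the package's setup_logging: configure_package_logger and the file name imputation.log are hypothetical, and unlike the documented default of './logs', this sketch attaches a file handler only when log_dir is given.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_package_logger(name="imputation", level="INFO", log_dir=None,
                             max_bytes=5 * 1024 * 1024, backup_count=5):
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False            # keep records away from the root logger
    # Drop handlers from earlier calls so repeated setup does not duplicate output.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    logger.addHandler(console)
    if log_dir is not None:
        Path(log_dir).mkdir(parents=True, exist_ok=True)
        file_handler = RotatingFileHandler(
            Path(log_dir) / "imputation.log",   # hypothetical file name
            maxBytes=max_bytes, backupCount=backup_count,
        )
        file_handler.setLevel(logging.DEBUG)
        logger.addHandler(file_handler)
    return logger


logger = configure_package_logger(level="DEBUG")
logger.info("package logger configured")
```

Removing existing handlers before adding new ones is what makes repeated calls safe in this sketch.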

imputation.logging_config.get_logger(name)

Get a logger for a specific module within the imputation package.

This function returns a child logger of the main package logger, ensuring proper hierarchy and inheritance of configuration.

Parameters:

name (str) – Module name, typically __name__ or a descriptive string

Returns:

A logger instance for the specified module

Return type:

logging.Logger

Examples

In a module file:

>>> from imputation.logging_config import get_logger
>>> logger = get_logger(__name__)
>>> logger.info("This is a log message")

For simulation scripts:

>>> logger = get_logger('imputation.simulation.fdgs')

imputation.logging_config.disable_logging()

Disable logging for the imputation package.

This is useful for testing or when logging output is not desired.

imputation.logging_config.reset_logging()

Reset logging configuration to default state.

This removes all handlers and sets the logger back to default configuration with only a NullHandler.
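The reset behaviour described above can be approximated with the standard library as follows. reset_package_logger is a hypothetical name, and resetting the level to NOTSET is an assumption about what "default configuration" means here.

```python
import logging


def reset_package_logger(name="imputation"):
    """Remove every handler and leave only a NullHandler on the package logger."""
    logger = logging.getLogger(name)
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    logger.addHandler(logging.NullHandler())
    logger.setLevel(logging.NOTSET)     # assumption: default level
    return logger


package_logger = reset_package_logger("imputation_reset_demo")
child = logging.getLogger("imputation_reset_demo.pmm")
print(child.parent is package_logger)   # child loggers hang off the package logger
```

A NullHandler discards records silently, which is the usual way for a library to stay quiet until the application configures logging.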