Changelog

For a detailed changelog, see CHANGELOG.md in the repository.

Version 0.1.0

Initial Release

Core Features

  • Complete MICE implementation with convergence tracking

  • Five imputation methods: PMM, CART, Random Forest, MIDAS, Sample

  • Rubin’s rules pooling with fraction of missing information (FMI)

  • Formula-based model fitting and analysis

  • Comprehensive input validation

  • Professional logging system

Imputation Methods

  • PMM: Predictive Mean Matching with Bayesian bootstrap

  • CART: Classification and Regression Trees

  • Random Forest: Ensemble method with configurable parameters

  • MIDAS: Distance-aided substitution for small samples

  • Sample: Simple random sampling

Configuration Options

  • Customizable predictor matrices

  • Multiple visit sequence strategies

  • Method-specific parameter tuning

  • Initial imputation methods

  • Flexible method assignment per variable

Diagnostic Tools

  • Convergence diagnostics (chain statistics)

  • Stripplots for observed vs imputed comparison

  • Density plots for distribution comparison

  • Box plots for distribution visualization

  • Missing data pattern visualization

  • XY plots for bivariate relationships

Statistical Analysis

  • Formula-based model specification (Patsy syntax)

  • Automatic pooling using Rubin’s rules

  • Fraction of Missing Information (FMI) calculation

  • Confidence intervals and p-values

  • Degrees of freedom adjustment

Documentation

  • Comprehensive Sphinx documentation

  • User guide with detailed explanations

  • Theory section with mathematical background

  • API reference for all modules

  • Jupyter notebook examples

  • Best practices guide

Testing

  • Extensive test suite with pytest

  • Unit tests for all core functions

  • Integration tests for workflows

  • Coverage tracking

Development

  • MIT License

  • GitHub repository with CI/CD

  • ReadTheDocs integration

  • Development, testing, and documentation dependencies

Contributors

  • Anna-Carolina Haensch

  • The Anh Vu

  • Zhanna Lopuliak

Future Plans

Potential future enhancements (not yet implemented):

  • Additional imputation methods (e.g., lasso, ridge)

  • Parallel processing for large datasets

  • GPU acceleration for random forest

  • More sophisticated predictor matrix algorithms

  • Additional diagnostic plots

  • Integration with scikit-learn pipelines

  • Categorical variable handling improvements

  • Time series imputation methods

Stay tuned for updates!

Reporting Issues

Found a bug or have a feature request?

Open an issue on GitHub.