Rubin's Rules
=============

How to combine results from multiple imputed datasets.

The Problem
-----------

After imputation, you have *m* estimates: :math:`\hat{\theta}_1, ..., \hat{\theta}_m`

Simple averaging ignores imputation uncertainty. Rubin's rules provide the correct solution.

Basic Formulas
--------------

**Pooled Estimate**

.. math::

   \bar{\theta} = \frac{1}{m}\sum_{i=1}^{m} \hat{\theta}_i

**Within-Imputation Variance** (average sampling variance)

.. math::

   \bar{U} = \frac{1}{m}\sum_{i=1}^{m} SE_i^2

**Between-Imputation Variance** (variance due to missing data)

.. math::

   B = \frac{1}{m-1}\sum_{i=1}^{m} (\hat{\theta}_i - \bar{\theta})^2

**Total Variance**

.. math::

   T = \bar{U} + B + \frac{B}{m}

**Standard Error**

.. math::

   SE = \sqrt{T}

**Confidence Interval**

.. math::

   \bar{\theta} \pm t_{df} \times SE

where :math:`t_{df}` is from t-distribution with adjusted degrees of freedom.

Fraction of Missing Information (FMI)
--------------------------------------

.. math::

   FMI = \frac{(1 + 1/m)B}{T}

**Interpretation**:
   - FMI = 0: No impact from missing data
   - FMI = 0.3: 30% of uncertainty due to missingness
   - FMI > 0.3: Consider more imputations

How Many Imputations?
----------------------

**Old rule**: m = 5
**Modern recommendation**: m = 20+
**High missingness**: m = 50-100

**Rule of thumb**: m ≈ percentage of incomplete cases

Usage in mice-py
----------------

.. code-block:: python

   mice.fit('outcome ~ predictor')
   pooled = mice.pool(summ=True)

Output includes:
   - ``Estimate``: Pooled coefficient
   - ``Std.Error``: Pooled SE
   - ``P>|t|``: p-value
   - ``FMI``: Fraction of missing information

See :doc:`../user_guide/pooling_analysis` for practical usage.