Quickstart Guide
================

This guide will get you started with mice-py in just a few minutes.

Basic Workflow
--------------

The typical MICE (Multiple Imputation by Chained Equations) workflow consists of three main steps:

1. **Initialize** a MICE object with your data
2. **Impute** missing values multiple times
3. **Analyze** the imputed datasets and pool results

Minimal Example
---------------

Here's a complete example using a small synthetic dataset:

.. code-block:: python

   import pandas as pd
   import numpy as np
   from imputation import MICE
   
   # 1. Build a small DataFrame with missing values
   df = pd.DataFrame({
       'age': [25, 30, np.nan, 45, 50, np.nan, 35, 40],
       'income': [50000, np.nan, 60000, np.nan, 80000, 70000, np.nan, 75000],
       'education': ['Bachelor', 'Master', 'Bachelor', np.nan, 
                     'PhD', 'Master', 'Bachelor', np.nan],
       'employed': [1, 1, 0, 1, 1, np.nan, 1, 0]
   })
   
   # 2. Initialize MICE object
   mice = MICE(df)
   
   # 3. Perform imputation
   mice.impute(
       n_imputations=5,    # Create 5 imputed datasets
       maxit=10,           # Run 10 iterations
       method='pmm'        # Use Predictive Mean Matching
   )
   
   # 4. Access imputed datasets
   imputed_datasets = mice.imputed_datasets
   print(f"Created {len(imputed_datasets)} complete datasets")
   
   # 5. Fit a statistical model
   mice.fit('income ~ age + education + employed')
   
   # 6. Pool results using Rubin's rules
   pooled_results = mice.pool(summ=True)
   print(pooled_results)

Understanding the Output
------------------------

After imputation and pooling, you'll have:

**Multiple Complete Datasets**
   The ``mice.imputed_datasets`` attribute contains a list of pandas DataFrames, 
   each with all missing values filled in differently.

**Convergence Diagnostics**
   - ``mice.chain_mean``: Mean of each variable across iterations
   - ``mice.chain_var``: Variance of each variable across iterations

**Pooled Results**
   When you call ``mice.pool()``, you get combined estimates from all imputed datasets
   using Rubin's rules, including:
   
   - Pooled coefficients
   - Standard errors
   - Confidence intervals
   - Fraction of missing information (FMI)
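
To make the pooling step concrete, here is a minimal sketch of Rubin's rules for a
single coefficient. It is an illustration under stated assumptions, not the
implementation behind ``mice.pool()``: the helper name ``pool_rubin`` and the input
numbers are hypothetical, and the fraction of missing information is approximated by
the share of total variance that comes from between-imputation variability.

.. code-block:: python

   import numpy as np

   def pool_rubin(estimates, variances):
       """Pool m point estimates and their squared standard errors (hypothetical helper)."""
       q = np.asarray(estimates, dtype=float)
       u = np.asarray(variances, dtype=float)
       m = len(q)
       q_bar = q.mean()                  # pooled point estimate
       u_bar = u.mean()                  # within-imputation variance
       b = q.var(ddof=1)                 # between-imputation variance
       t = u_bar + (1 + 1 / m) * b       # total variance
       lam = (1 + 1 / m) * b / t         # share of variance due to missing data
       return {"estimate": q_bar, "se": np.sqrt(t),
               "within": u_bar, "between": b, "lambda": lam}

   # Made-up estimates and squared standard errors from m = 5 fitted models
   pooled = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9],
                       [0.04, 0.05, 0.04, 0.05, 0.04])
   print(pooled)

The pooled standard error exceeds the average per-dataset standard error because it
also accounts for the disagreement between imputed datasets.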

Checking for Convergence
-------------------------

Before analyzing results, check whether the imputation algorithm has converged:

.. code-block:: python

   from plotting.diagnostics import plot_chain_stats
   
   # Visualize convergence
   plot_chain_stats(
       chain_mean=mice.chain_mean,
       chain_var=mice.chain_var,
       save_path='convergence.png'
   )

The chain means and variances should level off after a few iterations, with no sustained
upward or downward trend. If they have not, increase ``maxit`` and rerun the imputation.
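
As a rough numeric complement to the plot, the sketch below flags a chain whose last
few values still move noticeably. It rests on assumptions: the layout of
``mice.chain_mean`` is not described here, so the example uses a plain list of
per-iteration means for one variable, and ``has_stabilized`` with its ``window`` and
``rel_tol`` parameters is a hypothetical helper. Treat it as a crude heuristic, not a
substitute for inspecting the plots.

.. code-block:: python

   import numpy as np

   def has_stabilized(chain, window=3, rel_tol=0.01):
       """Return True if the last `window` values stay within rel_tol of their mean."""
       tail = np.asarray(chain, dtype=float)[-window:]
       center = tail.mean()
       scale = max(abs(center), 1e-12)   # avoid dividing by zero
       return bool(np.max(np.abs(tail - center)) / scale <= rel_tol)

   # Made-up chains of per-iteration means for one variable
   stable_chain = [30.0, 36.0, 38.5, 39.8, 40.1, 40.0, 39.9, 40.0, 40.1, 40.0]
   drifting_chain = [30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 42.0, 44.0, 46.0, 48.0]

   print(has_stabilized(stable_chain))
   print(has_stabilized(drifting_chain))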

Visualizing Imputations
-----------------------

Compare observed and imputed values:

.. code-block:: python

   from plotting.diagnostics import stripplot, densityplot
   
   # Indicator of observed cells: 1 = observed, 0 = missing
   missing_pattern = df.notna().astype(int)
   
   # Stripplot: points for observed (blue) and imputed (red) values
   stripplot(mice.imputed_datasets, missing_pattern, 
             save_path='stripplot.png')
   
   # Density plot: distribution comparison
   densityplot(mice.imputed_datasets, missing_pattern,
               save_path='density.png')
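
The indicator built with ``df.notna().astype(int)`` holds 1 where a value was observed
and 0 where it was missing. The following standalone snippet, which uses a made-up
two-column frame and only pandas calls, shows what it looks like and how to count the
missing cells per column before plotting:

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Made-up data for illustration
   df = pd.DataFrame({
       "age": [25, 30, np.nan, 45],
       "income": [50000, np.nan, 60000, np.nan],
   })

   missing_pattern = df.notna().astype(int)   # 1 = observed, 0 = missing
   n_missing = df.isna().sum()                # missing cells per column

   print(missing_pattern)
   print(n_missing)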

Using Different Methods
-----------------------

PMM (Default)
~~~~~~~~~~~~~

Predictive Mean Matching is the default method and works well for most numeric data:

.. code-block:: python

   mice.impute(n_imputations=5, method='pmm')
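
To give a feel for what PMM does, here is a deliberately simplified sketch for one
incomplete variable ``y`` and one complete predictor ``x``. It is not mice-py's
implementation: the function name ``pmm_impute`` and the parameter ``k`` are
hypothetical, and the sketch skips the random draw of regression coefficients that
full PMM performs. The idea it shows is that each missing value is replaced by an
observed value borrowed from a case with a similar prediction, so imputations always
stay within the range of the observed data.

.. code-block:: python

   import numpy as np

   def pmm_impute(x, y, k=3, rng=None):
       """Fill NaNs in y by simplified predictive mean matching on predictor x."""
       rng = np.random.default_rng(rng)
       x = np.asarray(x, dtype=float)
       y = np.asarray(y, dtype=float)
       obs = ~np.isnan(y)

       # Least-squares fit of y on x using the observed rows only
       design = np.column_stack([np.ones(len(x)), x])
       beta, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
       pred = design @ beta

       y_filled = y.copy()
       donors_pred, donors_y = pred[obs], y[obs]
       for i in np.flatnonzero(~obs):
           # The k observed cases whose predictions are closest to this case's
           nearest = np.argsort(np.abs(donors_pred - pred[i]))[:k]
           # Borrow the observed value of one randomly chosen donor
           y_filled[i] = donors_y[rng.choice(nearest)]
       return y_filled

   # Made-up data for illustration
   x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
   y = [2.1, np.nan, 6.2, 7.9, np.nan, 12.1]
   filled = pmm_impute(x, y, k=2, rng=0)
   print(filled)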

CART
~~~~

Classification and Regression Trees handle non-linear relationships:

.. code-block:: python

   mice.impute(n_imputations=5, method='cart')

Random Forest
~~~~~~~~~~~~~

Random Forest captures complex interactions:

.. code-block:: python

   mice.impute(n_imputations=5, method='rf')

Method Per Variable
~~~~~~~~~~~~~~~~~~~

Use different methods for different variables:

.. code-block:: python

   method_dict = {
       'age': 'pmm',
       'income': 'cart',
       'education': 'sample',
       'employed': 'rf'
   }
   mice.impute(n_imputations=5, method=method_dict)
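
For wider datasets, writing the dictionary by hand gets tedious. One possible approach
is to derive it from the column dtypes. The helper ``methods_by_dtype`` below is
hypothetical and uses only pandas. The method names are the ones shown above, and the
choice of ``'sample'`` for non-numeric columns is an assumption you may want to change.

.. code-block:: python

   import numpy as np
   import pandas as pd

   def methods_by_dtype(df, numeric="pmm", other="sample"):
       """Map each column to a method name depending on whether it is numeric."""
       return {
           col: numeric if pd.api.types.is_numeric_dtype(df[col]) else other
           for col in df.columns
       }

   # Made-up data for illustration
   df = pd.DataFrame({
       "age": [25, np.nan, 45],
       "education": ["Bachelor", np.nan, "PhD"],
   })

   method_dict = methods_by_dtype(df)
   print(method_dict)

The resulting dictionary can then be passed as the ``method`` argument in the same way
as the hand-written one.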

Logging
-------

Enable logging to track progress:

.. code-block:: python

   from imputation import configure_logging
   
   # Enable INFO level logging
   configure_logging(level='INFO')
   
   # Now run MICE - you'll see progress messages
   mice = MICE(df)
   mice.impute(n_imputations=5, maxit=10)

Common Parameters
-----------------

Here are the most commonly used parameters:

**n_imputations** (default: 5)
   Number of imputed datasets to create. More imputations reduce the simulation
   (Monte Carlo) error of the pooled estimates but take longer to compute.

**maxit** (default: 10)
   Number of MICE iterations. Check convergence diagnostics to determine if 
   more iterations are needed.

**method** (default: 'pmm')
   Imputation method. Can be a string (same method for all variables) or 
   a dictionary mapping column names to methods.

**initial** (default: 'sample')
   Method for initial imputation before MICE iterations. Options: 'sample' 
   or 'mean'.

**visit_sequence** (default: 'monotone')
   Order in which variables are imputed. Options: 'monotone', 'random', or 
   a custom list.

Next Steps
----------

Now that you understand the basics:

- **Explore methods**: Read :doc:`user_guide/imputation_methods` to choose 
  the best method for your data
  
- **Advanced parameters**: Learn about predictor matrices and visit sequences 
  in :doc:`user_guide/predictor_matrices`
  
- **Theory**: Understand the theory behind MICE in :doc:`theory/index`

- **Examples**: See complete workflows in :doc:`examples/index`