Plotting Functions

Diagnostic and visualization tools for analyzing imputation results and missing data patterns.

Diagnostic Plots

plotting.diagnostics.stripplot(imputed_datasets, missing_pattern, columns=None, merge_imputations=False, observed_color='blue', imputed_color='red', save_path=None)

Create stripplots for imputed data showing observed and imputed values. First plots observed data, then for each imputation shows both observed and imputed values in different colors.

Parameters:
  • imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values

  • missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)

  • columns (list of str, optional) – List of column names to plot. If None, plots all columns with missing values.

  • merge_imputations (bool, default False) – If True, shows two columns: one with only observed values and another with observed and imputed values overlaid. If False, shows separate plots for each imputation.

  • observed_color (str, default 'blue') – Color for observed values

  • imputed_color (str, default 'red') – Color for imputed values

  • save_path (str, optional) – If provided, save the plot to this path instead of displaying it
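The two required inputs can be prepared with plain pandas. The following is a minimal sketch, not taken from the library: the toy DataFrame, the column names, and the random-donor fill are all hypothetical stand-ins for real imputer output. It only illustrates the shapes that `imputed_datasets` and `missing_pattern` are described as having above.

```python
import numpy as np
import pandas as pd

# Toy data with missing values (hypothetical example data).
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0, np.nan],
    "bmi": [22.1, 27.3, np.nan, 24.8, 30.2],
})

# 1 where a value is observed, 0 where it is missing.
missing_pattern = df.notna().astype(int)

# Stand-in for a real imputer: fill each gap with a randomly chosen
# observed value from the same column. Assumption for illustration only.
rng = np.random.default_rng(0)
imputed_datasets = []
for _ in range(3):
    filled = df.copy()
    for col in df.columns:
        gaps = df[col].isna()
        donors = df.loc[~gaps, col].to_numpy()
        filled.loc[gaps, col] = rng.choice(donors, size=int(gaps.sum()))
    imputed_datasets.append(filled)
```

These objects correspond to the first two positional arguments in the signature above, so a call would presumably look like `stripplot(imputed_datasets, missing_pattern, columns=["age"])`. The import path depends on how the package is installed and is not shown on this page.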

plotting.diagnostics.bwplot(imputed_datasets, missing_pattern, columns=None, merge_imputations=False, observed_color='blue', imputed_color='red', save_path=None)

Create box-and-whisker plots for imputed data showing observed and imputed values. First plots observed data, then for each imputation shows only imputed values in different colors.

Parameters:
  • imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values

  • missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)

  • columns (list of str, optional) – List of column names to plot. If None, plots all columns with missing values.

  • merge_imputations (bool, default False) – If True, combines all imputed values into a single boxplot. If False, shows separate boxplots for each imputation.

  • observed_color (str, default 'blue') – Color for observed values

  • imputed_color (str, default 'red') – Color for imputed values

  • save_path (str, optional) – If provided, save the plot to this path instead of displaying it
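The difference between the two `merge_imputations` settings can be pictured in terms of the samples that feed each box. The sketch below is one reading of the documented behavior, not the library's implementation; the series and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Original column with gaps (hypothetical values).
observed = pd.Series([1.0, np.nan, 3.0, np.nan], name="x")
mask_missing = observed.isna()  # True where a value had to be imputed

# Two hypothetical completed versions of the same column.
imputations = [
    pd.Series([1.0, 2.0, 3.0, 4.0], name="x"),
    pd.Series([1.0, 2.5, 3.0, 3.5], name="x"),
]

# merge_imputations=False: one sample of imputed values per imputation.
per_imputation = [imp[mask_missing].to_numpy() for imp in imputations]

# merge_imputations=True: all imputed values pooled into a single sample.
pooled = np.concatenate(per_imputation)
```

With the separate samples, each imputation would get its own box next to the observed data; with the pooled sample, a single box summarizes every imputed value.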

plotting.diagnostics.densityplot(imputed_datasets, missing_pattern, columns=None, observed_color='blue', imputed_color='red', save_path=None)

Create density plots (KDE) for observed and imputed data. By default, the distribution of observed data is drawn in blue and that of imputed data in red.

Parameters:
  • imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values

  • missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)

  • columns (list of str, optional) – List of column names to plot. If None, plots all columns with missing values.

  • observed_color (str, default 'blue') – Color for observed values

  • imputed_color (str, default 'red') – Color for imputed values

  • save_path (str, optional) – If provided, save the plot to this path instead of displaying it
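For readers unfamiliar with kernel density estimation, the curve drawn for each group is a smoothed estimate of that group's distribution. The sketch below is a NumPy-only Gaussian KDE written for illustration. The function name `gaussian_kde_1d`, the fixed bandwidth, and the sample values are assumptions; which KDE routine or bandwidth rule the library actually uses is not stated on this page.

```python
import numpy as np

def gaussian_kde_1d(sample, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `sample` on `grid`."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample point.
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1.
    return kernel.mean(axis=1) / bandwidth

# Hypothetical observed and imputed values for one column.
observed_values = [1.0, 2.0, 3.0]
imputed_values = [1.8, 2.2]

grid = np.linspace(-10.0, 15.0, 2001)
observed_density = gaussian_kde_1d(observed_values, grid, bandwidth=0.5)
imputed_density = gaussian_kde_1d(imputed_values, grid, bandwidth=0.5)
```

Comparing the two curves on the same grid is the point of the diagnostic: imputed values whose density departs strongly from the observed density may deserve a closer look.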

plotting.diagnostics.densityplot_split(imputed_datasets, missing_pattern, column, observed_color='blue', imputed_color='red', save_path=None)

Create separate density plots (KDE) for the observed data and for each imputed dataset, one subplot per imputation. By default, observed data is drawn in blue and imputed data in red.

Parameters:
  • imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values

  • missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)

  • column (str) – Name of the column to plot

  • observed_color (str, default 'blue') – Color for observed values

  • imputed_color (str, default 'red') – Color for imputed values

  • save_path (str, optional) – If provided, save the plot to this path instead of displaying it

plotting.diagnostics.xyplot(imputed_datasets, missing_pattern, x, y, merge_imputations=False, observed_color='blue', imputed_color='red', save_path=None)

Create scatter plots of two columns, showing observed and imputed values. By default, points whose y value was missing (and therefore imputed) are shown in red, and points with an observed y value in blue.

Parameters:
  • imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values

  • missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)

  • x (str) – Name of the column to plot on x-axis

  • y (str) – Name of the column to plot on y-axis

  • merge_imputations (bool, default False) – If True, shows all imputations on a single plot. If False, shows n+1 plots: first plot with only observed data, followed by one plot for each imputation.

  • observed_color (str, default 'blue') – Color for observed values

  • imputed_color (str, default 'red') – Color for imputed values

  • save_path (str, optional) – If provided, save the plot to this path instead of displaying it

plotting.diagnostics.plot_chain_stats(chain_mean, chain_var, columns=None, figsize=(10, 5), save_path=None)

Plot per-iteration chain means and variances for the given columns.

Parameters:
  • chain_mean (Dict[str, np.ndarray]) – Dictionary where each key is a column name and each value is a 2-D array of shape (n_iter, n_imputations) containing the means of the newly imputed values.

  • chain_var (Dict[str, np.ndarray]) – Same structure as chain_mean but for the variance of the imputed values.

  • columns (list of str, optional) – Columns to plot. If None, plots all keys present in chain_mean.

  • figsize (tuple, default (10, 5)) – Base size of a single row (width, height). The final figure will be scaled according to the number of rows.

  • save_path (str, optional) – If provided, save the plot to this path instead of displaying it.
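The expected layout of `chain_mean` and `chain_var` can be illustrated with synthetic arrays. This is a sketch under stated assumptions: the column names and the random values are hypothetical, and in practice both dictionaries would come from the imputation run rather than from a random generator.

```python
import numpy as np

n_iter, n_imputations = 10, 5
rng = np.random.default_rng(42)

# One (n_iter, n_imputations) array per column: row i holds the statistic
# of the newly imputed values at iteration i, one entry per imputation chain.
chain_mean = {
    "age": rng.normal(30.0, 1.0, size=(n_iter, n_imputations)),
    "bmi": rng.normal(25.0, 0.5, size=(n_iter, n_imputations)),
}
chain_var = {
    col: rng.uniform(0.5, 2.0, size=(n_iter, n_imputations))
    for col in chain_mean
}

# Sanity check: both dictionaries share keys and array shapes.
for col in chain_mean:
    assert chain_mean[col].shape == chain_var[col].shape == (n_iter, n_imputations)
```

Dictionaries shaped like this match the parameter descriptions above, so they could be passed as `plot_chain_stats(chain_mean, chain_var, columns=["age"])`.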

Plotting Utilities

plotting.utils.md_pattern_like(df)

Replicates the behavior of md.pattern() from R’s mice package. Shows each missing data pattern as a row of 1 (observed) and 0 (missing) indicators, along with counts per pattern and per column.

Parameters:
  • df (pandas.DataFrame) – Input DataFrame with potential missing values

Returns:
  pandas.DataFrame – DataFrame showing missing data patterns with counts
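The idea behind a missing data pattern table can be sketched with a pandas group-by. This is an approximation for illustration only, not the library's implementation and not a full reproduction of md.pattern(): the toy DataFrame and the output column names `n_rows` and `n_missing` are hypothetical, and the row and column ordering may differ from the real output.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): "a" and "b" each have one gap, "c" is complete.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

indicator = df.notna().astype(int)  # 1 = observed, 0 = missing

# One row per distinct pattern, with the number of data rows sharing it.
patterns = (
    indicator.groupby(list(indicator.columns))
    .size()
    .reset_index(name="n_rows")
)

# Number of missing entries within each pattern.
patterns["n_missing"] = (patterns[list(df.columns)] == 0).sum(axis=1)

# Number of missing entries per column.
missing_per_column = df.isna().sum()
```

Here two rows are fully observed and the other two each miss a single, different variable, so the table has three distinct patterns.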

plotting.utils.plot_missing_data_pattern(pattern_df, figsize=(8, 5), title='Missing Data Pattern', rotate_names=False, save_path=None)

Plots the missing data pattern from a pattern dataframe.

Parameters:
  • pattern_df (pandas.DataFrame) – DataFrame containing the missing data pattern, typically generated by md_pattern_like()

  • figsize (tuple, default (8, 5)) – Figure size in inches (width, height)

  • title (str, default 'Missing Data Pattern') – Title for the plot

  • rotate_names (bool, default False) – Whether to rotate column names 90 degrees

  • save_path (str, optional) – If provided, save the plot to this path instead of displaying it

Returns:
  pandas.DataFrame – The pattern matrix with counts, similar to R’s md.pattern output

See Also