Plotting Functions
Diagnostic and visualization tools for analyzing imputation results and missing data patterns.
Diagnostic Plots
- plotting.diagnostics.stripplot(imputed_datasets, missing_pattern, columns=None, merge_imputations=False, observed_color='blue', imputed_color='red', save_path=None)
Create stripplots for imputed data showing observed and imputed values. First plots observed data, then for each imputation shows both observed and imputed values in different colors.
- Parameters:
imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values
missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)
columns (list of str, optional) – List of column names to plot. If None, plots all columns with missing values.
merge_imputations (bool, default False) – If True, shows two columns: one with only observed values and another with observed and imputed values overlaid. If False, shows separate plots for each imputation.
observed_color (str, default 'blue') – Color for observed values
imputed_color (str, default 'red') – Color for imputed values
save_path (str, optional) – If provided, save the plot to this path instead of displaying it
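The two required inputs can be prepared with plain pandas. The following is a minimal sketch, assuming toy data and a hypothetical `naive_impute` helper that stands in for real imputation output; it is not the library's imputation method.

```python
import numpy as np
import pandas as pd

# Toy data with missing entries (illustrative only).
df = pd.DataFrame({
    "age": [25.0, 31.0, np.nan, 47.0, 52.0],
    "bmi": [22.1, np.nan, 27.5, np.nan, 30.2],
})

# 1 where observed, 0 where missing, as the parameter description states.
missing_pattern = df.notna().astype(int)

rng = np.random.default_rng(0)


def naive_impute(frame, rng):
    """Hypothetical stand-in: fill gaps with random draws from observed values."""
    out = frame.copy()
    for col in out.columns:
        missing = out[col].isna()
        if missing.any():
            observed = out.loc[~missing, col].to_numpy()
            out.loc[missing, col] = rng.choice(observed, size=int(missing.sum()))
    return out


# One completed DataFrame per imputation.
imputed_datasets = [naive_impute(df, rng) for _ in range(3)]
```

With the package installed, these objects would be passed as stripplot(imputed_datasets, missing_pattern, columns=['bmi']); the exact import path for plotting.diagnostics depends on how the package is installed.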
- plotting.diagnostics.bwplot(imputed_datasets, missing_pattern, columns=None, merge_imputations=False, observed_color='blue', imputed_color='red', save_path=None)
Create box-and-whisker plots for imputed data showing observed and imputed values. First plots observed data, then for each imputation shows only imputed values in different colors.
- Parameters:
imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values
missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)
columns (list of str, optional) – List of column names to plot. If None, plots all columns with missing values.
merge_imputations (bool, default False) – If True, combines all imputed values into a single boxplot. If False, shows separate boxplots for each imputation.
observed_color (str, default 'blue') – Color for observed values
imputed_color (str, default 'red') – Color for imputed values
save_path (str, optional) – If provided, save the plot to this path instead of displaying it
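The grouping that merge_imputations controls can be illustrated without any plotting. The following sketch assumes small hand-written inputs and shows only how observed and imputed values might be separated using missing_pattern; it is not the library's internal code.

```python
import pandas as pd

# Hand-written inputs for one column (illustrative only).
missing_pattern = pd.DataFrame({"bmi": [1, 0, 1, 0, 1]})
imputed_datasets = [
    pd.DataFrame({"bmi": [22.1, 24.0, 27.5, 29.0, 30.2]}),
    pd.DataFrame({"bmi": [22.1, 26.0, 27.5, 23.0, 30.2]}),
]

col = "bmi"
observed_mask = missing_pattern[col] == 1

# Observed values are identical across imputations, so take them once.
observed = imputed_datasets[0].loc[observed_mask, col]

# One group of imputed values per imputation (merge_imputations=False).
imputed_groups = [d.loc[~observed_mask, col] for d in imputed_datasets]

# All imputed values pooled into a single group (merge_imputations=True).
imputed_pooled = pd.concat(imputed_groups, ignore_index=True)
```

Each group would correspond to one box: the observed box first, then either one box per imputation or a single pooled box.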
- plotting.diagnostics.densityplot(imputed_datasets, missing_pattern, columns=None, observed_color='blue', imputed_color='red', save_path=None)
Create kernel density estimate (KDE) plots for observed and imputed data. The observed distribution is drawn in observed_color (blue by default) and the imputed distribution in imputed_color (red by default).
- Parameters:
imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values
missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)
columns (list of str, optional) – List of column names to plot. If None, plots all columns with missing values.
observed_color (str, default 'blue') – Color for observed values
imputed_color (str, default 'red') – Color for imputed values
save_path (str, optional) – If provided, save the plot to this path instead of displaying it
- plotting.diagnostics.densityplot_split(imputed_datasets, missing_pattern, column, observed_color='blue', imputed_color='red', save_path=None)
Create separate kernel density estimate (KDE) plots for the observed data and for each imputed dataset of a single column. Each imputation gets its own subplot; observed data is drawn in observed_color (blue by default) and imputed data in imputed_color (red by default).
- Parameters:
imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values
missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)
column (str) – Name of the column to plot
observed_color (str, default 'blue') – Color for observed values
imputed_color (str, default 'red') – Color for imputed values
save_path (str, optional) – If provided, save the plot to this path instead of displaying it
- plotting.diagnostics.xyplot(imputed_datasets, missing_pattern, x, y, merge_imputations=False, observed_color='blue', imputed_color='red', save_path=None)
Create scatter plots of two columns, showing observed and imputed values. Points whose y value was missing are drawn in imputed_color (red by default); points with an observed y value are drawn in observed_color (blue by default).
- Parameters:
imputed_datasets (list of pandas.DataFrame) – List of DataFrames containing imputed values
missing_pattern (pandas.DataFrame) – DataFrame indicating missing values (0 where missing, 1 where observed)
x (str) – Name of the column to plot on x-axis
y (str) – Name of the column to plot on y-axis
merge_imputations (bool, default False) – If True, shows all imputations on a single plot. If False, shows n+1 plots: first plot with only observed data, followed by one plot for each imputation.
observed_color (str, default 'blue') – Color for observed values
imputed_color (str, default 'red') – Color for imputed values
save_path (str, optional) – If provided, save the plot to this path instead of displaying it
- plotting.diagnostics.plot_chain_stats(chain_mean, chain_var, columns=None, figsize=(10, 5), save_path=None)
Plot per-iteration chain means and variances for the given columns.
- Parameters:
chain_mean (Dict[str, np.ndarray]) – Dictionary where each key is a column name and each value is a 2-D array of shape (n_iter, n_imputations) containing the means of the newly imputed values.
chain_var (Dict[str, np.ndarray]) – Same structure as chain_mean but for the variance of the imputed values.
columns (list of str, optional) – Columns to plot. If None, plots all keys present in chain_mean.
figsize (tuple, default (10, 5)) – Base size of a single row (width, height). The final figure will be scaled according to the number of rows.
save_path (str, optional) – If provided, save the plot to this path instead of displaying it.
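The expected layout of chain_mean and chain_var can be checked before plotting. The following sketch assumes simulated traces and hypothetical column names ('bmi', 'age'); real values would come from the imputation run.

```python
import numpy as np

n_iter, n_imputations = 10, 5
rng = np.random.default_rng(42)

# Simulated traces: rows are iterations, columns are imputations.
chain_mean = {
    "bmi": rng.normal(loc=26.0, scale=0.5, size=(n_iter, n_imputations)),
    "age": rng.normal(loc=40.0, scale=1.0, size=(n_iter, n_imputations)),
}
chain_var = {
    "bmi": rng.gamma(shape=2.0, scale=1.0, size=(n_iter, n_imputations)),
    "age": rng.gamma(shape=2.0, scale=4.0, size=(n_iter, n_imputations)),
}

# Sanity checks: same keys, and matching 2-D shapes for every column.
assert chain_mean.keys() == chain_var.keys()
for name in chain_mean:
    assert chain_mean[name].shape == (n_iter, n_imputations)
    assert chain_var[name].shape == (n_iter, n_imputations)
```

Dictionaries of this shape would then be passed as plot_chain_stats(chain_mean, chain_var, columns=['bmi']).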
Plotting Utilities
- plotting.utils.md_pattern_like(df)
Replicates the behavior of md.pattern() from R’s mice package. Encodes each missing data pattern as 1 (observed) and 0 (missing), and reports counts per pattern and per column.
- Parameters:
df (pandas.DataFrame) – Input DataFrame with potential missing values
- Returns:
pandas.DataFrame – DataFrame showing missing data patterns with counts
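The idea behind such a pattern table can be sketched with a pandas groupby. This is an illustration under assumed toy data, and the layout of the table returned by md_pattern_like() may differ.

```python
import numpy as np
import pandas as pd

# Toy data with missing entries (illustrative only).
df = pd.DataFrame({
    "age": [25.0, 31.0, np.nan, 47.0, 52.0],
    "bmi": [22.1, np.nan, 27.5, np.nan, 30.2],
})

# 1 = observed, 0 = missing.
indicator = df.notna().astype(int)

# Count how many rows share each distinct observed/missing pattern.
pattern_counts = (
    indicator.groupby(list(indicator.columns))
    .size()
    .reset_index(name="n_rows")
)

# Number of missing values per column.
missing_per_column = df.isna().sum()
```

Here pattern_counts has one row per distinct pattern, and its n_rows column sums to the number of rows in df.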
- plotting.utils.plot_missing_data_pattern(pattern_df, figsize=(8, 5), title='Missing Data Pattern', rotate_names=False, save_path=None)
Plots the missing data pattern from a pattern DataFrame.
- Parameters:
pattern_df (pandas.DataFrame) – DataFrame containing the missing data pattern, typically generated by md_pattern_like()
figsize (tuple, default (8, 5)) – Figure size in inches (width, height)
title (str, default 'Missing Data Pattern') – Title for the plot
rotate_names (bool, default False) – Whether to rotate column names 90 degrees
save_path (str, optional) – If provided, save the plot to this path instead of displaying it
- Returns:
pandas.DataFrame – The pattern matrix with counts, similar to R’s md.pattern output
See Also
Convergence Diagnostics for guidance on checking convergence
Examples for plotting examples