Analysis Module¶
This module provides utilities for analyzing the structure of missing data and evaluating the quality of its imputations.
Function Overview¶
| Function | Description |
| --- | --- |
| compute_missing_rate | Compute and summarize missingness statistics for each column. |
| evaluate_imputation | Evaluate imputation quality by comparing imputed values to ground truth. |
| MCARTest | A class to perform MCAR (Missing Completely At Random) tests. |
Module Reference¶
compute_missing_rate¶
Summarize the extent and structure of missing data in a DataFrame or NumPy array.
- missmecha.analysis.compute_missing_rate(data, print_summary=True, plot=False)[source]¶
Compute and summarize missingness statistics for each column.
This function calculates the number and percentage of missing values for each column in a dataset, and optionally provides a summary table and barplot.
- Parameters:
data (pandas.DataFrame or numpy.ndarray) – The dataset to analyze for missingness. If ndarray, it will be converted to DataFrame.
print_summary (bool, default=True) – If True, prints the overall missing rate and top variables by missing rate.
plot (bool, default=False) – If True, displays a barplot of missing rates per column.
- Returns:
result – A dictionary with:
- ‘report’ : pandas.DataFrame with per-column missing statistics.
- ‘overall_missing_rate’ : float, overall percentage of missing entries.
- Return type:
dict
Examples
>>> import pandas as pd
>>> from missmecha.analysis import compute_missing_rate
>>> df = pd.read_csv("data.csv")
>>> stats = compute_missing_rate(df, print_summary=True, plot=True)
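The returned dictionary can be inspected directly; a minimal sketch using only the keys documented in the Returns section above:

>>> report_df = stats["report"]              # pandas.DataFrame with per-column missing statistics
>>> overall = stats["overall_missing_rate"]  # float, overall percentage of missing entries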
evaluate_imputation¶
Evaluate imputation quality by comparing filled values to the ground truth at missing positions.
- missmecha.analysis.evaluate_imputation(original_df, imputed_df, mask_array, method='rmse', cat_cols=None)[source]¶
Evaluate imputation quality by comparing imputed values to ground truth.
This function computes per-column and overall evaluation scores based on the positions that were originally missing. It supports mixed-type data by applying different metrics for categorical and numerical columns. Returns both original and scaled (0-1) versions of the evaluation metrics.
- Parameters:
original_df (pd.DataFrame) – The fully observed reference dataset (i.e., ground truth).
imputed_df (pd.DataFrame) – The dataset after imputation has been applied.
mask_array (np.ndarray or pd.DataFrame of bool) – Boolean array where True = originally observed, False = originally missing. Usually obtained from MissMechaGenerator.bool_mask.
method (str, default="rmse") – Evaluation method to use for numeric columns. One of {‘rmse’, ‘mae’, ‘accuracy’}.
cat_cols (list of str, optional) – Column names that should be treated as categorical; these always use accuracy. If not provided, all columns use the metric specified by method.
- Returns:
result – Dictionary with two sub-dictionaries:
- ‘original’: contains raw evaluation scores
  - ‘column_scores’: mapping from column name to evaluation score
  - ‘overall_score’: average of valid column scores (float)
- ‘scaled’: contains normalized scores (0-1 range)
  - ‘column_scores’: mapping from column name to scaled evaluation score
  - ‘overall_score’: average of valid scaled column scores (float)

For categorical columns, the scaled score equals the original accuracy score.
- Return type:
dict
- Raises:
ValueError – If an unsupported method or column type is used.
Notes
- If cat_cols is None: all columns use the selected method.
- If cat_cols is provided:
  - columns in cat_cols use accuracy
  - all other columns use method, which must be ‘rmse’ or ‘mae’
- Includes formatted print output.
Examples
>>> from missmecha.analysis import evaluate_imputation
>>> result = evaluate_imputation(X_true, X_filled, mask, method="rmse")
>>> result = evaluate_imputation(
...     original_df=X_true,
...     imputed_df=X_filled,
...     mask_array=mask,
...     method="mae",
...     cat_cols=["gender", "job_type"]
... )
>>> print(result["original"]["overall_score"])
0.872
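Following the return structure documented above, a minimal sketch of reading the raw and scaled scores; the key names come from the Returns section, and the exact values depend on your data:

>>> result["original"]["column_scores"]  # per-column raw scores (RMSE/MAE, or accuracy for categorical columns)
>>> result["scaled"]["overall_score"]    # average of valid column scores normalized to the 0-1 range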
MCARTest¶
This class supports two approaches to test the MCAR assumption:
- Little’s MCAR Test: a global test for whether the missingness is completely at random.
- Pairwise t-tests: individual tests that compare observed vs. missing groups.
- class missmecha.analysis.MCARTest(method: str = 'little')[source]¶
Bases: object
A class to perform MCAR (Missing Completely At Random) tests.
Supports Little’s MCAR test (global test for all variables) and pairwise MCAR t-tests (for individual variables).
- static little_mcar_test(X: DataFrame) → float [source]¶
Perform Little’s MCAR test on a DataFrame.
- Parameters:
X (pd.DataFrame) – Input dataset.
- Returns:
pvalue – P-value of the test.
- Return type:
float
- static mcar_t_tests(X: DataFrame) → DataFrame [source]¶
Perform pairwise MCAR t-tests between missing and observed groups.
- Parameters:
X (pd.DataFrame) – Input dataset.
- Returns:
p_matrix – Matrix of p-values (var vs var).
- Return type:
pd.DataFrame
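A minimal sketch of running the pairwise tests, assuming df is a pandas DataFrame that contains missing values; p_matrix is the variable-by-variable DataFrame of p-values described above:

>>> from missmecha.analysis import MCARTest
>>> p_matrix = MCARTest.mcar_t_tests(df)
>>> (p_matrix < 0.05).sum().sum()  # number of variable pairs with evidence against MCAR at the 5% level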
- static report(pvalue: float, alpha: float = 0.05, method: str = "Little's MCAR Test") → None [source]¶
Print a summary report of the MCAR test.
- Parameters:
pvalue (float) – The p-value from the MCAR test.
alpha (float, default=0.05) – Significance level.
method (str, default="Little's MCAR Test") – Method name shown in report.
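A minimal end-to-end sketch combining the static methods documented above; df is assumed to be a pandas DataFrame with missing values:

>>> from missmecha.analysis import MCARTest
>>> pvalue = MCARTest.little_mcar_test(df)  # global test across all variables
>>> MCARTest.report(pvalue, alpha=0.05)     # prints a summary at the chosen significance level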