Categorical MAR Types

class missmecha.generate.marcat.MARCatType1(missing_rate=0.1, seed=1, cat_column=None)

Bases: object

MAR Mechanism - Categorical Type 1 (Category-Conditioned Row-Wise Masking)

Simulates Missing At Random (MAR) data by masking entire rows, with the masking probability conditioned on the value of a categorical feature. Each category is assigned a random masking probability, scaled so that the overall missing rate approximately matches missing_rate.

This mechanism is particularly suitable for simulating structured row-wise missingness in tabular data with labeled groups or strata.

Parameters:
  • missing_rate (float, default=0.1) – Target total missing rate across the dataset.

  • seed (int, default=1) – Random seed for reproducibility.

  • cat_column (str or None) – Name of the categorical column used to drive missingness. If None, a column is randomly selected from the input DataFrame during fit().

fit(X, y=None)

Fit the masking distribution conditioned on a categorical column.

Assigns each category a masking probability proportional to a random draw, then normalizes the probabilities so that the overall missing rate approximately matches missing_rate.
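The normalization described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name category_mask_probs, the uniform random weights, and the count-weighted scaling are all assumptions chosen for illustration, and the actual draw and normalization used by MARCatType1 may differ.

```python
import random
from collections import Counter


def category_mask_probs(categories, missing_rate=0.1, seed=1):
    """Hypothetical helper: return one masking probability per category."""
    rng = random.Random(seed)
    counts = Counter(categories)
    n_rows = sum(counts.values())
    # One random positive weight per category (sorted for a stable draw order).
    weights = {c: rng.random() + 1e-12 for c in sorted(counts)}
    # Scale so the expected number of masked rows is missing_rate * n_rows:
    # sum_c counts[c] * p_c == missing_rate * n_rows.
    scale = missing_rate * n_rows / sum(counts[c] * weights[c] for c in counts)
    # Clip at 1.0; if clipping occurs, the realized rate falls below the target.
    return {c: min(1.0, scale * weights[c]) for c in counts}


groups = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
probs = category_mask_probs(groups, missing_rate=0.1, seed=1)
group_counts = Counter(groups)
expected_rate = sum(group_counts[c] * p for c, p in probs.items()) / len(groups)
```

Under these assumptions, expected_rate equals the target in expectation only. The realized rate after sampling rows will vary around it, which is why the match is approximate.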

Parameters:
  • X (pd.DataFrame) – Input DataFrame containing the categorical column.

  • y (Ignored) – Included for interface compatibility.

Returns:

self – Fitted object with per-category masking probabilities.

Return type:

MARCatType1

transform(X)

Apply row-wise missingness based on category-conditioned probabilities.

For each category in the chosen column, a subset of rows is randomly selected and all columns in those rows are masked (i.e., set to NaN).
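As a rough illustration of this row-wise masking, the sketch below uses plain pandas and NumPy. The function mask_rows_by_category and its class_probs argument are hypothetical names, and the independent per-row random draw is an assumption; the library may instead select a fixed number of rows per category.

```python
import numpy as np
import pandas as pd


def mask_rows_by_category(X, cat_column, class_probs, seed=1):
    """Hypothetical helper: set whole rows to NaN with a per-category probability."""
    rng = np.random.default_rng(seed)
    # Look up each row's masking probability from its category (0 if unknown).
    row_probs = X[cat_column].map(class_probs).fillna(0.0).to_numpy(dtype=float)
    # Independent random draw per row decides whether that row is masked.
    drop = rng.random(len(X)) < row_probs
    # Broadcast the row decision to every column, then mask those cells.
    cond = np.repeat(drop[:, None], X.shape[1], axis=1)
    return X.mask(cond)  # returns a new DataFrame; X itself is unchanged


df = pd.DataFrame({"group": ["a", "b", "a", "b"], "x": [1, 2, 3, 4]})
masked = mask_rows_by_category(df, "group", {"a": 1.0, "b": 0.0})
```

In this toy call, category "a" has probability 1.0 and "b" has 0.0, so every "a" row is masked in all columns, including the categorical column itself, while the "b" rows are left intact.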

Parameters:
  • X (pd.DataFrame) – Input DataFrame to transform.

Returns:

X_missing – DataFrame with row-level missing values introduced.

Return type:

pd.DataFrame