Categorical MAR Types
- class missmecha.generate.marcat.MARCatType1(missing_rate=0.1, seed=1, cat_column=None)
Bases: object
MAR Mechanism - Categorical Type 1 (Category-Conditioned Row-Wise Masking)
Simulates Missing At Random (MAR) data by masking entire rows, with the probability of masking conditioned on the value of a categorical feature. Each category is assigned a random masking probability, and these probabilities are scaled so that the overall missing rate approximately matches missing_rate.
This mechanism is particularly suitable for simulating structured row-wise missingness in tabular data with labeled groups or strata.
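Because a masked row loses every column, the dataset-level missing rate equals the fraction of masked rows, assuming the input has no pre-existing NaNs. Writing n_c for the number of rows in category c, N for the total number of rows, and p_c for the masking probability of category c (notation introduced here for illustration, not taken from the library), the target condition can be stated roughly as:

```latex
\frac{1}{N}\sum_{c} n_c \, p_c \;\approx\; \texttt{missing\_rate},
\qquad 0 \le p_c \le 1 .
```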
- Parameters:
missing_rate (float, default=0.1) – Target total missing rate across the dataset.
seed (int, default=1) – Random seed for reproducibility.
cat_column (str or None, default=None) – Name of the categorical column used to drive missingness. If None, a column is randomly selected from the input DataFrame during fit().
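When cat_column is None, the driving column is chosen at random during fit(). The sketch below shows one way such a seeded choice could be made. resolve_cat_column is a hypothetical helper, and the use of numpy.random.default_rng is an assumption, so the column the library actually picks for a given seed may differ. The sketch does not filter by dtype, since the documentation does not say whether only categorical columns are eligible.

```python
import numpy as np
import pandas as pd


def resolve_cat_column(X: pd.DataFrame, cat_column=None, seed=1):
    """Hypothetical helper: return the column that will drive missingness.

    If cat_column is given, it is validated and returned unchanged; otherwise
    one column label is drawn uniformly at random with a seeded generator.
    """
    if cat_column is not None:
        if cat_column not in X.columns:
            raise KeyError(f"cat_column {cat_column!r} not found in X")
        return cat_column
    rng = np.random.default_rng(seed)  # assumption: seeding strategy is illustrative
    # Draw a random position and map it back to the column label.
    return X.columns[rng.integers(len(X.columns))]
```

Seeding the generator makes the choice repeatable, which mirrors the role of the seed parameter described above.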
- fit(X, y=None)
Fit the masking distribution conditioned on a categorical column.
Assigns each category a masking probability proportional to a random draw, normalized so that the total missing rate matches missing_rate.
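A minimal sketch of this normalization, assuming uniform random weights on [0, 1) and a single scale factor chosen so that the row-weighted average probability equals missing_rate. category_mask_probs, the choice of distribution, and the clipping at 1 are illustrative assumptions, not the library's code.

```python
import numpy as np
import pandas as pd


def category_mask_probs(X: pd.DataFrame, cat_column: str, missing_rate=0.1, seed=1):
    """Hypothetical sketch: one masking probability per category.

    Each category c receives a random weight w_c; all weights are then scaled
    by one factor so that sum_c n_c * p_c == missing_rate * N.
    """
    rng = np.random.default_rng(seed)
    counts = X[cat_column].value_counts()  # n_c for each category
    weights = pd.Series(rng.uniform(0.0, 1.0, size=len(counts)), index=counts.index)
    # Single scale factor that makes the row-weighted average hit missing_rate.
    scale = missing_rate * len(X) / float((counts * weights).sum())
    # Probabilities above 1 cannot be realised; clipping them means the
    # achieved rate can fall below missing_rate for very skewed draws.
    return (weights * scale).clip(upper=1.0)
```

Scaling all weights by one factor keeps the relative ordering of the random draws, which matches the description of probabilities "proportional to a random draw".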
- Parameters:
X (pd.DataFrame) – Input DataFrame containing the categorical column.
y (Ignored) – Included for interface compatibility.
- Returns:
self – Fitted object with learned category-based probabilities.
- Return type:
MARCatType1
- transform(X)
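Apply row-wise missingness based on the category-conditioned probabilities learned in fit().
For each category in the chosen column, a random subset of that category's rows is selected according to its masking probability, and every column in the selected rows (including the categorical column itself) is set to NaN.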
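The masking step can be sketched as follows. mask_rows_by_category is hypothetical, and it assumes that the number of masked rows in each category is round(p_c * n_c), drawn without replacement; the library may instead draw each row independently.

```python
import numpy as np
import pandas as pd


def mask_rows_by_category(X: pd.DataFrame, cat_column, probs, seed=1):
    """Hypothetical sketch: set every column of randomly chosen rows to NaN.

    `probs` maps each category to its masking probability (for example, the
    output of a fit step). The input DataFrame is not modified.
    """
    rng = np.random.default_rng(seed)
    row_mask = np.zeros(len(X), dtype=bool)
    for category, p in probs.items():
        # Positions of rows belonging to this category in the original X.
        positions = np.flatnonzero((X[cat_column] == category).to_numpy())
        k = int(round(p * len(positions)))  # assumption: fixed count per category
        if k > 0:
            row_mask[rng.choice(positions, size=k, replace=False)] = True
    # Broadcast the row mask across all columns and replace masked cells with NaN.
    cell_mask = np.repeat(row_mask[:, None], X.shape[1], axis=1)
    return X.mask(cell_mask)
```

Using DataFrame.mask returns a new frame, so integer columns are upcast to float where needed and the caller's data stays intact.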
- Parameters:
X (pd.DataFrame) – Input DataFrame to transform.
- Returns:
X_missing – DataFrame with row-level missing values introduced.
- Return type:
pd.DataFrame
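Because every column of a masked row becomes NaN, including cat_column, the category of a masked row can only be recovered from the original input. The hypothetical diagnostic below relies on that to report the share of fully masked rows per category, which is one way to sanity-check output such as that of MARCatType1.transform(). row_missing_rate_by_category is an illustrative name, not part of the library.

```python
import pandas as pd


def row_missing_rate_by_category(X: pd.DataFrame, X_missing: pd.DataFrame, cat_column):
    """Hypothetical diagnostic: share of fully masked rows per category.

    Groups by the category in the original X, because row-wise masking also
    blanks cat_column in X_missing.
    """
    fully_masked = X_missing.isna().all(axis=1)
    return fully_masked.groupby(X[cat_column]).mean()
```

Calling .mean() on X_missing.isna().all(axis=1) gives the overall fraction of masked rows; if X contained no NaNs beforehand, this equals the cell-level missing rate.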