Generate Module
This module defines the interface and mechanism functions for simulating missing values.
The main class, MissMechaGenerator, serves as a flexible controller that supports different missing data mechanisms.
Each mechanism (MCAR, MAR, MNAR) corresponds to a specific pattern of missingness and can be applied independently or in combination.
MissMechaGenerator
The main interface to generate missingness patterns.
- class missmecha.generator.MissMechaGenerator(mechanism='MCAR', mechanism_type=1, missing_rate=0.2, seed=1, info=None, cat_cols=None, custom_class=None)
Bases: object
Flexible simulator for generating missing data under various mechanisms.
This class serves as the central interface for simulating missing values using various predefined mechanisms (e.g., MCAR, MAR, MNAR), or user-defined custom mechanisms. It supports both global and column-wise missingness simulation, enabling fine-grained control over which features to mask and how.
- Parameters:
mechanism (str, default="MCAR") – The default missingness mechanism to use if info is not specified. Can be one of {"mcar", "mar", "mnar", "custom"}.
mechanism_type (int, default=1) – The subtype of the mechanism (e.g., MAR type 1, MNAR type 4). Ignored if mechanism="custom".
missing_rate (float, default=0.2) – Proportion of values to mask as missing (only used in global simulation or if column-level info does not override).
seed (int, default=1) – Random seed to ensure reproducibility.
info (dict, optional) – Dictionary defining per-column missingness settings. Each key is a column or tuple of columns, and each value is a dict with the following fields:
- 'mechanism' (str or type): One of {"mcar", "mar", "mnar", "custom"}, or a mechanism class passed directly.
- 'type' (int, optional): Subtype index for predefined mechanisms.
- 'custom_class' (class, optional): A user-defined class implementing .fit(X) and .transform(X). Required if 'mechanism' is "custom".
- 'rate' (float): Proportion of values to mask in the column(s).
- 'depend_on' (str or list, optional): Dependency columns for MAR or MNAR patterns.
- 'para' (dict, optional): Additional keyword arguments passed to the mechanism constructor.
cat_cols (list of str, optional) – Columns treated as categorical variables. Values will be internally encoded into integers during simulation, then mapped back to original values.
custom_class (class, optional) – A user-defined mechanism class to use in global simulation when mechanism="custom". Must implement fit(X) and transform(X) methods.
Examples
>>> from missmecha.generator import MissMechaGenerator
>>> import numpy as np
>>> X = np.random.rand(100, 5)
>>> generator = MissMechaGenerator(mechanism="mcar", mechanism_type=1, missing_rate=0.2)
>>> X_missing = generator.fit_transform(X)
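The column-wise info dictionary described under Parameters can be sketched as follows. The column names are made up for illustration, and check_info is a hypothetical helper that is not part of missmecha; it only encodes the field rules listed above (treating 'rate' as required and mechanism names as case-insensitive are both assumptions).

```python
# Illustrative column-wise configuration; the column names are invented.
info = {
    "age": {"mechanism": "mcar", "type": 1, "rate": 0.1},
    # A tuple key applies one setting to several columns at once.
    ("income", "savings"): {
        "mechanism": "mar",
        "type": 1,
        "rate": 0.3,
        "depend_on": ["age"],
    },
}

ALLOWED_FIELDS = {"mechanism", "type", "custom_class", "rate", "depend_on", "para"}
NAMED_MECHANISMS = {"mcar", "mar", "mnar", "custom"}


def check_info(info):
    """Hypothetical helper (not part of missmecha): list problems in an info dict."""
    problems = []
    for cols, spec in info.items():
        unknown = set(spec) - ALLOWED_FIELDS
        if unknown:
            problems.append(f"{cols!r}: unknown fields {sorted(unknown)}")
        mech = spec.get("mechanism")
        if mech is None:
            problems.append(f"{cols!r}: 'mechanism' is missing")
        elif isinstance(mech, str):
            # Assumption: names are matched case-insensitively.
            if mech.lower() not in NAMED_MECHANISMS:
                problems.append(f"{cols!r}: unknown mechanism {mech!r}")
            elif mech.lower() == "custom" and "custom_class" not in spec:
                problems.append(f"{cols!r}: 'custom' needs a 'custom_class'")
        rate = spec.get("rate")
        if not isinstance(rate, (int, float)) or not 0 <= rate <= 1:
            problems.append(f"{cols!r}: 'rate' must be a proportion in [0, 1]")
    return problems


problems = check_info(info)  # empty list when every entry follows the rules
```

If the dictionary passes such a check, it would then be handed to the constructor as MissMechaGenerator(info=info).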
- fit(X, y=None)
Fit the internal generators to the input dataset.
This step prepares the missingness generators based on either global or column-specific configurations.
- Parameters:
X (pd.DataFrame or np.ndarray) – The complete input dataset.
y (array-like, optional) – Label or target data (used for some MNAR or MAR configurations).
- Returns:
self – Returns the fitted generator instance.
- Return type:
MissMechaGenerator
- fit_transform(X, y=None)
Fit the generator and apply the transformation in a single step.
- Parameters:
X (pd.DataFrame or np.ndarray) – The complete input dataset.
y (array-like, optional) – Label or target data (used for some MNAR or MAR configurations).
- Returns:
X_masked – Dataset with simulated missing values.
- Return type:
same type as X
- get_bool_mask()
Return the latest boolean mask generated by transform().
- Returns:
bool_mask – Boolean array where True = observed, False = missing.
- Return type:
np.ndarray
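The mask convention (True = observed, False = missing) is easy to invert by accident, so a small NumPy-only sketch may help. It does not call missmecha: bool_mask below is a randomly generated stand-in for the array that get_bool_mask() would return, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 5))

# Stand-in for generator.get_bool_mask(): True = observed, False = missing.
bool_mask = rng.random(X.shape) >= 0.2

# Keep observed entries and replace missing ones with NaN.
X_masked = np.where(bool_mask, X, np.nan)

# Because True means observed, the missing rate is one minus the mean.
realized_rate = 1.0 - bool_mask.mean()
per_column_rate = 1.0 - bool_mask.mean(axis=0)
```

Comparing realized_rate with the requested missing_rate is a quick sanity check after a simulation run.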
Mechanism Functions
This section provides details on each missing data mechanism supported by MissMechaGenerator.
You can explore MCAR, MAR, and MNAR function implementations individually below.
Custom Mechanisms
In addition to the predefined mechanisms (MCAR, MAR, MNAR), MissMechaGenerator also supports custom missingness mechanisms defined entirely by the user.
This is useful when you want to simulate structured patterns (e.g., top-k thresholds, model-based missingness, domain-specific logic).
Usage Options
You can inject a custom mechanism in two ways:
- Global mode: set mechanism="custom" and pass your class via custom_class=...
- Column-wise mode: configure each column through the info dictionary.
Your custom class must implement:
class MyMasker:
    def fit(self, X, y=None): ...
    def transform(self, X): ...
Global example:
gen = MissMechaGenerator(
    mechanism="custom",
    custom_class=MyMasker,
    missing_rate=0.3
)
Column-wise example:
info = {
    "col1": {
        "mechanism": "custom",
        "custom_class": MyMasker,
        "rate": 0.2
    }
}
gen = MissMechaGenerator(info=info)
Note
Your custom mechanism must implement both fit(X, y=None) and transform(X).
You can also pass additional parameters through para if needed.
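As a minimal sketch of a class satisfying this fit/transform contract, the hypothetical TopQuantileMasker below hides the largest values in each column, one possible reading of the "top-k thresholds" use case. The class name and its rate argument are illustrative, and how MissMechaGenerator instantiates a custom class or forwards 'rate' and 'para' to it is an assumption, so the class is exercised standalone here.

```python
import numpy as np


class TopQuantileMasker:
    """Hypothetical custom mechanism: hide the largest values in each column."""

    def __init__(self, rate=0.2):
        # Fraction of each column to mask. How MissMechaGenerator forwards
        # 'rate' or 'para' to this constructor is an assumption.
        self.rate = rate

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Per-column cutoff: values strictly above it will be masked.
        self.thresholds_ = np.quantile(X, 1.0 - self.rate, axis=0)
        return self

    def transform(self, X):
        X_masked = np.asarray(X, dtype=float).copy()  # leave the input untouched
        # The per-column thresholds broadcast across the rows of X_masked.
        X_masked[X_masked > self.thresholds_] = np.nan
        return X_masked


# Standalone usage on a small array with two increasing columns.
X = np.arange(20, dtype=float).reshape(10, 2)
X_masked = TopQuantileMasker(rate=0.2).fit(X).transform(X)
```

Because the cutoff depends on the values being masked, this pattern behaves like a self-masking mechanism rather than random deletion.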
See also: Custom Missing Mechanisms