Custom Mechanism Demo
This notebook demonstrates how to use a custom missing-data mechanism in MissMecha.
You will learn how to:

- Define a custom masking class
- Inject it into `MissMechaGenerator`
- Visualize the result
```python
# Import required modules
import numpy as np
import pandas as pd
from missmecha.generator import MissMechaGenerator
import matplotlib.pyplot as plt
```
```python
# Toy dataset: two numeric features and a binary target, 100 rows
np.random.seed(0)
X = pd.DataFrame({
    'feature1': np.random.randn(100),
    'feature2': np.random.rand(100) * 10,
    'target': np.random.choice([0, 1], size=100)
})
X.head()
```
| | feature1 | feature2 | target |
|---|---|---|---|
| 0 | 1.764052 | 4.238550 | 1 |
| 1 | 0.400157 | 6.063932 | 0 |
| 2 | 0.978738 | 0.191932 | 0 |
| 3 | 2.240893 | 3.015748 | 1 |
| 4 | 1.867558 | 6.601735 | 0 |
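```python
import numpy as np

class MyCustomMasker:
    """
    A simple custom mechanism that masks the first `missing_rate` proportion of rows.
    """
    def __init__(self, missing_rate=0.1, seed=42):
        self.missing_rate = missing_rate
        self.seed = seed  # unused here: this mask is deterministic

    def fit(self, X, y=None):
        # Record the number of rows seen during fitting
        self.n_rows = X.shape[0]
        return self

    def transform(self, X):
        # Float copy so NaN can be stored (integer columns such as `target` cannot hold NaN);
        # np.asarray also makes the 2-D slicing below work if X arrives as a DataFrame
        X_missing = np.asarray(X, dtype=float).copy()
        cutoff = int(self.missing_rate * self.n_rows)
        # Mask every column of the first `cutoff` rows
        X_missing[:cutoff, :] = np.nan
        return X_missing
```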
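`MyCustomMasker` is deterministic and never uses its `seed` argument. The hypothetical class below is a minimal sketch of a randomized variant. It keeps the same `__init__`/`fit`/`transform` shape and uses `seed` to choose a random set of rows, which gives an MCAR-style row mask. The example runs the class directly on a NumPy array without MissMecha, so you can check the interface on its own. Whether `MissMechaGenerator` forwards `seed` to a custom class is an assumption.

```python
import numpy as np


class RandomRowMasker:
    """Hypothetical custom mechanism: masks a random `missing_rate` share of rows.

    Mirrors the __init__/fit/transform shape of MyCustomMasker; the random
    row selection is an illustrative choice, not part of MissMecha.
    """

    def __init__(self, missing_rate=0.1, seed=42):
        self.missing_rate = missing_rate
        self.seed = seed

    def fit(self, X, y=None):
        # Nothing to learn for a row-level MCAR mask; return self for chaining.
        return self

    def transform(self, X):
        # Float copy so NaN can be stored even if the input holds integers.
        X_missing = np.asarray(X, dtype=float).copy()
        n_rows = X_missing.shape[0]
        n_mask = int(self.missing_rate * n_rows)
        # Seeded generator: the same rows are chosen on every run.
        rng = np.random.default_rng(self.seed)
        rows = rng.choice(n_rows, size=n_mask, replace=False)
        X_missing[rows, :] = np.nan
        return X_missing


# Usage on a plain array (100 rows x 3 columns).
X_demo = np.arange(300, dtype=float).reshape(100, 3)
masked = RandomRowMasker(missing_rate=0.2, seed=0).fit(X_demo).transform(X_demo)
print(np.isnan(masked).sum())  # 60 values: 20 full rows x 3 columns
```

To try it with MissMecha, pass `custom_class=RandomRowMasker` the same way as `MyCustomMasker`. Unlike the first masker, the masked rows are scattered rather than contiguous.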
```python
# Use MissMechaGenerator with the custom mechanism
from missmecha.analysis import compute_missing_rate  # assumed module path for this helper

gen = MissMechaGenerator(mechanism='custom', custom_class=MyCustomMasker, missing_rate=0.2)
X_missing = gen.fit_transform(X)
compute_missing_rate(X_missing)
```
Overall missing rate: 20.00%
60 / 300 total values are missing.
Top variables by missing rate:

| column | n_missing | missing_rate (%) | n_unique | dtype | n_total |
|---|---|---|---|---|---|
| feature1 | 20 | 20.0 | 80 | float64 | 100 |
| feature2 | 20 | 20.0 | 80 | float64 | 100 |
| target | 20 | 20.0 | 2 | float64 | 100 |
The call also returns the summary as a dictionary:

```
{'report':           n_missing  missing_rate (%)  n_unique    dtype  n_total
 column
 feature1         20              20.0        80  float64      100
 feature2         20              20.0        80  float64      100
 target           20              20.0         2  float64      100,
 'overall_missing_rate': 20.0}
```
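The notebook imports `matplotlib` but does not plot anything. One minimal way to visualize the result is to draw the NaN mask as an image, with missing cells in one shade and observed cells in another. This sketch assumes `X_missing` is a pandas DataFrame; the column names in the report suggest it is, but wrap it in `pd.DataFrame` otherwise. So that the snippet runs on its own, it builds an equivalent stand-in frame with the first 20% of rows missing instead of calling MissMecha. The helper name `plot_missing_mask` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_missing_mask(df, ax=None):
    """Hypothetical helper: draw df's missing-value mask (rows x columns)."""
    mask = df.isna().to_numpy()
    if ax is None:
        _, ax = plt.subplots(figsize=(4, 6))
    # Missing cells are 1 (dark), observed cells are 0 (light).
    ax.imshow(mask, aspect="auto", cmap="Greys", interpolation="nearest")
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns)
    ax.set_ylabel("row")
    ax.set_title("Missing-value mask")
    return ax, mask


# Stand-in for X_missing: 100 rows, first 20 rows set to NaN.
demo = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 3)),
                    columns=["feature1", "feature2", "target"])
demo.iloc[:20, :] = np.nan

ax, mask = plot_missing_mask(demo)
print(demo.isna().mean() * 100)  # per-column missing rate in percent
```

In the notebook, `plot_missing_mask(X_missing)` should show a solid dark band across the first 20 rows of all three columns. That band is consistent with the 20% rate reported by `compute_missing_rate`.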