MCAR Functions

Missing Completely At Random (MCAR) refers to a missing data mechanism in which the probability of missingness is completely independent of both observed and unobserved data.

More formally, given a data matrix \(X\) and a corresponding missingness indicator matrix \(M\), the MCAR assumption implies:

\[P(M \mid X) = P(M)\]

This means that missing values occur entirely at random, regardless of the data values themselves. MCAR is the most restrictive of the standard missingness assumptions and rarely holds in real data, but it is also the easiest to handle analytically.
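The independence condition can be checked empirically on simulated data. The following is a minimal NumPy sketch, not MissMecha code: the variable names and the 0.2 missing rate are illustrative assumptions. It draws a mask without looking at the data and compares the missing rate in the lower and upper halves of the values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100,000 draws of a single numeric variable.
x = rng.normal(size=100_000)

# MCAR mask: each entry is missing with probability 0.2,
# drawn without reference to the values in x.
m = rng.random(x.shape) < 0.2

# Under MCAR, P(M | X) = P(M), so the missing rate should be
# roughly the same for small and large values of x.
low = x < np.median(x)
rate_all = m.mean()
rate_low = m[low].mean()
rate_high = m[~low].mean()

print(f"overall: {rate_all:.3f}, low half: {rate_low:.3f}, high half: {rate_high:.3f}")
```

If the mask depended on the values (for example, masking large values more often), the two conditional rates would diverge.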

In MissMecha, we implement several MCAR variants that can be applied to numerical, categorical, or time series data.

Note

MCAR mechanisms in this module support all data types, including numerical, categorical, and time series inputs.

MCARType1

class missmecha.generate.mcar.MCARType1(missing_rate=0.1, seed=1)[source]

Bases: object

MCAR Mechanism - Type 1 (Uniform Independent Masking)

Randomly masks entries with a uniform probability across the entire dataset. This mechanism applies a global missing rate independently at each cell.

Parameters:
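  • missing_rate (float, default=0.1) – The probability with which each value is independently set as missing (0 ≤ missing_rate ≤ 1). The realized proportion of missing values is therefore approximately, not exactly, missing_rate.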

  • seed (int, default=1) – Random seed for reproducibility.

fit(X, y=None)[source]

Placeholder fit method for interface compatibility.

MCARType1 does not require fitting, but this method sets a flag for internal consistency.

transform(X)[source]

Apply MCARType1 transformation to introduce missingness.

Each entry in the dataset has an independent probability of being set to NaN.

Parameters:

X (np.ndarray) – Input array to apply missingness (converted to float).

Returns:

X_missing – The same array with missing values inserted.

Return type:

np.ndarray
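The behaviour described above (an independent draw per cell) can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's source: the function name mask_uniform_independent is hypothetical, and the use of np.random.default_rng is an assumption about how the seed might be consumed.

```python
import numpy as np

def mask_uniform_independent(X, missing_rate=0.1, seed=1):
    """Set each cell to NaN independently with probability `missing_rate`.

    Hypothetical helper illustrating uniform independent masking.
    """
    rng = np.random.default_rng(seed)
    # Work on a float copy so NaN can be stored and the input is untouched.
    X_missing = np.asarray(X, dtype=float).copy()
    # One independent uniform draw per cell.
    mask = rng.random(X_missing.shape) < missing_rate
    X_missing[mask] = np.nan
    return X_missing

X = np.arange(20_000, dtype=float).reshape(5_000, 4)
X_missing = mask_uniform_independent(X, missing_rate=0.1, seed=1)
observed_rate = np.isnan(X_missing).mean()
print(f"observed missing rate: {observed_rate:.3f}")
```

Because every cell is drawn independently, the observed rate fluctuates around missing_rate rather than matching it exactly; the fluctuation shrinks as the array grows.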


MCARType2

class missmecha.generate.mcar.MCARType2(missing_rate=0.1, seed=1)[source]

Bases: object

MCAR Mechanism - Type 2 (Random Cell Selection)

Randomly selects a fixed number of entries based on the overall missing rate, and masks exactly that number of cells across the dataset.

Parameters:
  • missing_rate (float, default=0.1) – The proportion of values to randomly set as missing (0 ≤ missing_rate ≤ 1).

  • seed (int, default=1) – Random seed for reproducibility.

fit(X, y=None)[source]

Placeholder fit method for interface compatibility.

MCARType2 does not require fitting, but this method sets a flag for internal consistency.

transform(X)[source]

Apply MCARType2 transformation to introduce missingness.

Randomly masks a fixed number of values across the entire array, based on the global missing rate.

Parameters:

X (np.ndarray) – Input array to apply missingness (converted to float).

Returns:

X_missing – Array with missing entries randomly inserted.

Return type:

np.ndarray
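Masking an exact number of cells can be sketched as sampling flat indices without replacement. The following is a NumPy illustration, not the library's source: the function name mask_fixed_count is hypothetical, and rounding missing_rate × size to the nearest integer is an assumption, since the rounding rule is not stated here.

```python
import numpy as np

def mask_fixed_count(X, missing_rate=0.1, seed=1):
    """Set exactly round(missing_rate * X.size) cells to NaN.

    Hypothetical helper illustrating fixed-count random cell selection.
    """
    rng = np.random.default_rng(seed)
    X_missing = np.asarray(X, dtype=float).copy()
    # Assumed rounding rule: nearest integer.
    n_missing = int(round(missing_rate * X_missing.size))
    # Choose distinct flat positions so no cell is picked twice.
    flat_idx = rng.choice(X_missing.size, size=n_missing, replace=False)
    X_missing.flat[flat_idx] = np.nan
    return X_missing

X = np.arange(50, dtype=float).reshape(10, 5)
X_missing = mask_fixed_count(X, missing_rate=0.2, seed=1)
n_nan = int(np.isnan(X_missing).sum())
print(f"masked {n_nan} of {X.size} cells")
```

Unlike independent per-cell masking, the number of missing cells here is deterministic for a given array size and rate; only their positions vary with the seed.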


MCARType3

class missmecha.generate.mcar.MCARType3(missing_rate=0.1, seed=1)[source]

Bases: object

MCAR Mechanism - Type 3 (Column-wise Balanced Missingness)

Applies missingness to each column independently, with an approximately equal number of missing entries per column.

Parameters:
  • missing_rate (float, default=0.1) – The total proportion of missing values in the dataset.

  • seed (int, default=1) – Random seed for reproducibility.

fit(X, y=None)[source]

Placeholder fit method for interface compatibility.

MCARType3 does not require fitting, but this method sets a flag for internal consistency.

transform(X)[source]

Apply MCARType3 transformation to introduce missingness.

Ensures that missing values are approximately evenly distributed across columns.

Parameters:

X (np.ndarray) – Input array to apply missingness (converted to float).

Returns:

X_missing – Array with missing values inserted in a column-balanced way.

Return type:

np.ndarray
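One way to obtain column-balanced missingness is to split the total number of missing cells as evenly as possible across columns, then sample rows within each column. The following is a NumPy sketch under stated assumptions, not the library's source: the function name mask_column_balanced is hypothetical, the input is assumed to be 2-D, and both the rounding rule and the way the remainder is assigned to the first columns are illustrative choices.

```python
import numpy as np

def mask_column_balanced(X, missing_rate=0.1, seed=1):
    """Spread round(missing_rate * X.size) NaNs evenly across columns.

    Hypothetical helper illustrating column-wise balanced masking
    for a 2-D array.
    """
    rng = np.random.default_rng(seed)
    X_missing = np.asarray(X, dtype=float).copy()
    n_rows, n_cols = X_missing.shape
    total = int(round(missing_rate * X_missing.size))
    # Each column gets `base` missing cells; the first `extra`
    # columns get one more so the counts sum to `total`.
    base, extra = divmod(total, n_cols)
    for j in range(n_cols):
        k = base + (1 if j < extra else 0)
        rows = rng.choice(n_rows, size=k, replace=False)
        X_missing[rows, j] = np.nan
    return X_missing

X = np.arange(40, dtype=float).reshape(10, 4)
X_missing = mask_column_balanced(X, missing_rate=0.25, seed=1)
per_column = np.isnan(X_missing).sum(axis=0)
print("missing per column:", per_column.tolist())
```

With this scheme the per-column counts differ by at most one, which matches the "approximately equal" wording: exact equality is only possible when the total is divisible by the number of columns.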