Visual Demo

This notebook demonstrates how to use MissMecha’s visualization module to inspect missingness patterns in data.

We will:

  • Visualize missing data matrices

  • Explore nullity (missingness) correlation heatmaps

  • Apply visualizations to both tabular and time series datasets

Import Required Libraries

[1]:
import pandas as pd
import numpy as np
from missmecha.visual import plot_missing_matrix, plot_missing_heatmap
from missmecha.generator import MissMechaGenerator

Matrix Visualization (Simple Tabular Example)

We first create a small DataFrame and manually introduce missingness.

[2]:
np.random.seed(42)

data = {
    'age': [25, 30, 28, 40, 22, 35, 32, 26, 27, 38],
    'income': [50000, 60000, 58000, 61000, 52000, 59000, 57000, 56000, 62000, 54000],
    'gender': ['M', 'F', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F']
}

df = pd.DataFrame(data)

# Introduce MAR pattern (hide age when income > 58000)
df.loc[df['income'] > 58000, 'age'] = np.nan

df.head()
[2]:
age income gender
0 25.0 50000 M
1 NaN 60000 F
2 28.0 58000 M
3 NaN 61000 F
4 22.0 52000 F

Binary Missingness Matrix

[3]:
plot_missing_matrix(df, color=False)
../_images/notebooks_MissMecha-Demo-Visual_5_0.png
  • White = missing value

  • Black = observed value

  • Column-wise missing rates are annotated
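The quantities behind this plot can be reproduced with pandas alone. The following is a minimal sketch, not MissMecha's internal implementation: it rebuilds the example DataFrame, derives the boolean missingness mask, and computes the per-column missing rates that the plot annotates. The names `mask` and `missing_rates` are illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so this block is self-contained
df = pd.DataFrame({
    'age': [25, 30, 28, 40, 22, 35, 32, 26, 27, 38],
    'income': [50000, 60000, 58000, 61000, 52000, 59000, 57000, 56000, 62000, 54000],
    'gender': ['M', 'F', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F']
})

# Same MAR rule as above: hide age wherever income > 58000
# (Series.mask replaces the flagged entries with NaN)
df['age'] = df['age'].mask(df['income'] > 58000)

# Boolean mask: True where a value is missing
mask = df.isna()

# Fraction of missing entries per column
missing_rates = mask.mean()
print(missing_rates)
```

Only `age` has a non-zero rate here, because the rule hides `age` and leaves the other columns untouched.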

Value-Colored Missingness Matrix

[4]:
plot_missing_matrix(df, sort_by="income")
../_images/notebooks_MissMecha-Demo-Visual_7_0.png

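Here observed cells are colored by value magnitude and rows are sorted by income, which makes it easier to see whether missingness clusters at particular feature values. In this example the missing age entries fall in the high-income rows.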
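One way to check numerically what the sorted plot suggests is to compare income between rows where age is missing and rows where it is observed. This is a minimal pandas sketch; the column name `age_status` and the variable `income_by_status` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame so this block is self-contained
df = pd.DataFrame({
    'age': [25, 30, 28, 40, 22, 35, 32, 26, 27, 38],
    'income': [50000, 60000, 58000, 61000, 52000, 59000, 57000, 56000, 62000, 54000],
})
df['age'] = df['age'].mask(df['income'] > 58000)

# Label each row by whether age is missing or observed
df['age_status'] = np.where(df['age'].isna(), 'missing', 'observed')

# Mean income within each group
income_by_status = df.groupby('age_status')['income'].mean()
print(income_by_status)
```

A clearly higher mean income in the `missing` group is consistent with missingness in `age` depending on `income`, which is how the data was constructed.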


Time Series Missingness Visualization

We now create a time-indexed dataset with artificial missingness.

[5]:
# Draw a 20 x 10 array of values from a normal distribution (mean 50, std 10)
values = np.random.normal(loc=50, scale=10, size=(20, 10))

# Apply missingness via MissMechaGenerator
mecha = MissMechaGenerator()
X_missing = mecha.fit_transform(values)

# Use daily dates, formatted as strings, as the row index
dates = pd.date_range('1/1/2011', periods=20, freq='D').strftime('%Y-%m-%d')
df_ts = pd.DataFrame(X_missing, index=dates)

df_ts.head()
[5]:
0 1 2 3 4 5 6 7 8 9
2011-01-01 54.967142 48.617357 NaN 65.230299 47.658466 47.658630 65.792128 57.674347 45.305256 NaN
2011-01-02 45.365823 45.342702 52.419623 30.867198 32.750822 44.377125 NaN 53.142473 40.919759 35.876963
2011-01-03 64.656488 47.742237 50.675282 35.752518 44.556173 51.109226 38.490064 53.756980 NaN 47.083063
2011-01-04 43.982934 NaN 49.865028 39.422891 58.225449 37.791564 NaN 30.403299 36.718140 NaN
2011-01-05 57.384666 51.713683 48.843517 46.988963 35.214780 42.801558 45.393612 60.571222 NaN 32.369598

Time Series Matrix Plot

[6]:
plot_missing_matrix(df_ts, ts=True, figsize=(10, 8))
../_images/notebooks_MissMecha-Demo-Visual_11_0.png

With ts=True, only the first and last row indices are labeled, which keeps the time axis readable.
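Alongside the plot, the share of missing values per timestamp can be summarized directly. The sketch below does not call MissMechaGenerator. As a stand-in, it masks values completely at random with NumPy, then computes the fraction of missing cells in each row. The 20% rate and the names `rng`, `mask`, and `row_missing` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# 20 time steps x 10 series of normally distributed values
values = rng.normal(loc=50, scale=10, size=(20, 10))

# Stand-in for the generator: hide each cell independently with probability 0.2
mask = rng.random(values.shape) < 0.2
X_missing = np.where(mask, np.nan, values)

# Daily dates, formatted as strings, as the row index
dates = pd.date_range('2011-01-01', periods=20, freq='D').strftime('%Y-%m-%d')
df_ts = pd.DataFrame(X_missing, index=dates)

# Fraction of missing cells at each timestamp
row_missing = df_ts.isna().mean(axis=1)
print(row_missing.head())
```

Plotting `row_missing` over time is one simple way to spot periods in which many series drop out at once.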


Nullity Correlation Heatmap

Create Mixed Dataset

[7]:
data_mixed = pd.DataFrame({
    "DATE": ["09/10/2016", "03/31/2016", "03/16/2016", "04/01/2016", np.nan, "04/03/2016"],
    "TIME": ["12:09:00", "22:10:00", "14:58:00", np.nan, "08:30:00", "19:00:00"],
    "BOROUGH": ["QUEENS", "BROOKLYN", "MANHATTAN", "QUEENS", "BRONX", np.nan],
    "ZIP CODE": ["11427", "11223", "10001", "11434", np.nan, "10010"],
    "LATITUDE": [40.724692, 40.598761, 40.712776, np.nan, 40.850000, 40.755000],
    "LONGITUDE": [-73.874245, -73.987843, -74.006058, -73.900000, -73.880000, np.nan],
    "VEHICLE TYPE": ["BICYCLE", "PASSENGER VEHICLE", "TAXI", "SUV", np.nan, "BICYCLE"]
})

data_mixed.head()
[7]:
DATE TIME BOROUGH ZIP CODE LATITUDE LONGITUDE VEHICLE TYPE
0 09/10/2016 12:09:00 QUEENS 11427 40.724692 -73.874245 BICYCLE
1 03/31/2016 22:10:00 BROOKLYN 11223 40.598761 -73.987843 PASSENGER VEHICLE
2 03/16/2016 14:58:00 MANHATTAN 10001 40.712776 -74.006058 TAXI
3 04/01/2016 NaN QUEENS 11434 NaN -73.900000 SUV
4 NaN 08:30:00 BRONX NaN 40.850000 -73.880000 NaN

Nullity Correlation Heatmap

[8]:
plot_missing_heatmap(data_mixed, method="kendall", figsize=(10, 6))
../_images/notebooks_MissMecha-Demo-Visual_15_0.png

This shows pairwise relationships between missingness patterns:

  • Positive correlation: variables tend to be missing together

  • Negative correlation: when one variable is missing, the other tends to be observed
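A nullity correlation is commonly computed by correlating the 0/1 missingness indicators of each column. The sketch below does this with pandas alone. It assumes that definition rather than reproducing MissMecha's internals, and it uses the default Pearson method (equivalent to the phi coefficient on binary indicators), whereas the call above uses method="kendall". The names `nullity` and `corr` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the mixed example dataset so this block is self-contained
data_mixed = pd.DataFrame({
    "DATE": ["09/10/2016", "03/31/2016", "03/16/2016", "04/01/2016", np.nan, "04/03/2016"],
    "TIME": ["12:09:00", "22:10:00", "14:58:00", np.nan, "08:30:00", "19:00:00"],
    "BOROUGH": ["QUEENS", "BROOKLYN", "MANHATTAN", "QUEENS", "BRONX", np.nan],
    "ZIP CODE": ["11427", "11223", "10001", "11434", np.nan, "10010"],
    "LATITUDE": [40.724692, 40.598761, 40.712776, np.nan, 40.850000, 40.755000],
    "LONGITUDE": [-73.874245, -73.987843, -74.006058, -73.900000, -73.880000, np.nan],
    "VEHICLE TYPE": ["BICYCLE", "PASSENGER VEHICLE", "TAXI", "SUV", np.nan, "BICYCLE"]
})

# 1 where a value is missing, 0 where it is observed
nullity = data_mixed.isna().astype(int)

# Pairwise correlation between the missingness indicators
corr = nullity.corr()
print(corr.round(2))
```

In this toy data, DATE, ZIP CODE, and VEHICLE TYPE are missing in the same row, so their pairwise correlations are 1. DATE and TIME are never missing together, which gives a negative value (-0.2).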


Key Takeaways

  • plot_missing_matrix() visualizes missingness across datasets, in both binary and color-encoded modes.

  • plot_missing_heatmap() shows which variables tend to be missing together and which tend to be missing in different rows.

  • Together, these plots give a quick first look at possible missingness mechanisms before any formal analysis.