
This vignette reproduces the real data analysis from the QAIDR paper, applying six interval dimensionality reduction methods to the Cars and Face datasets and evaluating them with co-ranking quality and behavior indices.

Prerequisites: The DR methods require the symbolicDA, RSDA, and umap packages. Install them with:

install.packages(c("symbolicDA", "RSDA", "umap"))

Cars Dataset

The Cars dataset contains 27 car models described by 4 interval-valued variables (Price, Engine Capacity, Top Speed, Acceleration) with 4 class labels (Berlina, Luxury, Sportive, Utilitarian).

library(QAIDR)
set.seed(2025)

data(cars_mm)
print(cars_mm)
summary(cars_mm)

Standardize

x <- standardize(cars_mm)
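
The scaling rule applied by standardize() is not spelled out in this vignette. As a rough illustration only, one common choice for interval data is to shift and scale both bounds using the mean and standard deviation of the interval midpoints. The helper `standardize_by_centers` and the toy matrices `lower` and `upper` below are hypothetical and not part of QAIDR:

```r
# Toy interval data: 4 observations, 2 variables (values are made up).
lower <- matrix(c(1, 2, 4, 6,  10, 12, 15, 20), ncol = 2)
upper <- matrix(c(3, 5, 6, 9,  14, 13, 19, 26), ncol = 2)

# Hypothetical helper, NOT the QAIDR implementation: scale both bounds
# with the mean and standard deviation of the interval midpoints.
standardize_by_centers <- function(lower, upper) {
  centers <- (lower + upper) / 2
  mu <- colMeans(centers)
  s  <- apply(centers, 2, sd)
  list(
    lower = sweep(sweep(lower, 2, mu, "-"), 2, s, "/"),
    upper = sweep(sweep(upper, 2, mu, "-"), 2, s, "/")
  )
}

z <- standardize_by_centers(lower, upper)
z_centers <- (z$lower + z$upper) / 2
round(colMeans(z_centers), 10)      # midpoint means after scaling
round(apply(z_centers, 2, sd), 10)  # midpoint standard deviations after scaling
```

Because both bounds are divided by the same positive scale, each lower bound stays below its upper bound, so the result is still valid interval data.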

Run all 6 DR methods

proj <- run_idr(x, labels = cars_mm$labels)
print(proj)

2D projection plots

Each DR method produces a 2D projection in which every interval-valued observation is drawn as a rectangle:

plot_projections(proj,
                 labels = cars_mm$labels,
                 obs_labels = rownames(cars_mm$centers))
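
To make the rectangle representation concrete, the following base-R sketch draws a few projected intervals by hand. It does not reproduce plot_projections(); the coordinates and the class labels in `grp` are made up for illustration:

```r
# Made-up 2D interval projections: one entry per observation.
xmin <- c(-1.0, 0.5, 1.2); xmax <- c(-0.2, 1.1, 2.0)
ymin <- c(0.0, -0.8, 0.4); ymax <- c(0.6, -0.1, 1.3)
grp  <- factor(c("A", "B", "A"))  # hypothetical class labels

# Empty canvas sized to cover all rectangles.
plot(NA, xlim = range(xmin, xmax), ylim = range(ymin, ymax),
     xlab = "Dimension 1", ylab = "Dimension 2")
# One rectangle per observation, coloured by class.
rect(xmin, ymin, xmax, ymax, border = as.integer(grp) + 1, lwd = 2)
# Label each rectangle at its centre.
text((xmin + xmax) / 2, (ymin + ymax) / 2, labels = seq_along(xmin))
```

The width and height of each rectangle show how much of the observation's interval spread survives along each projected dimension.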

Quality assessment with permutation tests

Evaluate all 6 methods across 4 metrics at neighbourhood size K = 5, with 1000 permutations for significance testing:

result <- assess_quality(x, proj, K = 5,
                         perm_test = TRUE, n_perm = 1000)
print(result)

An asterisk (*) indicates statistical significance at the 0.05 level.
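
The vignette does not describe how assess_quality() builds its null distribution. As a hedged sketch of the general idea, one could shuffle which embedded point is paired with which original observation, recompute a quality score, and compare the observed score against the shuffled ones. The helpers `knn_index` and `knn_overlap`, the toy data, and the choice of statistic are all assumptions for illustration:

```r
set.seed(1)

# Indices of the K nearest neighbours of each row, from a distance matrix.
knn_index <- function(d, K) {
  d <- as.matrix(d)
  diag(d) <- Inf  # exclude each point from its own neighbourhood
  t(apply(d, 1, function(row) order(row)[seq_len(K)]))
}

# Fraction of K-neighbours shared between the two spaces.
knn_overlap <- function(d_high, d_low, K) {
  nh <- knn_index(d_high, K)
  nl <- knn_index(d_low, K)
  mean(vapply(seq_len(nrow(nh)),
              function(i) length(intersect(nh[i, ], nl[i, ])) / K,
              numeric(1)))
}

# Toy data: a 2D "embedding" that is a noisy copy of two original columns.
n <- 30
high <- matrix(rnorm(n * 4), ncol = 4)
low  <- high[, 1:2] + matrix(rnorm(n * 2, sd = 0.1), ncol = 2)
d_high <- dist(high)
d_low  <- as.matrix(dist(low))

observed <- knn_overlap(d_high, d_low, K = 5)

# Null distribution: shuffle which embedded point belongs to which observation.
n_perm <- 200
null_stats <- replicate(n_perm, {
  idx <- sample(n)
  knn_overlap(d_high, d_low[idx, idx], K = 5)
})

# One-sided p-value with the usual +1 correction.
p_value <- (1 + sum(null_stats >= observed)) / (1 + n_perm)
```

With the +1 correction the smallest attainable p-value is 1 / (n_perm + 1), which is one reason to use a large number of permutations when testing at the 0.05 level.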

K-neighbourhood profiles

Compute the quality and behavior indices across all neighbourhood sizes K:

profiles <- k_profiles(x, proj)

# Plot for each metric
for (met in c("Int-Euclidean", "Hausdorff", "Ichino-Yaguchi", "Wasserstein")) {
  plot_k_profiles(profiles, metric = met)
}
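
The metric names in the loop above refer to distances between interval-valued observations. As an illustration, two of them can be sketched from their standard textbook definitions for a single variable: the Hausdorff distance between two intervals, and the 2-Wasserstein distance between uniform distributions on those intervals. How QAIDR combines the per-variable values is not stated here, so the Euclidean-style aggregation below is an assumption, and the helper names are hypothetical:

```r
# Two interval-valued observations on 2 variables (made-up numbers),
# stored as lower/upper bound vectors.
a_lo <- c(1, 10); a_hi <- c(3, 14)
b_lo <- c(2, 11); b_hi <- c(6, 13)

# Hausdorff distance between intervals on one variable:
# max(|lower difference|, |upper difference|).
hausdorff_1d <- function(lo1, hi1, lo2, hi2) {
  pmax(abs(lo1 - lo2), abs(hi1 - hi2))
}

# Squared 2-Wasserstein distance between uniform distributions on two
# intervals: (centre difference)^2 + (half-range difference)^2 / 3.
wasserstein2_1d <- function(lo1, hi1, lo2, hi2) {
  c1 <- (lo1 + hi1) / 2; r1 <- (hi1 - lo1) / 2
  c2 <- (lo2 + hi2) / 2; r2 <- (hi2 - lo2) / 2
  (c1 - c2)^2 + (r1 - r2)^2 / 3
}

# Assumed aggregation across variables: root of the summed squares.
d_hausdorff   <- sqrt(sum(hausdorff_1d(a_lo, a_hi, b_lo, b_hi)^2))
d_wasserstein <- sqrt(sum(wasserstein2_1d(a_lo, a_hi, b_lo, b_hi)))
```

The Hausdorff distance reacts only to the larger of the two bound shifts, while the Wasserstein distance separates a change in location (centres) from a change in spread (half-ranges).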

Face Dataset

The Face dataset contains 27 individuals described by 6 interval-valued anthropometric measurements (AD, BC, AH, DH, EH, GH).

data(facedata_mm)
print(facedata_mm)

Standardize and run DR

x_face <- standardize(facedata_mm)
proj_face <- run_idr(x_face, labels = facedata_mm$labels)

Projection plots

plot_projections(proj_face,
                 labels = facedata_mm$labels,
                 obs_labels = rownames(facedata_mm$centers))

Quality assessment

result_face <- assess_quality(x_face, proj_face, K = 5,
                              perm_test = TRUE, n_perm = 1000)
print(result_face)

K-neighbourhood profiles

profiles_face <- k_profiles(x_face, proj_face)
plot_k_profiles(profiles_face, metric = "Wasserstein")

Interpreting Results

The assessment table reports six indices for each method-metric combination:

| Index | Type     | Range   | Interpretation                                         |
|-------|----------|---------|--------------------------------------------------------|
| Q_TC  | Quality  | [0, 1]  | Average of Trustworthiness and Continuity              |
| B_TC  | Behavior | [-1, 1] | Continuity - Trustworthiness (+ = extrusion-dominant)  |
| Q_RE  | Quality  | [0, 1]  | 1 - average relative rank error                        |
| B_RE  | Behavior | [-1, 1] | Intrusion error - Extrusion error                      |
| Q_LC  | Quality  | [0, 1]  | Fraction of K-neighbours preserved                     |
| B_LC  | Behavior | [-1, 1] | Asymmetry in local continuity                          |

Quality indices closer to 1 indicate better structure preservation. Behavior indices near 0 indicate balanced intrusions and extrusions; large positive or negative values indicate directional bias in the embedding.
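
To show how indices of this kind can be derived from two distance matrices, the sketch below computes trustworthiness and continuity with the widely used rank-based definitions, then combines them as described in the table (average for Q_TC, Continuity minus Trustworthiness for B_TC) and adds the fraction of preserved K-neighbours for Q_LC. Whether QAIDR uses exactly these formulas and normalisations is an assumption; `rank_matrix` and `tc_score` are hypothetical helpers and the data are simulated:

```r
set.seed(1)

# Rank matrix: entry [i, j] is the rank of j among the neighbours of i
# (1 = nearest); the diagonal gets the largest rank.
rank_matrix <- function(d) {
  d <- as.matrix(d)
  diag(d) <- Inf
  t(apply(d, 1, rank, ties.method = "first"))
}

# Penalise points that are K-neighbours in `r_to` but not in `r_from`,
# weighted by how far beyond rank K they sit in `r_from`.
# The normalisation assumes K < n / 2.
tc_score <- function(r_from, r_to, K) {
  n <- nrow(r_from)
  newcomers <- (r_to <= K) & (r_from > K)
  penalty <- sum((r_from - K)[newcomers])
  1 - 2 / (n * K * (2 * n - 3 * K - 1)) * penalty
}

# Simulated data: a 2D embedding that is a noisy copy of two original columns.
n <- 30; K <- 5
high <- matrix(rnorm(n * 4), ncol = 4)
low  <- high[, 1:2] + matrix(rnorm(n * 2, sd = 0.1), ncol = 2)
r_high <- rank_matrix(dist(high))
r_low  <- rank_matrix(dist(low))

trust <- tc_score(r_high, r_low, K)  # false neighbours in the embedding
cont  <- tc_score(r_low, r_high, K)  # original neighbours lost in the embedding

Q_TC <- (trust + cont) / 2
B_TC <- cont - trust
Q_LC <- mean(rowSums((r_high <= K) & (r_low <= K)) / K)
```

An embedding compared with itself gives a score of exactly 1 from `tc_score`, which is a quick sanity check on the helper before applying it to real projections.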