KRC: SAY SOMETHING
As a first step, we must load the library package.
We need some data in order to illustrate the preservation metrics. Here we use a subset of the CAVA CD4 T-cell single-cell CITE-Seq data set [REF]. This data set was previously identified as a single coherent cluster in the much larger complete data set. It contains measurements of the expression of 51 proteins in 3096 indiciual cells. We also sub-clustered this subset into four parts to help with visualization.
## [1] 3096 51
## colvec
## #69DE69 #9FC7FF #C882D9 #FFA4A4
## 812 596 641 1047
Next, we applied four popular dimension reduction methods:
## MDS TSNE UMAP DM
## [1,] 3096 3096 3096 3096
## [2,] 2 2 2 2
Here is an initial plot showing the dimension reductions
opar <- par(mfrow = c(2,2))
for (tag in names(drs)) {
plot(drs[[tag]], col = colvec, pch = 16, main = tag, xlab = "X1", ylab = "X2")
}hiD <- dist(prot5)
loD <- lapply(drs, dist)
scores <- sapply(c("MilnorDistortion", "SigmaDistortion", "Stress", "SpearmanRho"), function(metric) {
cat(metric, "\n", file = stderr())
FUN <- get(metric)
sapply(names(drs), function(tag) {
FUN(hiD, loD[[tag]])
})
})
m1d <- sapply(names(drs), function(tag) {
M1Distortion(prot5, drs[[tag]])
})
scores <- cbind(scores, M1Distortion = m1d)
round(scores, 3)## MilnorDistortion SigmaDistortion Stress SpearmanRho M1Distortion
## MDS 8.353 0.378 0.569 0.608 0.750
## TSNE 8.609 0.455 0.666 0.405 0.853
## UMAP 9.234 0.696 0.497 0.398 0.530
## DM 8.139 0.963 0.983 0.594 1.000
## MilnorDistortion SigmaDistortion Stress SpearmanRho
## 0 0 0 1
## M1Distortion
## 0
We then computed a path in high (51-)dimensional space that closely tracked the first principal component. So, in the original space, the path is a close approximation to a line segment that crosses the full data set. The path is represented by 301 individual cells (indicated by their indices as rows in the data set).
## [1] 310
## [1] 310 51
Here is a plot showing the dimension reductions with the overlaid image of the PC1 path.
opar <- par(mfrow = c(2,2))
for (tag in names(drs)) {
lowD <- drs[[tag]][idex,]
plot(drs[[tag]], col = colvec, pch = 16, main = tag, xlab = "X1", ylab = "X2")
lines(lowD, lwd = 2)
}scores <- sapply(c("LengthDistortion", "SegmentVariance", "Curvature", "SpatialSimilarity"),
function(metric) {
cat(metric, "\n", file = stderr())
FUN <- get(metric)
sapply(names(drs), function(tag) {
lowD <- drs[[tag]][idex,]
FUN(highD, lowD)
})
})
round(scores, 3)## LengthDistortion SegmentVariance Curvature SpatialSimilarity
## MDS 1.587 -0.520 -0.004 -0.649
## TSNE 1.184 -1.501 0.001 -0.810
## UMAP 0.671 -2.632 0.010 -0.493
## DM 5.128 6.557 -0.021 -1.619
## LengthDistortion SegmentVariance Curvature SpatialSimilarity
## 0 0 0 0
This computation was perfomed using the following software tools.
## R version 4.6.0 Patched (2026-05-01 r89993)
## Platform: x86_64-pc-linux-gnu
## Running under: Debian GNU/Linux 13 (trixie)
##
## Matrix products: default
## BLAS: /srv/R/R-patched/build.26-05-02/lib/libRblas.so
## LAPACK: /srv/R/R-patched/build.26-05-02/lib/libRlapack.so; LAPACK version 3.12.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Vienna
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Preservation_0.0.9
##
## loaded via a namespace (and not attached):
## [1] magic_1.6-1 tidyr_1.3.2
## [3] plotly_4.12.0 sass_0.4.10
## [5] generics_0.1.4 digest_0.6.39
## [7] magrittr_2.0.5 evaluate_1.0.5
## [9] grid_4.6.0 RColorBrewer_1.1-3
## [11] fastmap_1.2.0 jsonlite_2.0.0
## [13] DRquality_0.2.1 DatabionicSwarm_2.0.0
## [15] promises_1.5.0 httr_1.4.8
## [17] purrr_1.2.2 viridisLite_0.4.3
## [19] scales_1.4.0 GeneralizedUmatrix_1.3.1
## [21] lazyeval_0.2.3 jquerylib_0.1.4
## [23] abind_1.4-8 cli_3.6.6
## [25] shiny_1.13.0 rlang_1.2.0
## [27] emdist_0.3-3 shinythemes_1.2.0
## [29] cachem_1.1.0 yaml_2.3.12
## [31] otel_0.2.0 FNN_1.1.4.1
## [33] coRanking_0.2.5 geometry_0.5.2
## [35] tools_4.6.0 deldir_2.0-4
## [37] dplyr_1.2.1 ggplot2_4.0.3
## [39] ProjectionBasedClustering_1.2.2 httpuv_1.6.17
## [41] vctrs_0.7.3 R6_2.6.1
## [43] mime_0.13 lifecycle_1.0.5
## [45] htmlwidgets_1.6.4 shinyjs_2.1.1
## [47] pkgconfig_2.0.3 RcppParallel_5.1.11-2
## [49] pillar_1.11.1 bslib_0.10.0
## [51] later_1.4.8 gtable_0.3.6
## [53] glue_1.8.1 data.table_1.18.2.1
## [55] Rcpp_1.1.1-1.1 xfun_0.57
## [57] tibble_3.3.1 tidyselect_1.2.1
## [59] knitr_1.51 farver_2.1.2
## [61] xtable_1.8-8 htmltools_0.5.9
## [63] rmarkdown_2.31 compiler_4.6.0
## [65] S7_0.2.2