Preservation: Paths

Kevin R. Coombes and Polina Bombina

Introduction

KRC: SAY SOMETHING

As a first step, we must load the library package.

library(Preservation)

Preparation

Sample Data

We need some data in order to illustrate the preservation metrics. Here we use a subset of the CAVA CD4 T-cell single-cell CITE-Seq data set [REF]. This data set was previously identified as a single coherent cluster in the much larger complete data set. It contains measurements of the expression of 51 proteins in 3096 indiciual cells. We also sub-clustered this subset into four parts to help with visualization.

data("prot5")
dim(prot5)
## [1] 3096   51
table(colvec)
## colvec
## #69DE69 #9FC7FF #C882D9 #FFA4A4 
##     812     596     641    1047

Next, we applied four popular dimension reduction methods:

  1. Multi-Dimensional Scaling (MDS), which is equivalent to principal components analysis when using Euclidean distance.
  2. t-Stochastic Neighbor Embedding (t-SNE).
  3. Uniform Manifold Approximation and Projection (UMAP).
  4. Diffusion Maps (DM).
data("drs")
sapply(drs, dim)
##       MDS TSNE UMAP   DM
## [1,] 3096 3096 3096 3096
## [2,]    2    2    2    2

Here is an initial plot showing the dimension reductions

opar <- par(mfrow = c(2,2))
for (tag in names(drs)) {
  plot(drs[[tag]], col = colvec, pch = 16, main = tag, xlab = "X1", ylab = "X2")
}
Four views of the CAVA subset after dimension reduction.
Four views of the CAVA subset after dimension reduction.
par(opar)

Preservation of Pairwise Distances

hiD <- dist(prot5)
loD <- lapply(drs, dist)
scores <- sapply(c("MilnorDistortion", "SigmaDistortion", "Stress", "SpearmanRho"), function(metric) {
  cat(metric, "\n", file = stderr())
  FUN <- get(metric)
  sapply(names(drs), function(tag) {
    FUN(hiD, loD[[tag]])
  })
})
m1d <- sapply(names(drs), function(tag) {
  M1Distortion(prot5, drs[[tag]])
})
scores <- cbind(scores, M1Distortion = m1d)
round(scores, 3)
##      MilnorDistortion SigmaDistortion Stress SpearmanRho M1Distortion
## MDS             8.353           0.378  0.569       0.608        0.750
## TSNE            8.609           0.455  0.666       0.405        0.853
## UMAP            9.234           0.696  0.497       0.398        0.530
## DM              8.139           0.963  0.983       0.594        1.000
Preservation:::PWBest(colnames(scores))
## MilnorDistortion  SigmaDistortion           Stress      SpearmanRho 
##                0                0                0                1 
##     M1Distortion 
##                0

First Principal Component Path

We then computed a path in high (51-)dimensional space that closely tracked the first principal component. So, in the original space, the path is a close approximation to a line segment that crosses the full data set. The path is represented by 301 individual cells (indicated by their indices as rows in the data set).

data("path-idex")
length(idex)
## [1] 310
highD <- prot5[idex,]
dim(highD)
## [1] 310  51

Here is a plot showing the dimension reductions with the overlaid image of the PC1 path.

opar <- par(mfrow = c(2,2))
for (tag in names(drs)) {
  lowD <- drs[[tag]][idex,]
  plot(drs[[tag]], col = colvec, pch = 16, main = tag, xlab = "X1", ylab = "X2")
  lines(lowD, lwd = 2)
}
Four views of the pth of the first principal components of the CAVA subset after dimension reduction.
Four views of the pth of the first principal components of the CAVA subset after dimension reduction.
par(opar)

Path Preservation Metrics

scores <- sapply(c("LengthDistortion", "SegmentVariance", "Curvature", "SpatialSimilarity"),
                 function(metric) {
  cat(metric, "\n", file = stderr())
  FUN <- get(metric)
  sapply(names(drs), function(tag) {
    lowD <- drs[[tag]][idex,]
    FUN(highD, lowD)
  })
})
round(scores, 3)
##      LengthDistortion SegmentVariance Curvature SpatialSimilarity
## MDS             1.587          -0.520    -0.004            -0.649
## TSNE            1.184          -1.501     0.001            -0.810
## UMAP            0.671          -2.632     0.010            -0.493
## DM              5.128           6.557    -0.021            -1.619
Preservation:::PathBest(colnames(scores))
##  LengthDistortion   SegmentVariance         Curvature SpatialSimilarity 
##                 0                 0                 0                 0

Appendix

This computation was perfomed using the following software tools.

sessionInfo()
## R version 4.6.0 Patched (2026-05-01 r89993)
## Platform: x86_64-pc-linux-gnu
## Running under: Debian GNU/Linux 13 (trixie)
## 
## Matrix products: default
## BLAS:   /srv/R/R-patched/build.26-05-02/lib/libRblas.so 
## LAPACK: /srv/R/R-patched/build.26-05-02/lib/libRlapack.so;  LAPACK version 3.12.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Vienna
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Preservation_0.0.9
## 
## loaded via a namespace (and not attached):
##  [1] magic_1.6-1                     tidyr_1.3.2                    
##  [3] plotly_4.12.0                   sass_0.4.10                    
##  [5] generics_0.1.4                  digest_0.6.39                  
##  [7] magrittr_2.0.5                  evaluate_1.0.5                 
##  [9] grid_4.6.0                      RColorBrewer_1.1-3             
## [11] fastmap_1.2.0                   jsonlite_2.0.0                 
## [13] DRquality_0.2.1                 DatabionicSwarm_2.0.0          
## [15] promises_1.5.0                  httr_1.4.8                     
## [17] purrr_1.2.2                     viridisLite_0.4.3              
## [19] scales_1.4.0                    GeneralizedUmatrix_1.3.1       
## [21] lazyeval_0.2.3                  jquerylib_0.1.4                
## [23] abind_1.4-8                     cli_3.6.6                      
## [25] shiny_1.13.0                    rlang_1.2.0                    
## [27] emdist_0.3-3                    shinythemes_1.2.0              
## [29] cachem_1.1.0                    yaml_2.3.12                    
## [31] otel_0.2.0                      FNN_1.1.4.1                    
## [33] coRanking_0.2.5                 geometry_0.5.2                 
## [35] tools_4.6.0                     deldir_2.0-4                   
## [37] dplyr_1.2.1                     ggplot2_4.0.3                  
## [39] ProjectionBasedClustering_1.2.2 httpuv_1.6.17                  
## [41] vctrs_0.7.3                     R6_2.6.1                       
## [43] mime_0.13                       lifecycle_1.0.5                
## [45] htmlwidgets_1.6.4               shinyjs_2.1.1                  
## [47] pkgconfig_2.0.3                 RcppParallel_5.1.11-2          
## [49] pillar_1.11.1                   bslib_0.10.0                   
## [51] later_1.4.8                     gtable_0.3.6                   
## [53] glue_1.8.1                      data.table_1.18.2.1            
## [55] Rcpp_1.1.1-1.1                  xfun_0.57                      
## [57] tibble_3.3.1                    tidyselect_1.2.1               
## [59] knitr_1.51                      farver_2.1.2                   
## [61] xtable_1.8-8                    htmltools_0.5.9                
## [63] rmarkdown_2.31                  compiler_4.6.0                 
## [65] S7_0.2.2