Preservation: Paths

Kevin R. Coombes and Polina Bombina

Introduction

KRC: SAY SOMETHING

As a first step, we must load the library package.

library(Preservation)

Preparation

Sample Data

We need some data in order to illustrate the preservation metrics. Here we use a subset of the CAVA CD4 T-cell single-cell CITE-Seq data set [REF]. This data set was previously identified as a single coherent cluster in the much larger complete data set. It contains measurements of the expression of 51 proteins in 3096 indiciual cells. We also sub-clustered this subset into four parts to help with visualization.

data("prot5")
dim(prot5)
## [1] 3096   51
table(colvec)
## colvec
## #69DE69 #9FC7FF #C882D9 #FFA4A4 
##     812     596     641    1047

First Principal Component Path

We then computed a path in high (51-)dimensional space that closely tracked the first principal component. So, in the original space, the path is a close approximation to a line segment that crosses the full data set. The path is represented by 301 individual cells (indicated by their indices as rows in the data set).

data("path-idex")
length(idex)
## [1] 310
highD <- prot5[idex,]
dim(highD)
## [1] 310  51

Finally, we applied four popular dimension reduction methods:

  1. Multi-Dimensional Scaling (MDS), which is equivalent to principal components analysis when using Euclidean distance.
  2. t-Stochastic Neighbor Embedding (t-SNE).
  3. Uniform Manifold Approximation and Projection (UMAP).
  4. Diffusion Maps (DM).
data("drs")
sapply(drs, dim)
##       MDS TSNE UMAP   DM
## [1,] 3096 3096 3096 3096
## [2,]    2    2    2    2

Here is an initial plot showing the dimension reductions

opar <- par(mfrow = c(2,2))
for (tag in names(drs)) {
  lowD <- drs[[tag]][idex,]
  plot(drs[[tag]], col = colvec, pch = 16, main = tag, xlab = "X1", ylab = "X2")
  lines(lowD, lwd = 2)
}
Blah.
Blah.
par(opar)

Path Preservation Metrics

sapply(names(drs), function(tag) {
  lowD <- drs[[tag]][idex,]
  LengthDistortion(highD, lowD)
})
##       MDS      TSNE      UMAP        DM 
## 1.5865282 1.1835661 0.6709129 5.1283261
sapply(names(drs), function(tag) {
  lowD <- drs[[tag]][idex,]
  SpatialSimilarity(highD, lowD)
})
##        MDS       TSNE       UMAP         DM 
## -0.6490680 -0.8099849 -0.4926579 -1.6185589
sapply(names(drs), function(tag) {
  lowD <- drs[[tag]][idex,]
  SegmentVariance(highD, lowD)
})
##        MDS       TSNE       UMAP         DM 
## -0.5201157 -1.5013906 -2.6315984  6.5566507

OOPS

sapply(names(drs), function(tag) {
  lowD <- drs[[tag]][idex,]
  Curvature(highD, lowD)
})
##           MDS          TSNE          UMAP            DM 
## -0.0036972119  0.0009007236  0.0096348650 -0.0211827070

Appendix

This computation was perfomed using the following software tools.

sessionInfo()
## R version 4.6.0 Patched (2026-04-28 r89974)
## Platform: x86_64-pc-linux-gnu
## Running under: Debian GNU/Linux 13 (trixie)
## 
## Matrix products: default
## BLAS:   /srv/R/R-patched/build.26-04-30/lib/libRblas.so 
## LAPACK: /srv/R/R-patched/build.26-04-30/lib/libRlapack.so;  LAPACK version 3.12.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Vienna
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Preservation_0.0.7
## 
## loaded via a namespace (and not attached):
##  [1] magic_1.6-1                     tidyr_1.3.2                    
##  [3] plotly_4.12.0                   sass_0.4.10                    
##  [5] generics_0.1.4                  digest_0.6.39                  
##  [7] magrittr_2.0.5                  evaluate_1.0.5                 
##  [9] grid_4.6.0                      RColorBrewer_1.1-3             
## [11] fastmap_1.2.0                   jsonlite_2.0.0                 
## [13] DRquality_0.2.1                 DatabionicSwarm_2.0.0          
## [15] promises_1.5.0                  httr_1.4.8                     
## [17] purrr_1.2.2                     viridisLite_0.4.3              
## [19] scales_1.4.0                    GeneralizedUmatrix_1.3.1       
## [21] lazyeval_0.2.3                  jquerylib_0.1.4                
## [23] abind_1.4-8                     cli_3.6.6                      
## [25] shiny_1.13.0                    rlang_1.2.0                    
## [27] emdist_0.3-3                    shinythemes_1.2.0              
## [29] cachem_1.1.0                    yaml_2.3.12                    
## [31] otel_0.2.0                      FNN_1.1.4.1                    
## [33] coRanking_0.2.5                 geometry_0.5.2                 
## [35] tools_4.6.0                     deldir_2.0-4                   
## [37] dplyr_1.2.1                     ggplot2_4.0.3                  
## [39] ProjectionBasedClustering_1.2.2 httpuv_1.6.17                  
## [41] vctrs_0.7.3                     R6_2.6.1                       
## [43] mime_0.13                       lifecycle_1.0.5                
## [45] htmlwidgets_1.6.4               shinyjs_2.1.1                  
## [47] pkgconfig_2.0.3                 RcppParallel_5.1.11-2          
## [49] pillar_1.11.1                   bslib_0.10.0                   
## [51] later_1.4.8                     gtable_0.3.6                   
## [53] glue_1.8.1                      data.table_1.18.2.1            
## [55] Rcpp_1.1.1-1.1                  xfun_0.57                      
## [57] tibble_3.3.1                    tidyselect_1.2.1               
## [59] knitr_1.51                      farver_2.1.2                   
## [61] xtable_1.8-8                    htmltools_0.5.9                
## [63] rmarkdown_2.31                  compiler_4.6.0                 
## [65] S7_0.2.2