In this document we introduce the functionality available in stops for fitting multidimensional scaling (MDS; Borg & Groenen 2005) or proximity scaling (PS) models, with or without the STOPS or COPS idea. We start with a short introduction to PS and the models that are available. We then explain how to fit these models with the stops package. Next, we introduce the reader to COPS (Rusch et al. 2015a) and STOPS (Rusch et al. 2015b) models and show how to fit those. For illustration we use the smacof::kinshipdelta data set (Rosenberg & Kim, 1975), which lists percentages of how often 15 kinship terms were not grouped together by college students.
library(stops)
## Loading required package: smacof
## Loading required package: rgl
##
## Attaching package: 'stops'
##
## The following object is masked from 'package:stats':
##
## cmdscale
For proximity scaling (PS) or multidimensional scaling (MDS) the input is typically an \(N\times N\) matrix \(\Delta^*=f(\Delta)\), a matrix of proximities with elements \(\delta^*_{ij}\), that is a function of a matrix of observed non-negative dissimilarities \(\Delta\) with elements \(\delta_{ij}\). \(\Delta^*\) is usually symmetric (but does not need to be). The main diagonal of \(\Delta\) is 0. We call \(f: \delta_{ij} \mapsto \delta^*_{ij}\) a proximity transformation function. In the MDS literature these \(\delta_{ij}^*\) are often called dhats or disparities. The problem that proximity scaling solves is to locate an \(N \times M\) matrix \(X\) (the configuration) with row vectors \(x_i, i=1,\ldots,N\) in low-dimensional space \((\mathbb{R}^M, M \leq N)\) in such a way that transformations \(g(d_{ij}(X))\) of the fitted distances \(d_{ij}(X)=d(x_i,x_j)\)—i.e., the distances between different \(x_i, x_j\)—approximate the \(\delta^*_{ij}\) as closely as possible. We call \(g: d_{ij}(X) \mapsto d_{ij}^*(X)\) a distance transformation function. In other words, proximity scaling means finding \(X\) so that \(d^*_{ij}(X)=g(d_{ij}(X))\approx\delta^*_{ij}=f(\delta_{ij})\).
This approximation \(D^*(X)\) to the matrix \(\Delta^*\) is found by defining a fit criterion (the loss function), \(\sigma_{MDS}(X)=L(\Delta^*,D^*(X))\), that measures how closely \(D^*(X)\) approximates \(\Delta^*\). Such loss functions are usually closely related to the quadratic loss. A general formulation of a loss function based on a quadratic loss is \begin{equation} \label{eq:stress} \sigma_{MDS}(X)=\sum^N_{i=1}\sum^N_{j=1} z_{ij} w_{ij}\left[d^*_{ij}(X)-\delta^*_{ij}\right]^2=\sum^N_{i=1}\sum^N_{j=1} z_{ij}w_{ij}\left[g\left(d_{ij}(X)\right)-f(\delta_{ij})\right]^2 \end{equation} Here, the \(w_{ij}\) and \(z_{ij}\) are finite weights, with \(z_{ij}=0\) if the entry is missing and \(z_{ij}=1\) otherwise.
The loss function is then minimized to find the vectors \(x_1,\dots,x_N\), i.e., \begin{equation} \label{eq:optim} \arg \min_{X}\ \sigma_{MDS}(X). \end{equation} There are a number of techniques one can use to solve this optimization problem.
The first class of models supported in stops is based on the loss function type stress (Kruskal 1964). This uses some type of Minkowski distance (\(p > 0\)) as the distance fitted to the points in the configuration,
\begin{equation}
\label{eq:dist}
d_{ij}(X) = ||x_{i}-x_{j}||_p=\left( \sum_{m=1}^M |x_{im}-x_{jm}|^p \right)^{1/p} \ i,j = 1, \dots, N.
\end{equation}
Typically, the norm used is the Euclidean norm, so \(p=2\). In standard MDS \(g(\cdot)=f(\cdot)=I(\cdot)\), the identity function.
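To make the notation concrete, here is a small hand-rolled computation of the quadratic loss above for the kinship data, using Euclidean distances (\(p=2\)), unit weights and identity transformations \(g=f=I\); this is a didactic sketch of the formula only, not how powerStressMin evaluates its loss internally.
dis <- as.matrix(smacof::kinshipdelta)   # observed dissimilarities Delta
n <- nrow(dis)
set.seed(123)
X <- matrix(rnorm(n * 2), n, 2)          # a candidate configuration in M = 2 dimensions
D <- as.matrix(dist(X, method = "minkowski", p = 2))  # fitted distances d_ij(X)
sum((D - dis)^2)                         # raw stress with w_ij = z_ij = 1 and g = f = I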
This formulation enables one to express a large number of PS methods, many of which are implemented in stops. In stops we allow specific choices for \(f(\cdot)\) and \(g(\cdot)\) from the family of power transformations, so one can fit, among others, stress, Sammon mapping, elastic scaling, s-stress, r-stress, powermds, powersammon, powerelastic and powerstress models (illustrated below).
For all of these models one can use the function powerStressMin, which uses majorization to find the solution (de Leeuw, 2014). The function allows one to specify the arguments kappa, lambda and nu as well as a weightmat (the \(w_{ij}\)).
The object returned from powerStressMin is of class smacofP, which extends the smacof classes (de Leeuw & Mair, 2009) to allow for the power transformations. Apart from that, the objects are built for maximum compatibility with methods from smacof. Accordingly, the following S3 methods are available:
| Method | Description |
|---|---|
| print | Prints the object |
| summary | A summary of the object |
| plot | 2D plots of the object |
| plot3d | Dynamic 3D configuration plot |
| plot3dstatic | Static 3D configuration plot |
| residuals | Residuals |
| coef | Model coefficients |
Let us illustrate the usage:
dis<-as.matrix(smacof::kinshipdelta)
A stress model (standard MDS):
res1<-powerStressMin(dis,kappa=1,lambda=1)
res1
##
## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.2667
## Number of iterations: 5219
A Sammon mapping:
res2<-powerStressMin(dis,kappa=1,lambda=1,nu=-1,weightmat=dis)
res2
##
## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1, nu = -1, weightmat = dis)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 7.023
## Number of iterations: 74860
Alternatively, one can use the faster sammon
function from MASS
(Venables & Ripley, 2002) for which we provide a wrapper that adds class attributes and methods (and overloads the function).
res2a<-sammon(dis)
## Initial stress : 0.17053
## stress after 3 iters: 0.10649
res2a
##
## Call: sammon(d = dis)
##
## Model: Sammon Scaling
## Number of objects: 15
## Stress: 0.1065
An elastic scaling:
res3<-powerStressMin(dis,kappa=1,lambda=1,nu=-2,weightmat=dis)
res3
##
## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1, nu = -2, weightmat = dis)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 59.87
## Number of iterations: 1e+05
An sstress model:
res4<-powerStressMin(dis,kappa=2,lambda=2)
res4
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 2)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.3516
## Number of iterations: 39427
An rstress model (with \(r=1\), as \(r=\kappa/2\)):
res5<-powerStressMin(dis,kappa=2,lambda=1)
res5
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.4133
## Number of iterations: 17686
A powermds model:
res6<-powerStressMin(dis,kappa=2,lambda=1.5)
res6
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.3733
## Number of iterations: 39377
A powersammon model:
res7<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-1,weightmat=dis)
res7
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -1,
## weightmat = dis)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 7.271
## Number of iterations: 1e+05
A powerelastic scaling:
res8<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-2,weightmat=dis)
res8
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -2,
## weightmat = dis)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 62.61
## Number of iterations: 1e+05
A powerstress model:
res9<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-1.5,weightmat=2*1-diag(nrow(dis)))
res9
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -1.5,
## weightmat = 2 * 1 - diag(nrow(dis)))
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.8362
## Number of iterations: 44052
summary(res9)
##
## Configurations:
## D1 D2
## Aunt -0.1228 0.2497
## Brother 0.1961 -0.1405
## Cousin 0.0526 0.3101
## Daughter -0.2048 -0.1259
## Father 0.1637 -0.1824
## Granddaughter -0.2358 -0.0525
## Grandfather 0.2149 -0.1329
## Grandmother -0.2362 -0.0861
## Grandson 0.2148 -0.1073
## Mother -0.2097 -0.1366
## Nephew 0.1709 0.2108
## Niece -0.1234 0.2441
## Sister -0.2216 -0.0973
## Son 0.1700 -0.1710
## Uncle 0.1713 0.2179
##
##
## Stress per point:
## SPP SPP(%)
## Niece 0.0022 4.688
## Nephew 0.0022 4.712
## Aunt 0.0023 4.937
## Uncle 0.0023 5.028
## Daughter 0.0028 6.084
## Son 0.0029 6.275
## Father 0.0030 6.452
## Mother 0.0031 6.686
## Cousin 0.0032 6.798
## Sister 0.0034 7.272
## Brother 0.0034 7.283
## Grandson 0.0037 7.979
## Granddaughter 0.0038 8.102
## Grandmother 0.0041 8.819
## Grandfather 0.0041 8.885
plot(res9)
plot(res9,"transplot")
plot(res9,"Shepard")
plot(res9,"resplot")
plot(res9,"bubbleplot")
The second class of models supported in stops is based on the loss function type strain. Here the \(\Delta^*\) are a transformation of the \(\Delta\), \(\Delta^*= f (\Delta)\), so that \(f(\cdot)=-(h\circ l)(\cdot)\), where \(l\) is any function and \(h(\cdot)\) is a double centering operation, \(h(\Delta)=\Delta-\Delta_{i.}-\Delta_{.j}+\Delta_{..}\), where \(\Delta_{i.}, \Delta_{.j}, \Delta_{..}\) are matrices consisting of the row, column and grand marginal means respectively. These then get approximated by (functions of) the inner product matrices of \(X\)
\begin{equation}
\label{eq:dist2}
d_{ij}(X) = \langle x_{i},x_{j} \rangle
\end{equation}
We can thus express classical scaling as a special case of the general PS loss with \(d_{ij}(X)\) as an inner product, \(g(\cdot) = I(\cdot)\) and \(f(\cdot)=-(h \circ I)(\cdot)\).
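To spell out the double centering \(h(\cdot)\), the following is a minimal hand-rolled version of classical (Torgerson-Gower) scaling for the kinship data. For the sketch we take \(l(\delta)=\delta^2/2\), the choice made internally by stats::cmdscale; it is a didactic illustration only, not the cmdscale wrapper provided by stops.
dis <- as.matrix(smacof::kinshipdelta)
n <- nrow(dis)
B <- -0.5 * dis^2                        # -(l(Delta)) with l(delta) = delta^2 / 2
J <- diag(n) - matrix(1/n, n, n)         # centering matrix
B <- J %*% B %*% J                       # double centering h(.)
eig <- eigen(B, symmetric = TRUE)        # approximate by a rank-2 inner product matrix X X'
X <- eig$vectors[, 1:2] %*% diag(sqrt(eig$values[1:2]))
round(head(X), 2)                        # matches cmdscale(dis, k = 2) up to reflection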
If we again allow power transformations for \(g(\cdot)\) and \(f(\cdot)\), one can fit strain models with power transformations (powerstrain) with stops. In stops we have a wrapper to cmdscale (overloading the stats function) which extends the functionality by returning an object that matches smacofP objects, with the corresponding methods.
A powerstrain model is rather easy to fit by simply subjecting the dissimilarity matrix to some power. Here we use \(\lambda=3\).
resc<-cmdscale(kinshipdelta^3)
resc
##
## Call: cmdscale(d = kinshipdelta^3)
##
## Model: Torgerson-Gower Scaling
## Number of objects: 15
## GOF: 0.4258 0.6282
summary(resc)
##
## Configurations:
## D1 D2
## Aunt 178193 204987
## Brother -174369 -94357
## Cousin -48355 265057
## Daughter 169149 -109936
## Father -145389 -168604
## Granddaughter 187039 -44851
## Grandfather -180116 -103668
## Grandmother 199145 -83039
## Grandson -169768 -72359
## Mother 185798 -138677
## Nephew -195964 173124
## Niece 168278 208870
## Sister 182224 -70764
## Son -149876 -135883
## Uncle -205989 170101
plot(resc)
The main contribution of the stops package is not solely in fitting the powerstress or powerstrain models and their variants from above, but in automatically choosing the right transformation to achieve a “structured” MDS result. This can be useful in a variety of contexts: to explore or generate structures, to restrict the target space, to avoid artefacts, to preserve certain types of structures and so forth.
For this, an MDS loss function is subjected to nonlinear transformations and is augmented to include penalties for the type of structures one is aiming for. This combination of an MDS loss with a structuredness penalty is what we call “structure optimized loss” (stoploss) and the resulting MDS is coined “Structured Optimized Proximity Scaling” (or STOPS). The prime example for a STOPS model is “Cluster Optimized Proximity Scaling” (COPS), which selects optimal parameters so that the clustered appearance of the configuration is improved (see below).
Following Rusch et al. (2015b), the general idea is that from given observations \(\Delta\) we look for a configuration \(X\) by minimizing some loss function \(\sigma_{MDS}(X^*;\Delta^*)\), where the \(\Delta^*, X^*\) are functions of the \(\Delta\) and \(X\). The \(X\) has properties with regard to its structural appearance, which we call c-structuredness (for configuration-structuredness). There are different types of c-structuredness one might be interested in (say, how clustered the result is, whether the dimensions are orthogonal, or whether there is some submanifold that the data live on). We developed indices for these types of c-structuredness that capture their essence in the configuration.
As part of a STOPS model we have a proximity scaling loss function \(\sigma_{MDS}(\cdot)\), a \(\Delta\) and an \(X\), and transformations \(f_{ij}(\delta_{ij};\theta)\) and \(g_{ij}(d_{ij};\theta)\) that are parametrized (with \(\theta\) either finite or infinite dimensional, e.g., a transformation applied to all observations like a power transformation, or even an individual transformation per object). These transformations achieve a sort of push towards more structure, so different values for \(\theta\) will in general lead to different c-structuredness.
We further have \(K\) different indices \(I_k(X)\) that measure different types of c-structuredness. We can then define STOPS methods of the form (additive STOPS) \begin{equation} \text{aSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) = v_0 \cdot \sigma_{MDS}(X^*(\theta)) + \sum^K_{k=1} v_k I_k(X(\theta)) \end{equation} or (multiplicative STOPS) \begin{equation} \text{mSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) = \sigma_{MDS}(X^*(\theta))^{v_0} \cdot \prod^K_{k=1} I_k(X(\theta))^{v_k} \end{equation} (which can be expressed as an aSTOPS by taking logarithms). Here the \(v_0,\dots,v_k\) are weights that determine how the individual parts (MDS loss and c-structuredness indices) are aggregated.
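As a toy illustration of how the two aggregations combine the components (made-up numbers, not output of stops), suppose for a candidate \(\theta\) we have already computed the MDS loss and two c-structuredness indices:
mds_loss <- 0.25                        # sigma_MDS(X*(theta))
indices  <- c(cclusteredness = 0.60,    # I_1: should be large  -> negative weight
              clinearity     = 0.10)    # I_2: should be small  -> positive weight
v0 <- 1                                 # weight of the MDS loss
v  <- c(-0.4, 0.4)                      # c-structuredness weights v_1, v_2
astops <- v0 * mds_loss + sum(v * indices)      # additive STOPS: weighted sum
mstops <- mds_loss^v0 * prod(indices^v)         # multiplicative STOPS: weighted product
c(aSTOPS = astops, mSTOPS = mstops)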
The job is then to find \begin{equation} \arg\min_{\vartheta}\ \text{aSTOPS}(X, \theta, v_0, \dots, v_k; \Delta)\ \text{or} \ \arg\min_{\vartheta}\ \text{mSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) \end{equation} where \(\vartheta \subseteq \{X,\theta, v_0, \dots, v_k\}\). Typically \(\vartheta\) will be a subset of all possible parameters (e.g., the weights might be given a priori). Currently, the transformations that can be used in stops are limited to power transformations.
Minimizing stoploss can be difficult. In stops we use a nested algorithm that internally first solves for \(X\) given \(\theta\), \(\arg\min_X \sigma_{MDS}\left(X,\theta\right)\), and then optimizes over \(\theta\) with a metaheuristic. Implemented are simulated annealing (optimmethod="SANN"), particle swarm optimization (optimmethod="pso") and a variant of the Luus-Jaakola procedure (optimmethod="ALJ"). We suggest using the latter. A Bayesian optimization approach is currently under way.
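To illustrate the nested idea (this is not the actual stops implementation), the following sketch profiles out \(X\) for a one-dimensional \(\theta\), here an exponent \(\lambda\) applied to the dissimilarities, with a pure stress objective (all c-structuredness weights set to 0). The inner solve uses smacof::smacofSym and the outer search is a simple one-dimensional optimize() instead of a metaheuristic.
dis <- as.matrix(smacof::kinshipdelta)
profileloss <- function(lambda, delta) {
  fit <- smacof::smacofSym(delta^lambda, ndim = 2)  # inner step: solve for X given theta
  fit$stress                                        # stress-1 of the profiled-out configuration
}
out <- optimize(profileloss, interval = c(0.5, 4), delta = dis)  # outer step: search over theta
out$minimum                                         # lambda giving the smallest profiled loss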
Currently the following c-structuredness types are supported:

- c-clusteredness (cclusteredness): A clustered appearance of the configuration (\(I_k\) is the normed OPTICS cordillera (COPS; Rusch et al. 2015a); 0 means no c-clusteredness, 1 means perfect c-clusteredness)
- c-linearity (clinearity): Projections lie close to a linear subspace of the configuration (\(I_k\) is the maximal multiple correlation; 0 means orthogonal, 1 means perfectly linear)
- c-manifoldness (cmanifoldness): Projections lie on a submanifold of the configuration (\(I_k\) is the maximal correlation (Sarmanov, 1958); only available for two dimensions; 1 means a perfectly smooth function)
- c-dependence (cdependence): Random vectors of projections onto the axes are stochastically dependent (\(I_k\) is the distance correlation (Székely et al., 2007); only available for two dimensions; 0 means they are stochastically independent)
- c-association (cassociation): Pairwise nonlinear association between dimensions (\(I_k\) is the pairwise maximal maximum information coefficient (Reshef et al. 2011); 1 means perfect functional association)
- c-nonmonotonicity (cnonmonotonicity): Deviation from monotonicity (\(I_k\) is the pairwise maximal maximum asymmetry score (Reshef et al. 2011); the higher, the less monotone)
- c-functionality (cfunctionality): Pairwise functional, smooth, noise-free relationship between dimensions (\(I_k\) is the mean pairwise maximum edge value (Reshef et al. 2011); 1 means perfect functional association)
- c-complexity (ccomplexity): The degree of complexity of the functional relationship between any two dimensions (\(I_k\) is the pairwise maximal minimum cell number (Reshef et al. 2011); the higher, the more complex)
- c-faithfulness (cfaithfulness): How accurately the neighbourhood of \(\Delta\) is preserved in \(D\) (\(I_k\) is the adjusted Md index of Chen & Buja, 2013; note that this index deviates from the others by being a function of both \(X^*\) and \(\Delta^*\) rather than of \(X^*\) alone)

If we have a single \(I(X)=OC(X)\), the OPTICS cordillera (Rusch et al. 2015a), the transformations applied are power transformations and the weight for \(I(X)\) is negative, we essentially have COPS (see below).
For the MDS loss (argument loss in the functions stops and cops), the functions currently support all losses derived from powerstress and powerstrain, which can in principle all be fitted with powerStressMin alone. However, for many models we offer dedicated functions that either use workhorses that are more optimized for the problem at hand and/or restrict the parameter space for the distance/proximity transformations and can thus be faster. They are:

- stress, smacofSym: Kruskal's stress; Workhorse: smacofSym; Optimization over \(\lambda\)
- smacofSphere: Kruskal's stress for projection onto a sphere; Workhorse: smacofSphere; Optimization over \(\lambda\)
- strain, powerstrain: Classical scaling; Workhorse: cmdscale; Optimization over \(\lambda\)
- sammon, sammon2: Sammon scaling; Workhorse: sammon or smacofSym; Optimization over \(\lambda\)
- elastic: Elastic scaling; Workhorse: smacofSym; Optimization over \(\lambda\)
- sstress: S-stress; Workhorse: powerStressMin; Optimization over \(\lambda\)
- rstress: R-stress; Workhorse: powerStressMin; Optimization over \(\kappa\)
- powermds: MDS with powers; Workhorse: powerStressMin; Optimization over \(\kappa\), \(\lambda\)
- powersammon: Sammon scaling with powers; Workhorse: powerStressMin; Optimization over \(\kappa\), \(\lambda\)
- powerelastic: Elastic scaling with powers; Workhorse: powerStressMin; Optimization over \(\kappa\), \(\lambda\)
- powerstress: Power stress model; Workhorse: powerStressMin; Optimization over \(\kappa\), \(\lambda\), \(\nu\)

The syntax for fitting a stops model is rather straightforward.
One has to supply the arguments dis (a dissimilarity matrix) and structures (a character vector listing the c-structuredness types that should be used to augment the PS loss; see above for the types of structures and losses). The parameters for the structuredness indices are given with strucpars, a list whose elements are lists corresponding to each structuredness index and listing its parameters (if the defaults should be used, the list element should be set to NULL). The PS loss can be chosen with the argument loss. The type of aggregation for the multi-objective optimization is specified in type and can be one of additive or multiplicative. Additional parameters can be passed to the fitting workhorses via the ... argument.
stops(dis, structures = c("cclusteredness","clinearity"), loss="stress", ...)
One then has all the S3 methods of smacofP
at one’s disposal.
For example, let us fit an mSTOPS model that looks for a transformation of the \(\delta_{ij}\) so that a) the result has maximal c-clusteredness (which is 1 in the best case, so we set a negative weight for this structure), b) the projections onto the principal axes are nearly orthogonal (c-linearity close to 0, so we set a positive weight for this structure), c) the projections onto the principal axes are nevertheless stochastically dependent (negative weight on c-dependence) and d) the fit of the MDS is also factored in (so we set a positive weight on the MDS loss). Since we use mSTOPS and nonzero weights for c-linearity and c-dependence, a c-linearity/c-dependence close to 0 will overall dominate the stoploss function, with the other two criteria being more of an afterthought; in aSTOPS that would be different.
Note: This is generally the rule to follow: we minimize the stoploss, so a c-structuredness index that should be (numerically) large needs a negative weight and a c-structuredness index that should be (numerically) small needs a positive weight.
We first set up the parameters for the structuredness indices. For the OPTICS cordillera we use a \(d_{max}\) of 1.3 (the upper limit of rang), epsilon=10 and minpts=2; for c-linearity we have no parameters (so NULL will work) and for c-dependence we have a single parameter, index, which we set to 2.
strucpars<-list(list(epsilon=10,minpts=2,rang=c(0,1.3)), #cordillera
NULL, # c-linearity (has no parameters)
list(index=2) #c-dependence
)
ressm<-stops(kinshipdelta,loss="stress",stressweight=1,structures=c("cclusteredness","clinearity","cdependence"),strucweight=c(-0.33,0.33,-0.33),verbose=0,strucpars=strucpars,type="multiplicative")
ressm
##
## Call: stops(dis = kinshipdelta, loss = "stress", structures = c("cclusteredness",
## "clinearity", "cdependence"), stressweight = 1, strucweight = c(-0.33,
## 0.33, -0.33), strucpars = strucpars, verbose = 0, type = "multiplicative")
##
## Model: multiplicative STOPS with stress loss function and theta parameters= 1 3.095 1
##
## Number of objects: 15
## MDS loss value: 1
## C-Structuredness Indices: cclusteredness 0.53027 clinearity 0.01802 cdependence 0.01802
## Structure optimized loss (stoploss): 1.233
## MDS loss weight: 1 c-structuredness weights: -0.33 0.33 -0.33
## Number of iterations of ALJ optimization: 117
plot(ressm)
Let us compare this with the corresponding aSTOPS
ressa<-stops(kinshipdelta,loss="stress",stressweight=1,structures=c("cclusteredness","clinearity","cdependence"),strucweight=c(-0.33,0.33,-0.33),verbose=0,strucpars=strucpars,type="additive")
ressa
##
## Call: stops(dis = kinshipdelta, loss = "stress", structures = c("cclusteredness",
## "clinearity", "cdependence"), stressweight = 1, strucweight = c(-0.33,
## 0.33, -0.33), strucpars = strucpars, verbose = 0, type = "additive")
##
## Model: additive STOPS with stress loss function and theta parameters= 1 3.114 1
##
## Number of objects: 15
## MDS loss value: 1
## C-Structuredness Indices: cclusteredness 0.53021 clinearity 0.01785 cdependence 0.01785
## Structure optimized loss (stoploss): 0.825
## MDS loss weight: 1 c-structuredness weights: -0.33 0.33 -0.33
## Number of iterations of ALJ optimization: 81
plot(ressa)
We see that the c-clusteredness is higher compared with the mSTOPS result: we have a number of distinct object clusters (with at least minpts=2) and they are more spread out and distributed more evenly. The dimensions, on the other hand, are now farther from being orthogonal, but the stochastic dependence is higher (which is obviously a non-linear dependence).
When choosing a c-structuredness index, one needs to be clear about what structure one is interested in and how it interacts with the chosen PS loss. Consider the following example: We fit a powermds model to the kinship data and want to maximize c-association (i.e., any non-linear relationship) and c-manifoldness but minimize c-linearity. In other words, we try to find a power transformation of \(\Delta\) and \(D\) so that the objects are positioned in the configuration in such a way that the projections onto the principal axes are as close as possible to being related by a smooth but non-linear function.
resa<-stops(kinshipdelta,structures=c("cassociation","cmanifoldness","clinearity"),loss="powermds",verbose=0,strucpars=list(NULL,NULL,NULL),type="additive",strucweight=c(-0.5,-0.5,0.5))
resa
##
## Call: stops(dis = kinshipdelta, loss = "powermds", theta = c(2.9429394,
## 1.67850653, 1.57140404), structures = c("cassociation", "cmanifoldness",
## "clinearity"), strucweight = c(-0.5, -0.5, 0.5), strucpars = list(NULL,
## NULL, NULL), verbose = 0, type = "additive", itmax = 1)
##
## Model: additive STOPS with powermds loss function and theta parameters= 2.943 1.679 1
##
## Number of objects: 15
## MDS loss value: 0.9995
## C-Structuredness Indices: cassociation 1.0000000 cmanifoldness 0.9918892 clinearity 0.0001654
## Structure optimized loss (stoploss): 0.003602
## MDS loss weight: 1 c-structuredness weights: -0.5 -0.5 0.5
## Number of iterations of ALJ optimization: 1
We see in this model (resa) that the c-association is indeed 1, which says we have a near perfect non-linear relationship. What does this relationship look like?
plot(resa)
It is a parabolic shape: the projections are such that the points on D2 are a near parabolic function of the points on D1 (projecting onto some structure resembling a conic section is often the case for r-stress, which is essentially what we got here; setting a positive weight on c-association can combat that if it is an artefact). What we can also see is that there are three clear clusters, so c-clusteredness should be high. But when looking at the OPTICS cordillera here, we find that it is lower than for the result from above using stress and \(\lambda \approx 3.11\) (the ressa model).
c1<-cordillera(resa$fit$conf,minpts=2,epsilon=10,rang=c(0,1.3))
c2<-cordillera(ressa$fit$conf,minpts=2,epsilon=10,rang=c(0,1.3))
c1
## raw normed
## 7.9441 0.4365
c2
## raw normed
## 9.6499 0.5302
This discrepancy comes from the definition of c-clusteredness (Rusch et al., 2015a), where more clusters, more spread-out clusters, more evenly distributed clusters and denser clusters all increase c-clusteredness. In the example maximizing c-association we have two very dense clusters of 5 points and one relatively non-dense cluster of five other points. In the model maximizing c-clusteredness (and other structures) we get six moderately dense clusters with 2 or 3 points each, which is also the minimum number of points we wanted to be grouped together. Most importantly, they are projected onto a much larger range of the target space, as the \(X\) obtained from the stress loss is different from the one obtained from the powermds loss, so the \(d_{max}\) is very different. Since we use the normed OPTICS cordillera there, we look at c-clusteredness relative to the most clustered appearance with two points per cluster. Thus, the second result has more c-clusteredness. If we instead require a cluster to comprise at least 5 points, then the result with high c-association also has a large c-clusteredness, because the clusters found then match that definition.
c3<-cordillera(resa$fit$conf,minpts=5,epsilon=10,rang=c(0,1.3))
c3
## raw normed
## 3.8650 0.5946
Note that it may just as well be possible to have a high c-association and no c-clusteredness at all (e.g., points lying equidistant on a smooth non-linear curve). Note also that the models are not necessarily comparable due to different stress functions - the transformation in powermds
that is optimal with respect to c-clusteredness would be different.
Indeed, one can optimize for c-clusteredness alone, using it as a “goodness-of-clusteredness” index (i.e., the \(d_{max}\) is not constant over configurations but varies conditionally on the configuration); we then get a projection with a c-clusteredness of 0.67.
resa2<-stops(kinshipdelta,structures=c("cclusteredness"),loss="powermds",verbose=0,strucpars=list(list(epsilon=10,rang=NULL,minpts=2)),type="additive",strucweight=-1,stressweight=0)
For convenience it is also possible to use the stops function to find the loss-optimal transformation for the non-augmented models specified in loss, by setting strucweight, the weight of any c-structuredness, to 0. Then the function optimizes the MDS loss function only.
ressa<-stops(kinshipdelta,structure=c("clinearity"),strucweight=0,loss="stress",verbose=0)
The prime example of a STOPS model is COPS (Cluster Optimized Proximity Scaling; Rusch et al. 2015a), where the single c-structuredness index is the OPTICS cordillera \(\text{OC}(X;\epsilon,k,q)\) and the combined objective (coploss) weights the MDS loss with \(v_1\) and the cordillera with \(v_2\). One way to choose these weights is relative to an initial, untransformed solution, with \(\theta_0=(1,1)^\top\) in case of loss functions with power transformations. Thus an increase of 1 in the MDS loss measure can be compensated by an increase of \(v^0_1/v^0_2\) in c-clusteredness. Selecting \(v_1=1,v_2=v^{0}_2\) this way is in line with the idea of pushing the configurations towards a more clustered appearance relative to the initial solution.
Another possibility is to choose them in such a way that \(\text{coploss}=0\) at the optimum, i.e., choosing \(v^{opt}_{1}, v^{opt}_2\) so that \begin{equation} v^{opt}_1 \cdot \sigma_{MDS}\left(X(\theta^*),\theta^*\right)-v^{opt}_2 \cdot \text{OC}\left(X(\theta^*);\epsilon,k,q \right) = 0 \end{equation} with \(\theta^*:=\arg\min_\theta \text{coploss}(\theta)\). This is in line with having \(\text{coploss}(\theta)>0\) for \(\theta \neq \theta^*\) and allows optimization over \(v_1,v_2\).
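As a small worked example of this rule (using the MDS loss and normed cordillera of the strain-based COPS fit shown further below; this is only arithmetic, not necessarily how cops picks its default weight):
v1 <- 1
sigma_mds <- 0.3237            # MDS loss value at theta*
oc        <- 0.3087            # normed OPTICS cordillera at theta*
v2_opt <- v1 * sigma_mds / oc  # weight that makes coploss zero at the optimum
v2_opt                         # approximately 1.05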
The optimization problem in COPS is then to find \begin{equation} \arg\min_{X(\theta),\theta}\ \text{coploss}(\theta) = \arg\min_{X(\theta),\theta}\ v_1 \cdot \sigma_{MDS}\left(X(\theta),\theta\right)-v_2 \cdot \text{OC}\left(X(\theta);\epsilon,k,q \right). \end{equation} For a given \(\theta\), if \(v_2=0\) then the result of optimizing the above is the same as solving the respective original PS problem. Letting \(\theta\) be variable, \(v_2=0\) will minimize the loss over configurations obtained from using different \(\theta\).
The c-clusteredness index we use is the OPTICS cordillera, which measures how clustered a configuration appears. It is based on the OPTICS algorithm, which outputs an ordering of the points together with a reachability distance for each point. The OPTICS cordillera is then simply an aggregation of that information. Since we know what constitutes a maximally clustered result, we can derive an upper bound and normalize the index to lie between 0 and 1. If the configuration is maximally clustered, the index gets a value of 1, and it gets 0 if all points are equidistant to their nearest neighbours (a matchstick embedding). See Rusch et al. (2015a) for details.
Even though one can fit a COPS model with stops, there is a dedicated function cops. Its syntax works pretty much like that of stops, except that there is no structures argument.
cops(dis,loss,...)
For the example we use a COPS model for a classical scaling (strain
loss)
resc<-cops(kinshipdelta,loss="strain")
resc
##
## Call: cops(dis = kinshipdelta, loss = "strain")
##
## Model: COPS with strain loss function and parameters kappa= 1 lambda= 1.606 nu= 1
##
## Number of objects: 15
## MDS loss value: 0.3237
## OPTICS cordillera: Raw 10.87 Normed 0.3087
## Cluster optimized loss (coploss): -0.03458
## MDS loss weight: 1 OPTICS cordillera weight: 1.16
## Number of iterations of ALJ optimization: 90
summary(resc)
##
## Configurations:
## D1 D2
## Aunt -345.9 492.49
## Brother 365.5 -253.89
## Cousin 107.3 560.94
## Daughter -394.8 -262.97
## Father 320.3 -371.50
## Granddaughter -411.9 -99.39
## Grandfather 370.3 -214.95
## Grandmother -412.2 -148.94
## Grandson 370.4 -181.08
## Mother -411.9 -289.88
## Nephew 424.9 398.15
## Niece -340.0 485.92
## Sister -402.9 -183.98
## Son 328.5 -338.10
## Uncle 432.4 407.18
A number of plots are available:
plot(resc,"confplot")
plot(resc,"Shepard")
plot(resc,"transplot")
plot(resc,"reachplot")
For convenience it is also possible to use the cops function to find the loss-optimal transformation for the non-augmented models specified in loss, by setting cordweight, the weight of the OPTICS cordillera, to 0. Then the function optimizes the MDS loss function only.
resca<-cops(kinshipdelta,cordweight=0,loss="strain")
resca
##
## Call: cops(dis = kinshipdelta, loss = "strain", cordweight = 0)
##
## Model: COPS with strain loss function and parameters kappa= 1 lambda= 1.586 nu= 1
##
## Number of objects: 15
## MDS loss value: 0.3237
## OPTICS cordillera: Raw 10.87 Normed 0.3087
## Cluster optimized loss (coploss): 0.3237
## MDS loss weight: 1 OPTICS cordillera weight: 0
## Number of iterations of ALJ optimization: 48
Here the results match the result from using the standard cordweight
. We can give more weight to the c-clusteredness though:
rescb<-cops(kinshipdelta,cordweight=20,loss="strain")
rescb
##
## Call: cops(dis = kinshipdelta, loss = "strain", cordweight = 20)
##
## Model: COPS with strain loss function and parameters kappa= 1 lambda= 2.06 nu= 1
##
## Number of objects: 15
## MDS loss value: 0.3395
## OPTICS cordillera: Raw 11.01 Normed 0.3128
## Cluster optimized loss (coploss): -5.916
## MDS loss weight: 1 OPTICS cordillera weight: 20
## Number of iterations of ALJ optimization: 94
plot(resca,main="with cordweight=0")
plot(rescb,main="with cordweight=20")
This result has more c-clusteredness but a worse fit. The higher c-clusteredness is discernible in the Grandfather/Brother and Grandmother/Sister clusters (we used a minimum number of 2 observations to make up a cluster, minpts=2).
The package also provides functions that are used internally by cops, stops and powerStressMin but may be of interest to an end user beyond that.
For calculating a COPS solution, we need the OPTICS algorithm and the OPTICS cordillera. In the package we also provide a rudimentary interface to the OPTICS implementation in ELKI.
data(iris)
res<-optics(iris[,1:4],minpts=2,epsilon=1000)
print(res)
## observation reachability
## 1 ID=1 reachability=∞
## 2 ID=18 reachability=0.1
## 3 ID=41 reachability=0.14142136
## 4 ID=5 reachability=0.14142136
## 5 ID=38 reachability=0.14142136
## 6 ID=40 reachability=0.14142136
## 7 ID=8 reachability=0.1
## 8 ID=50 reachability=0.14142136
## 9 ID=29 reachability=0.14142136
## 10 ID=28 reachability=0.14142136
## 11 ID=36 reachability=0.2236068
## 12 ID=49 reachability=0.2236068
## 13 ID=11 reachability=0.1
## 14 ID=27 reachability=0.2236068
## 15 ID=24 reachability=0.2
## 16 ID=44 reachability=0.2236068
## 17 ID=12 reachability=0.2236068
## 18 ID=30 reachability=0.2236068
## 19 ID=31 reachability=0.14142136
## 20 ID=35 reachability=0.14142136
## 21 ID=10 reachability=0.1
## 22 ID=2 reachability=0.14142136
## 23 ID=46 reachability=0.14142136
## 24 ID=13 reachability=0.14142136
## 25 ID=26 reachability=0.17320508
## 26 ID=4 reachability=0.17320508
## 27 ID=48 reachability=0.14142136
## 28 ID=3 reachability=0.14142136
## 29 ID=43 reachability=0.2236068
## 30 ID=39 reachability=0.2
## 31 ID=9 reachability=0.14142136
## 32 ID=7 reachability=0.2236068
## 33 ID=20 reachability=0.24494897
## 34 ID=22 reachability=0.14142136
## 35 ID=47 reachability=0.14142136
## 36 ID=14 reachability=0.24494897
## 37 ID=25 reachability=0.3
## 38 ID=37 reachability=0.3
## 39 ID=21 reachability=0.3
## 40 ID=32 reachability=0.28284271
## 41 ID=17 reachability=0.34641016
## 42 ID=6 reachability=0.34641016
## 43 ID=19 reachability=0.33166248
## 44 ID=33 reachability=0.34641016
## 45 ID=34 reachability=0.34641016
## 46 ID=45 reachability=0.36055513
## 47 ID=16 reachability=0.36055513
## 48 ID=15 reachability=0.41231056
## 49 ID=23 reachability=0.45825757
## 50 ID=42 reachability=0.6244998
## 51 ID=99 reachability=1.64012195
## 52 ID=58 reachability=0.38729833
## 53 ID=94 reachability=0.14142136
## 54 ID=61 reachability=0.36055513
## 55 ID=82 reachability=0.64807407
## 56 ID=81 reachability=0.14142136
## 57 ID=70 reachability=0.17320508
## 58 ID=90 reachability=0.24494897
## 59 ID=54 reachability=0.2
## 60 ID=93 reachability=0.26457513
## 61 ID=83 reachability=0.14142136
## 62 ID=68 reachability=0.24494897
## 63 ID=100 reachability=0.26457513
## 64 ID=97 reachability=0.14142136
## 65 ID=96 reachability=0.14142136
## 66 ID=95 reachability=0.17320508
## 67 ID=89 reachability=0.17320508
## 68 ID=91 reachability=0.26457513
## 69 ID=62 reachability=0.3
## 70 ID=56 reachability=0.31622777
## 71 ID=67 reachability=0.3
## 72 ID=85 reachability=0.2
## 73 ID=79 reachability=0.33166248
## 74 ID=92 reachability=0.2
## 75 ID=64 reachability=0.14142136
## 76 ID=74 reachability=0.2236068
## 77 ID=72 reachability=0.34641016
## 78 ID=98 reachability=0.33166248
## 79 ID=75 reachability=0.2
## 80 ID=76 reachability=0.26457513
## 81 ID=66 reachability=0.14142136
## 82 ID=59 reachability=0.24494897
## 83 ID=55 reachability=0.24494897
## 84 ID=52 reachability=0.31622777
## 85 ID=57 reachability=0.26457513
## 86 ID=87 reachability=0.31622777
## 87 ID=53 reachability=0.28284271
## 88 ID=51 reachability=0.26457513
## 89 ID=78 reachability=0.31622777
## 90 ID=77 reachability=0.31622777
## 91 ID=80 reachability=0.34641016
## 92 ID=86 reachability=0.37416574
## 93 ID=60 reachability=0.38729833
## 94 ID=148 reachability=0.41231056
## 95 ID=111 reachability=0.2236068
## 96 ID=112 reachability=0.34641016
## 97 ID=117 reachability=0.36055513
## 98 ID=138 reachability=0.14142136
## 99 ID=104 reachability=0.24494897
## 100 ID=129 reachability=0.33166248
## 101 ID=133 reachability=0.1
## 102 ID=105 reachability=0.3
## 103 ID=146 reachability=0.36055513
## 104 ID=142 reachability=0.24494897
## 105 ID=141 reachability=0.36055513
## 106 ID=145 reachability=0.24494897
## 107 ID=121 reachability=0.26457513
## 108 ID=144 reachability=0.2236068
## 109 ID=125 reachability=0.3
## 110 ID=113 reachability=0.34641016
## 111 ID=140 reachability=0.17320508
## 112 ID=116 reachability=0.37416574
## 113 ID=149 reachability=0.3
## 114 ID=137 reachability=0.24494897
## 115 ID=147 reachability=0.37416574
## 116 ID=124 reachability=0.24494897
## 117 ID=127 reachability=0.17320508
## 118 ID=128 reachability=0.24494897
## 119 ID=139 reachability=0.14142136
## 120 ID=71 reachability=0.2236068
## 121 ID=150 reachability=0.28284271
## 122 ID=143 reachability=0.33166248
## 123 ID=102 reachability=0
## 124 ID=114 reachability=0.26457513
## 125 ID=122 reachability=0.31622777
## 126 ID=84 reachability=0.36055513
## 127 ID=134 reachability=0.33166248
## 128 ID=73 reachability=0.36055513
## 129 ID=103 reachability=0.4
## 130 ID=126 reachability=0.38729833
## 131 ID=130 reachability=0.34641016
## 132 ID=65 reachability=0.42426407
## 133 ID=101 reachability=0.42426407
## 134 ID=120 reachability=0.43588989
## 135 ID=108 reachability=0.43588989
## 136 ID=131 reachability=0.26457513
## 137 ID=115 reachability=0.48989795
## 138 ID=63 reachability=0.48989795
## 139 ID=69 reachability=0.50990195
## 140 ID=88 reachability=0.26457513
## 141 ID=106 reachability=0.52915026
## 142 ID=123 reachability=0.26457513
## 143 ID=119 reachability=0.41231056
## 144 ID=136 reachability=0.53851648
## 145 ID=135 reachability=0.53851648
## 146 ID=109 reachability=0.55677644
## 147 ID=110 reachability=0.63245553
## 148 ID=107 reachability=0.73484692
## 149 ID=118 reachability=0.81853528
## 150 ID=132 reachability=0.41231056
summary(res)
##
## An OPTICS results with minpts= 2 and epsilon= 1000
##
## Five Point Summary of the Minimum Reachabilities:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.173 0.265 0.292 0.346 1.640 1
##
## Stem and Leaf Display of the Minimum Reachabilities:
##
## The decimal point is 1 digit(s) to the left of the |
##
## 0 | 0
## 1 | 00000444444444444444444444444447777777
## 2 | 00000022222222222244444444444466666666666888
## 3 | 0000000022222233333355555555566666666777999
## 4 | 011112244699
## 5 | 13446
## 6 | 235
## 7 | 3
## 8 | 2
## 9 |
## 10 |
## 11 |
## 12 |
## 13 |
## 14 |
## 15 |
## 16 | 4
plot(res,withlabels=TRUE)
and a function for calculating and displaying the OPTICS cordillera.
cres<-cordillera(iris[,1:4],minpts=2,epsilon=1000,scale=FALSE)
cres
## raw normed
## 14.96797 0.06125
summary(cres)
##
## OPTICS cordillera values with minpts= 2 and epsilon= 1000
##
## Raw OC: 14.97
## Normalization: 244.4
## Normed OC: 0.06125
plot(cres)
Since the inner optimization problem in STOPS models is hard and takes a long time, Rusch et al. (2015a) developed a metaheuristic for the outer optimization problem that typically needs fewer calls to the inner minimization than pso or SANN, albeit without guarantees of convergence to a global minimum for non-smooth functions. It is an adaptation of the Luus-Jaakola random search (Luus & Jaakola 1973). It can be used with the function ljoptim, whose output is modeled after optim. It needs as arguments x, a starting value, fun, a function to optimize, and lower and upper box constraints for the search region. With the argument adaptive=TRUE or FALSE one can switch between our adaptive version and the original LJ algorithm. Accuracy of the optimization can be controlled with the arguments maxit (maximum number of iterations), accd (terminates when the length of the search space is below this number) and acc (terminates when the difference of two subsequent function values is below this value).
We optimize a “Wild Function” with the non-adaptive LJ version (and numerical accuracies of at least 1e-16
for accd
and acc
).
set.seed(210485)
fwild <- function (x) 10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x+80
res2<-ljoptim(50, fwild,lower=-50,upper=50,adaptive=FALSE,accd=1e-16,acc=1e-16)
res2
## $par
## [1] -15.82
##
## $value
## [1] 67.47
##
## $counts
## function gradient
## 463 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
plot(fwild, -50, 50, n = 1000, main = "ljoptim() minimising 'wild function'")
points(res2$par,res2$value,col="red",pch=19)
We also provide a Procrustes adjustment to make two configurations visually comparable. The function is conf_adjust and takes two configurations, conf1 (the reference configuration) and conf2 (another configuration). It returns the adjusted versions:
conf_adjust(conf1,conf2)
Borg I, Groenen PJ (2005). Modern multidimensional scaling: Theory and applications. 2nd edition. Springer, New York
Buja A, Swayne DF, Littman ML, Dean N, Hofmann H, Chen L (2008). Data visualization with multidimensional scaling. Journal of Computational and Graphical Statistics, 17 (2), 444-472.
Chen L, Buja A (2013). Stress functions for nonlinear dimension reduction, proximity analysis, and graph drawing. Journal of Machine Learning Research, 14, 1145-1173.
de Leeuw J (2014). Minimizing r-stress using nested majorization. Technical Report, UCLA, Statistics Preprint Series.
de Leeuw J, Mair P (2009). Multidimensional Scaling Using Majorization: SMACOF in R. Journal of Statistical Software, 31 (3), 1-30.
Kruskal JB (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29 (1), 1-27.
Luus R, Jaakola T (1973). Optimization by direct search and systematic reduction of the size of search region. American Institute of Chemical Engineers Journal (AIChE), 19 (4), 760-766.
McGee VE (1966). The multidimensional analysis of ‘elastic’ distances. British Journal of Mathematical and Statistical Psychology, 19 (2), 181-196.
Reshef D, Reshef Y, Finucane H, Grossman S, McVean G, Turnbaugh P, Lander E, Mitzenmacher M, Sabeti P (2011). Detecting novel associations in large datasets. Science, 334, 6062.
Rosenberg, S. & Kim, M. P. (1975). The method of sorting as a data gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.
Rusch, T., Mair, P. and Hornik, K. (2015a) COPS: Cluster Optimized Proximity Scaling. Discussion Paper Series / Center for Empirical Research Methods, 2015/1. WU Vienna University of Economics and Business, Vienna.
Rusch, T., Mair, P. and Hornik, K. (2015b). Structuredness Indices and Augmented Nonlinear Dimension Reduction. In preparation.
Sammon JW (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18 (5), 401-409
Sarmanov OV (1958) The maximum correlation coefficient (symmetric case). Dokl. Akad. Nauk SSSR, 120 : 4 (1958), 715 - 718.
Székely, G. J. Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing independence by correlation of distances, The Annals of Statistics, 35:6, 2769–2794.
Takane Y, Young F, de Leeuw J (1977). Nonmetric individual differences multidimensional scaling: an alternating least squares method with optimal scaling features. Psychometrika, 42 (1), 7-67.
Torgerson WS (1958). Theory and methods of scaling. Wiley.
Venables WN, Ripley BD (2002). Modern Applied Statistics with S. Fourth edition. Springer, New York.