This vignette shows various regression options with optimal scaling (OS). In Gifi this idea is called morals (Young, Takane, and De Leeuw 1976). We start with simple linear regression, followed by other models of increasing complexity.
Throughout this unit we use a dataset included in the
MPsychoR package on emotion development. Participants’ age
and gender are predictors, and granularity is the response. Granularity
refers to a person’s ability to separate their emotions into specific
types. People with low granularity (here scaled in the positive
direction) struggle to separate their emotions (e.g., reporting that
sadness, anger, fear, and others all just feel “bad”), whereas people
with high granularity are very specific in how they parse their emotions
(e.g., easily distinguishing between nuanced emotions like
disappointment and frustration).
We first focus on simple regression settings with age as
predictor and granularity as response. We start by
emulating a standard linear regression. We standardize
the variables such that we get standardized regression coefficients
right away.
library("Gifi")
library("MPsychoR")
data("granularity")
granularity1 <- scale(granularity[,1:2]) |> as.data.frame()
head(granularity1)
#> gran age
#> 1 1.34472707 -1.834131
#> 2 0.70924034 -1.834131
#> 3 -0.01834218 -1.815128
#> 4 0.79902412 -1.644100
#> 5 1.05238391 -1.758118
#> 6 1.82151872 -1.644100
plot(granularity1[,2:1], main = "Scatterplot")
We get a standardized slope of -0.2417 and an \(R^2\) of 0.058.
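These values come from a standard lm() fit, which might look as follows (a minimal sketch; fitlin1 is an illustrative name, and the -1 drops the intercept, which is zero anyway for standardized variables):
fitlin1 <- lm(gran ~ -1 + age, data = granularity1)
coef(fitlin1)               # standardized slope, approx. -0.2417
summary(fitlin1)$r.squared  # approx. 0.058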
Now we mimic this model using morals. Note that here
we operate on the unstandardized, original data. Since we are performing
linear (metric) transformations on granularity as well as on age,
morals() will standardize the data internally. Note that
morals() requires us to set up the linear transformations
using splines: type = "E" in knotsGifi()
implies no interior knots, xdegrees = 1 and
ydegrees = 1 tell morals() to fit
polynomials of degree 1, and we also set the ordinal arguments to
FALSE.
xknots_age <- knotsGifi(granularity$age, type = "E")
yknots_gran <- knotsGifi(granularity$gran, type = "E")
fitlin2 <- morals(x = granularity$age, y = granularity$gran,
xknots = xknots_age, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitlin2
#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age,
#> yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.379
#> Number of iterations: 15
#>
#> Coefficients:
#> x
#> -0.2417
#>
#> Multiple R-squared: 0.058
The results are the same as in the lm() fit above.
Let us extend these models by fitting quadratic
regressions. First, we call lm() on the
unstandardized data (we could also use the standardized data; it doesn’t
matter). Then we emulate this quadratic regression with
morals(). Note that, as opposed to the linear fit, we set
the polynomial degree of the predictor to a value of 2.
fitquad1 <- lm(gran ~ age + I(age^2), data = granularity)
fitquad1
#>
#> Call:
#> lm(formula = gran ~ age + I(age^2), data = granularity)
#>
#> Coefficients:
#> (Intercept) age I(age^2)
#> 1.275380 -0.071488 0.002052
fitquad2 <- morals(x = granularity$age, y = granularity$gran,
xknots = xknots_age, yknots = yknots_gran, xdegrees = 2, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitquad2
#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age,
#> yknots = yknots_gran, xdegrees = 2, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.281
#> Number of iterations: 21
#>
#> Coefficients:
#> x
#> -0.4372
#>
#> Multiple R-squared: 0.191
Let us produce two plots based on the quadratic morals()
fit and then explain how morals() achieves a quadratic fit.
The top panel shows the optimally scaled predictor and
response, including the regression line. The bottom panel shows
the quadratic fit when age is kept on the original scale while the
response is on the transformed scale.
plot(fitquad2$xhat, fitquad2$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitquad2$xhat, fitquad2$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitquad2$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Quadratic Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitquad2$ypred[ind], col = "coral4", lwd = 2)
For both models we get an \(R^2\) of 0.191. This reflects a substantial increase compared to the linear fit.
Both the lm() and the morals() calls lead to the
same fit, even though in the lm() fit we get 2 slope parameters (one
for the linear term and one for the quadratic term), whereas in
morals() we only get a single one. A quick check is sketched below.
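A minimal sketch comparing the two fits (both quantities are reported above; smc is the squared multiple correlation stored on the morals object, also used in the cross-validation section below):
summary(fitquad1)$r.squared  # approx. 0.191
fitquad2$smc                 # approx. 0.191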
In the morals() call we specified a linear
transformation for the response and a quadratic transformation for the
predictor, which leads to a quadratic regression fit (see
plot(fitquad2, plot.type = "transplot")). Internally,
morals() scales both variables according to these
transformation functions in a way such that their relationship becomes
linear (sometimes called bilinearizability; see the top panel of
the figure above). The parameter we get from the morals()
output reflects the slope of this line. The sign of this parameter is
irrelevant since, depending on the starting values used
for the OS process, the sign of the transformed age scores may be
switched.
If we produce the scatterplot with the original age scale we see what
we actually fitted (bottom panel of the figure above): a quadratic
regression where the granularity score decreases for the younger
participants and increases for the older ones. On the y-axis we plot the
transformed (i.e., standardized) response in order to picture
the curve. A cubic regression can be fitted by setting
xdegrees = 3, as sketched below.
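For instance, a cubic fit might look as follows (a minimal sketch; fitcub is an illustrative name, everything else as in the quadratic call):
fitcub <- morals(x = granularity$age, y = granularity$gran,
xknots = xknots_age, yknots = yknots_gran, xdegrees = 3, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)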
The next model we fit is a piecewise linear regression. That is, we want to fit two lines which connect at a particular age knot. In spline terminology, we fit a spline of degree 1 with one interior knot. Using knotsGifi() with quantile knots ("Q") and n = 1, the knot is placed at the median.
xknots_age2 <- knotsGifi(granularity$age, "Q", n = 1)
xknots_age2
#> [[1]]
#> 50%
#> 15.3
#>
#> attr(,"type")
#> [1] "knotsQ"Alternatively we could also set a knot at a particular age value of
interest (e.g., 18 years) by modifying xknots_age2 as
follows:
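One way to do this, assuming the single quantile knot sits in the first list element as printed above, is:
xknots_age2[[1]][1] <- 18  # replace the median knot (15.3) by a knot at age 18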
Now we fit the piecewise regression model and produce the effects plot:
fitpiece <- morals(x = granularity$age, y = granularity$gran,
xknots = xknots_age2, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitpiece
#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age2,
#> yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.276
#> Number of iterations: 27
#>
#> Coefficients:
#> x
#> 0.4482
#>
#> Multiple R-squared: 0.201
plot(fitpiece$xhat, fitpiece$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitpiece$xhat, fitpiece$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitpiece$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Piecewise Linear Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitpiece$ypred[ind], col = "coral4", lwd = 2)
We get an \(R^2\) of 0.201. The regression fit with the break at 18 years is shown in the bottom panel of the figure above.
Next, we fit a spline regression using a quadratic
spline (xdegrees = 2) with 3 interior quantile knots (i.e.,
we set them at the quartiles), and produce the effects plots as
above:
xknots_age3 <- knotsGifi(granularity$age, "Q", n = 3)
xknots_age3
#> [[1]]
#> 25% 50% 75%
#> 11.35 15.30 19.80
#>
#> attr(,"type")
#> [1] "knotsQ"
fitspline <- morals(granularity$age, granularity$gran,
xknots = xknots_age3, yknots = yknots_gran, xdegrees = 2, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitspline
#> Call:
#> morals(x = granularity$age, y = granularity$gran, xknots = xknots_age3,
#> yknots = yknots_gran, xdegrees = 2, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.268
#> Number of iterations: 25
#>
#> Coefficients:
#> x
#> 0.4639
#>
#> Multiple R-squared: 0.215
plot(fitspline$xhat, fitspline$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitspline$xhat, fitspline$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitspline$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Spline Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitspline$ypred[ind], col = "coral4", lwd = 2)
Compared to the quadratic and the piecewise linear fit, there is only a slight improvement in \(R^2\) (0.215). The bottom panel of the figure above shows the corresponding smooth regression fit.
Next we present a monotone regression fit. Monotone
regression fits a step function into the scatterplot. If the step
function is monotonically decreasing, it is called antitone
regression; if it is monotonically increasing, isotone
regression. Isotone regression is described in detail in De Leeuw, Hornik, and Mair (2009), who also
provide a flexible implementation of various isotone regression
techniques by means of the package isotone. In
Gifi, a monotone regression can be fitted as follows (the
knots are set at the data points).
xknots_age4 <- knotsGifi(granularity$age, "D")
fitmono <- morals(x = granularity$age, y = granularity$gran,
xknots = xknots_age4, yknots = yknots_gran, ydegrees = 1, yordinal = FALSE)
plot(fitmono$xhat, fitmono$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitmono$xhat, fitmono$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitmono$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Monotone Morals Fit")
sfun <- stepfun(sort(granularity$age)[-nrow(granularity)], fitmono$ypred[order(granularity$age)])
plot(sfun, col = "coral4", add = TRUE, pch = 19, cex = 0.7, lwd = 2)Obviously this U-shaped relationship is not the best example for a monotone regression fit. Still, it shows nicely how monotone regression works. In the figure above we see that after a certain point the descending step function is not able anymore to capture the increasing granularity pattern for the older participants, since it is restricted to be monotone. Thus, it becomes a flat line. The \(R^2\) of 0.193 is still good because the model is very precise for the younger participants.
The last model we fit uses a nominal transformation of age, which leads
to the most flexible specification. This model is nonsense, of course:
aside from totally overfitting the data, it doesn’t take any metric or
ordinal age information into account. But it certainly
demonstrates the flexibility of morals.
xknots_age5 <- knotsGifi(granularity$age, "D")
fitnom <- morals(granularity$age, granularity$gran,
xknots = xknots_age5, yknots = yknots_gran, xdegrees = -1, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
plot(fitnom$xhat, fitnom$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitnom$xhat, fitnom$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitnom$yhat, xlab = "Age", ylab = "Granularity (transformed)",
main = "Nominal Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitnom$ypred[ind], col = "coral4", lwd = 2)
This model has a remarkable \(R^2\)-value of 0.729. But we overfit, of course.
At this point the question is: Which model should we use? We have
seen that, apart from the linear and the nominal model, all \(R^2\)-values were in the 0.2 range. In
order to examine potential overfitting and stability of these models, we
can perform a \(K\)-fold
cross-validation (CV). A very basic version is implemented in
cv.morals(), subject to further refinement in the future.
For a typical value of \(K=10\), the
idea is to randomly split the observations into 10 folds, leave out one fold at a time, fit
morals() without these observations, and then use this
model to predict the left-out observations. For each model we compute a
CV error:
\[\begin{equation} CV_{(K)} = \frac{1}{K} \sum_{k=1}^K (\hat{y}_k - y_k)^2, \end{equation}\]
where \(y_k\) are the observed (but scaled) response values in fold \(k\), and \(\hat{y}_k\) are the predicted values based on the fit without the left-out observations. In other words, for prediction we use the optimally scaled linear morals model as shown in the top panels of the figures above. The smaller \(CV_{(K)}\), the less we tend to overfit the data.
set.seed(123)
cvlin <- cv(fitlin2, folds = 10)
cvquad <- cv(fitquad2, folds = 10)
cvpiece <- cv(fitpiece, folds = 10)
cvspline <- cv(fitspline, folds = 10)
cvmono <- cv(fitmono, folds = 10)
cvnom <- cv(fitnom, folds = 10)
cvvec <- c(cvlin, cvquad, cvpiece, cvspline, cvmono, cvnom)
r2vec <- c(fitlin2$smc, fitquad2$smc, fitpiece$smc, fitspline$smc,
fitmono$smc, fitnom$smc)
cvr2 <- cbind(cvvec, r2vec)
dimnames(cvr2) <- list(c("linear", "quadratic", "piecewise", "spline",
"monotone", "nominal"), c("CV-error", "R2"))
round(cvr2, 5)
#> CV-error R2
#> linear 0.00671 0.05841
#> quadratic 0.00915 0.19114
#> piecewise 0.00947 0.20090
#> spline 0.00876 0.21516
#> monotone 0.00639 0.19345
#> nominal 0.01060 0.72935
For the nominal model the CV-error is large, but the \(R^2\) is high. Conversely, for the linear model the CV-error is low, but the \(R^2\) is poor.
Morals can easily be extended to multiple predictors. The starting
point is the linear model from above. We add a categorical predictor
(gender, converted to numeric), and an interaction
between gender and age.
granularity2 <- granularity
granularity2$gender <- as.numeric(granularity$gender)-1
granularity2 <- scale(granularity2) |> as.data.frame()
head(granularity2)
#> gran age gender
#> 1 1.34472707 -1.834131 -1.1229253
#> 2 0.70924034 -1.834131 0.8843037
#> 3 -0.01834218 -1.815128 0.8843037
#> 4 0.79902412 -1.644100 -1.1229253
#> 5 1.05238391 -1.758118 0.8843037
#> 6 1.82151872 -1.644100 0.8843037
fitmlin1 <- lm(gran ~ -1 + age*gender, data = granularity2)
fitmlin1
#>
#> Call:
#> lm(formula = gran ~ -1 + age * gender, data = granularity2)
#>
#> Coefficients:
#> age gender age:gender
#> -0.25475 0.07687 0.06547
Now we mimic this model using morals(). We operate on
the standardized data and create the interaction terms as follows:
granularity2$int <- granularity2$age * granularity2$gender
xknots_age <- knotsGifi(granularity2[,2:4], "E")
yknots_gran <- knotsGifi(granularity2$gran, "E")
fitmlin2 <- morals(x = granularity2[,2:4], y= granularity2$gran,
xknots = xknots_age, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
xordinal = FALSE, yordinal = FALSE)
fitmlin2
#> Call:
#> morals(x = granularity2[, 2:4], y = granularity2$gran, xknots = xknots_age,
#> yknots = yknots_gran, xdegrees = 1, ydegrees = 1, xordinal = FALSE,
#> yordinal = FALSE)
#>
#> Loss value: 0.369
#> Number of iterations: 27
#>
#> Coefficients:
#> age gender int
#> -0.2548 0.0769 0.0648
#>
#> Multiple R-squared: 0.068
The morals() result matches the one from the
lm() call.
So far we have not transformed the response using anything other than a
linear transformation (which merely standardizes \(Y\)). Let us fit an ordinal response
version using morals(). We categorize
granularity into a Likert-type item (5 categories). First
we fit a proportional odds model using polr() with
a quadratic age effect and a main effect for
gender. By default, polr() uses a logit
link.
library("MASS")
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:Gifi':
#>
#> mammals
grancat <- cut(granularity$gran, 5, labels = 1:5)
fitord1 <- polr(grancat ~ age + I(age^2) + gender, data = granularity)
summary(fitord1)
#>
#> Re-fitting to get Hessian
#> Call:
#> polr(formula = grancat ~ age + I(age^2) + gender, data = granularity)
#>
#> Coefficients:
#> Value Std. Error t value
#> age -0.89427 0.191634 -4.6665
#> I(age^2) 0.02568 0.005996 4.2829
#> genderfemale 0.17924 0.309393 0.5793
#>
#> Intercepts:
#> Value Std. Error t value
#> 1|2 -9.1019 1.5014 -6.0622
#> 2|3 -7.3763 1.4643 -5.0373
#> 3|4 -6.1961 1.4308 -4.3305
#> 4|5 -4.4419 1.3773 -3.2250
#>
#> Residual Deviance: 410.3663
#> AIC: 424.3663
There is basically no gender effect. In morals() we do
not need to specify a particular link function. Having an ordinal
response we can simply apply an ordinal transformation on it.
granularity3 <- granularity
granularity3$gender <- as.numeric(granularity$gender)
xknots_age <- knotsGifi(granularity3[,2:3], "E")
yknots_gran2 <- knotsGifi(grancat, "D")
fitord2 <- morals(x = granularity3[,2:3], y = as.numeric(grancat),
xknots = xknots_age, yknots = yknots_gran2, xdegrees = c(2, -1), ydegrees = 1,
xordinal = FALSE, yordinal = TRUE)
fitord2
#> Call:
#> morals(x = granularity3[, 2:3], y = as.numeric(grancat), xknots = xknots_age,
#> yknots = yknots_gran2, xdegrees = c(2, -1), ydegrees = 1,
#> xordinal = FALSE, yordinal = TRUE)
#>
#> Loss value: 0.27
#> Number of iterations: 26
#>
#> Coefficients:
#> age gender
#> -0.4573 0.0299
#>
#> Multiple R-squared: 0.211
The figure below shows the transformation plots. For the response we have an ordinal transformation which stretches/squeezes the 1-5 input categories. Age is quadratic, and gender is taken as nominal. The parameters do not have the same meaning as in a logistic regression context.
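The transformation plots can be produced in the same way as for the quadratic fit earlier:
plot(fitord2, plot.type = "transplot")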
From the coefficients and the plots below we see that there is a large effect for age, while the gender effect is close to 0. Again, these effects represent the slope parameters in the linearized regressions based on the transformed scores (see the figures below).
plot(fitord2$xhat[,1], fitord2$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)",
main = "Optimally Scaled Scatterplot")
lines(fitord2$xhat[,1][granularity3$gender == 1], fitord2$ypred[granularity3$gender == 1], col = "coral4", lwd = 2)
lines(fitord2$xhat[,1][granularity3$gender == 2], fitord2$ypred[granularity3$gender == 2], col = "cadetblue", lwd = 2)
legend(-0.02, 0.14, bty = "n", legend = c("male", "female"), lty = 1, col = c("coral4", "cadetblue"))
plot(granularity3$age, fitord2$yhat, xlab = "Age", ylab = "Granularity (transformed)", main = "Ordinal Polynomial Morals Fit")
ind1 <- order(granularity3$age[granularity3$gender == 1])
lines(granularity3$age[granularity3$gender == 1][ind1], fitord2$ypred[granularity3$gender == 1][ind1], col = "coral4", lwd = 2)
ind2 <- order(granularity3$age[granularity3$gender == 2])
lines(granularity3$age[granularity3$gender == 2][ind2], fitord2$ypred[granularity3$gender == 2][ind2], col = "cadetblue", lwd = 2)
legend(20, 0.14, bty = "n", legend = c("male", "female"), lty = 1, col = c("coral4", "cadetblue"))