The Gifi package represents an easier-to-use version of
the homals (De Leeuw and Mair
2009) package for multivariate analysis with optimal scaling
(Gifi 1990). There are two main
differences between Gifi and homals:
Theory: homals uses the concept of rank and set
restrictions to fit methods such as princals, morals, and overals.
Gifi is based on the concept of copies.
Implementation: homals has a single function,
homals(), which fits various Gifi methods depending on the
settings of its rank and sets arguments.
Gifi offers wrapper functions such as princals(),
homals(), and morals() that fit the
corresponding solutions in a more user-friendly way and present
the results in a more straightforward and accessible manner.
This vignette focuses on the Gifi theory based on the idea of
copies. The other vignettes are applied: they demonstrate how to
use the main functions of the Gifi package.
The data are collected in an \(n \times
m\) data frame. Whereas the homals package uses the
homogeneity loss function (see De Leeuw and Mair
2009), the Gifi package minimizes the
following loss function (meet loss; see Gifi (1990), Sec. 4.4), where SSQ denotes the
sum of squares of the entries of a matrix:
\[\sigma(X,Z,A)=\frac{1}{mp}\sum_{j=1}^m\text{SSQ}\ (X-\sum_{i\in I_j}H_iZ_iA_i)\]
The index set \(\mathcal{N}=\{1,2,\cdots,N\}\), where \(N\) is the total number of active variables (see below) in the analysis, is partitioned into the \(m\) index sets \(I_j\), with \(I_j\cap I_l=\emptyset\) and \(\bigcup_{j=1}^m I_j=\mathcal{N}\).
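To make the loss concrete, here is a minimal plain-Python sketch that evaluates \(\sigma(X,Z,A)\) for given matrices. The shapes are our assumption, chosen so that each product \(H_iZ_iA_i\) is \(n\times p\): \(H_i\) is an \(n\times k_i\) basis, \(Z_i\) is \(k_i\times r_i\) with \(r_i\) the number of copies, and \(A_i\) is \(r_i\times p\). The names meet_loss, matmul, and ssq are hypothetical and not part of the Gifi package.

```python
# Minimal sketch of the meet loss sigma(X, Z, A), written in plain Python.
# Assumed layout (for illustration only, not the Gifi package's internals):
#   X     : n x p object scores (list of rows)
#   H[i]  : n x k_i basis matrix of variable i
#   Z[i]  : k_i x r_i coefficients (r_i = number of copies of variable i)
#   A[i]  : r_i x p loadings
#   sets  : the partition I_1, ..., I_m of the variable indices

def matmul(P, Q):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def ssq(M):
    """Sum of squares of all entries of a matrix."""
    return sum(v * v for row in M for v in row)

def meet_loss(X, H, Z, A, sets):
    n, p, m = len(X), len(X[0]), len(sets)
    total = 0.0
    for I_j in sets:
        # Accumulate sum_{i in I_j} H_i Z_i A_i for this set.
        fit = [[0.0] * p for _ in range(n)]
        for i in I_j:
            term = matmul(matmul(H[i], Z[i]), A[i])
            fit = [[f + t for f, t in zip(fr, tr)] for fr, tr in zip(fit, term)]
        # Residual X - fit for set j, added to the total sum of squares.
        resid = [[x - f for x, f in zip(xr, fr)] for xr, fr in zip(X, fit)]
        total += ssq(resid)
    return total / (m * p)

# Toy example: n = 4 objects, p = 1 dimension, two categorical variables.
X = [[0.5], [0.5], [-0.5], [-0.5]]        # centered, X'X = 1
H = [[[1, 0], [1, 0], [0, 1], [0, 1]],    # variable 0 separates objects {1,2} from {3,4}
     [[1, 0], [0, 1], [1, 0], [0, 1]]]    # variable 1 is unrelated to X
Z = [[[0.5], [-0.5]], [[0.0], [0.0]]]
A = [[[1.0]], [[1.0]]]
print(meet_loss(X, H, Z, A, sets=[[0], [1]]))  # -> 0.5
print(meet_loss(X, H, Z, A, sets=[[0, 1]]))    # -> 0.0
```

With two sets, variable 1 fits nothing, so its set contributes \(\text{SSQ}(X)=1\) and the loss is \(1/(mp)=0.5\); with both variables in a single set, variable 0 alone reproduces \(X\) and the loss drops to 0.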
\(X\) is the \(n\times p\) matrix of object scores (\(p\) being the number of dimensions). \(X\) is centered, \(e'X=0\) with \(e\) a vector of ones, and orthonormal, \(X'X=I\). For each variable \(i\), the meet loss involves:
Instead of using rank restrictions like homals,
Gifi uses the idea of copies, first introduced by
De Leeuw (1984): literal copies (duplicates) of a
variable that enter the loss. This
concept, called multiple quantifications in the original Gifi
terminology, makes the system more flexible.
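The following sketch illustrates why copies add flexibility, under our own assumptions about the shapes: for one categorical variable with indicator basis \(H\) and fixed \(X\), a single copy restricts \(Z_iA_i\) to the rank-one form \(za'\), while \(p\) copies make \(Z_iA_i\) an arbitrary \(k\times p\) matrix. The helper names are hypothetical, and the rank-one fit uses a small ALS loop of our own, not Gifi code.

```python
# Sketch: one copy versus p copies of a categorical variable, with X held fixed.
# Assumptions for illustration: H is the indicator basis of `codes` (all categories
# 0..k-1 occur); one copy means Z_i is k x 1 and A_i is 1 x p (rank-one fit H z a');
# p copies means Z_i A_i is an unrestricted k x p matrix.

def category_means(codes, X, k):
    """Unrestricted least-squares fit (p copies): row c is the mean of X in category c."""
    p = len(X[0])
    sums, counts = [[0.0] * p for _ in range(k)], [0] * k
    for c, row in zip(codes, X):
        counts[c] += 1
        sums[c] = [s + v for s, v in zip(sums[c], row)]
    return [[s / counts[c] for s in sums[c]] for c in range(k)]

def fit_loss(codes, X, Y):
    """SSQ(X - H Y): compare each object's row of X with its category's row of Y."""
    return sum((x - y) ** 2 for c, row in zip(codes, X) for x, y in zip(row, Y[c]))

def one_copy_fit(codes, X, k, iters=100):
    """Rank-one fit H z a', found by alternating least squares over z and a."""
    M = category_means(codes, X, k)
    counts = [codes.count(c) for c in range(k)]
    p = len(X[0])
    a = [1.0] * p
    for _ in range(iters):
        # a fixed: z_c = <M_c, a> / <a, a>
        aa = sum(v * v for v in a)
        z = [sum(mv * av for mv, av in zip(M[c], a)) / aa for c in range(k)]
        # z fixed: a = sum_c n_c z_c M_c / sum_c n_c z_c^2
        w = sum(n_c * z_c ** 2 for n_c, z_c in zip(counts, z))
        a = [sum(counts[c] * z[c] * M[c][d] for c in range(k)) / w for d in range(p)]
    return [[z_c * a_d for a_d in a] for z_c in z]

codes = [0, 0, 1, 1, 2, 2]
X = [[1.5, 0.5], [0.5, -0.5], [0.5, 1.5], [-0.5, 0.5], [-1.5, -0.5], [-0.5, -1.5]]
loss_multi = fit_loss(codes, X, category_means(codes, X, 3))
loss_one = fit_loss(codes, X, one_copy_fit(codes, X, 3))
print(loss_multi, loss_one)  # prints 3.0 5.0
```

Here the three category means are not collinear, so the unrestricted fit (loss 3.0) beats the best single-copy fit (loss 5.0). The multiple-copy loss can never be larger, because the rank-one form is a special case of the unrestricted one.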
In addition, in homals all data are categorical and the
basis is always an indicator matrix. In Gifi the basis is
either categorical or a B-spline basis (van Rijckevorsel 1988); for the latter, the user needs
to specify the knots, which implies that the data must be numerical.
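A minimal sketch of the two kinds of basis matrix \(H_i\): an indicator matrix for categorical data and a B-spline basis for numerical data with user-specified interior knots. The B-spline part uses the standard Cox-de Boor recursion with a clamped knot sequence whose boundary knots sit at the data minimum and maximum; this is our assumption, and the knot handling inside Gifi may differ. The function names are hypothetical.

```python
# Sketch of the two kinds of basis matrix H_i described above.
# Assumptions: categories are ordered by sorted unique value; the B-spline basis uses a
# clamped knot sequence with boundary knots at min(x) and max(x) and is evaluated
# with the Cox-de Boor recursion. The Gifi package's own knot handling may differ.

def indicator_basis(values):
    """n x k indicator (dummy) matrix, one column per observed category."""
    cats = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in cats] for v in values]

def bspline_basis(x, interior_knots, degree=1):
    """n x (len(interior_knots) + degree + 1) B-spline basis for numeric data x."""
    lo, hi = min(x), max(x)
    t = [lo] * (degree + 1) + sorted(interior_knots) + [hi] * (degree + 1)

    def B(j, d, v):
        if d == 0:
            # Half-open intervals [t_j, t_{j+1}); the last non-empty one also includes hi.
            if t[j] <= v < t[j + 1] or (v == hi and t[j] < t[j + 1] == hi):
                return 1.0
            return 0.0
        left = right = 0.0
        if t[j + d] > t[j]:
            left = (v - t[j]) / (t[j + d] - t[j]) * B(j, d - 1, v)
        if t[j + d + 1] > t[j + 1]:
            right = (t[j + d + 1] - v) / (t[j + d + 1] - t[j + 1]) * B(j + 1, d - 1, v)
        return left + right

    n_basis = len(t) - degree - 1
    return [[B(j, degree, v) for j in range(n_basis)] for v in x]

print(indicator_basis(["a", "b", "a", "c"]))
print(bspline_basis([0, 1, 2, 3, 4], interior_knots=[2], degree=1))
```

With degree 1 and one interior knot at 2, each row holds piecewise-linear weights that sum to one; for example, \(x=1\) gets weight 0.5 on each of the first two basis functions.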
For each variable \(i\), the
following matrices are returned by various Gifi functions
like homals() and princals():
The Gifi loss is minimized using alternating least squares
(ALS) combined with majorization. The gifiEngine()
function alternates over \(X\), the
\(Z_i\), and the \(A_i\).
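The alternating scheme can be sketched in the simplest setting: \(p=1\), all variables categorical with indicator bases, one variable per set, and unrestricted quantifications. There, both steps have closed-form solutions (category means for the quantifications, a centered and normalized sum for \(x\)), so this sketch needs no majorization; it illustrates the alternation only and is not a reimplementation of gifiEngine(). The names als, category_means, and center_normalize are hypothetical.

```python
import random

# Sketch of alternating least squares on the meet loss in the simplest case:
# p = 1, every variable categorical (indicator basis) and in its own set, with an
# unrestricted quantification per variable. Each subproblem has a closed-form
# solution, so no majorization is used; this is not the gifiEngine() algorithm.

def category_means(codes, x, k):
    sums, counts = [0.0] * k, [0] * k
    for c, v in zip(codes, x):
        sums[c] += v
        counts[c] += 1
    return [s / n for s, n in zip(sums, counts)]

def center_normalize(v):
    mean = sum(v) / len(v)
    v = [a - mean for a in v]
    norm = sum(a * a for a in v) ** 0.5
    return [a / norm for a in v]

def als(data, iters=50, seed=1):
    """data: m variables, each a list of n category codes 0, ..., k_j - 1 (all used)."""
    n, m = len(data[0]), len(data)
    ks = [max(col) + 1 for col in data]
    rng = random.Random(seed)
    x = center_normalize([rng.gauss(0.0, 1.0) for _ in range(n)])  # e'x = 0, x'x = 1
    history = []
    for _ in range(iters):
        # x fixed: the least-squares quantification of each category is the mean of x.
        ys = [category_means(col, x, k) for col, k in zip(data, ks)]
        fits = [[y[c] for c in col] for col, y in zip(data, ys)]
        history.append(sum(sum((a - b) ** 2 for a, b in zip(x, f)) for f in fits) / m)
        # Quantifications fixed: the constrained minimizer over x is the centered,
        # normalized sum of the fitted variables.
        x = center_normalize([sum(f[i] for f in fits) for i in range(n)])
    return x, ys, history

data = [[0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 1, 1]]
x, ys, history = als(data)
print(history[0], history[-1])
```

Because each step minimizes the loss over one block with the other held fixed, the loss values recorded in history never increase.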
Gifi also allows variables to be declared
active or passive. Active variables are the variables
of main interest; they contribute to the loss and to the ALS step that
updates \(X\). Passive (or
supplementary) variables contribute to neither; each of
them is scaled in a separate step by minimizing \(\text{SSQ}\ (X-H_iZ_iA_i)\) over \(Z_i\) and \(A_i\), with \(X\) fixed at its
optimal value.
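A sketch of the separate scaling step for one passive variable, assuming an indicator basis and an unrestricted quantification (so the least-squares solution is the category-wise mean of \(X\)). X_opt is a made-up stand-in for the converged object scores, and scale_passive is a hypothetical name.

```python
# Sketch of scaling one passive (supplementary) categorical variable after the
# active analysis has converged. Assumptions for illustration: indicator basis, all
# categories 0..k-1 present, and Z_i A_i treated as a free k x p matrix.
# X is only read, never updated, so the passive variable cannot influence it.

def scale_passive(codes, X):
    """Return (quantification, SSQ(X - H Z)) for a passive variable with fixed X."""
    k, p = max(codes) + 1, len(X[0])
    sums, counts = [[0.0] * p for _ in range(k)], [0] * k
    for c, row in zip(codes, X):
        counts[c] += 1
        sums[c] = [s + v for s, v in zip(sums[c], row)]
    # Least squares for an indicator basis: each category gets the mean of X.
    Z = [[s / counts[c] for s in sums[c]] for c in range(k)]
    loss = sum((x - z) ** 2 for c, row in zip(codes, X) for x, z in zip(row, Z[c]))
    return Z, loss

X_opt = [[0.5], [0.5], [-0.5], [-0.5]]      # stands in for the optimal object scores
print(scale_passive([0, 0, 1, 1], X_opt))   # variable aligned with X
print(scale_passive([0, 1, 0, 1], X_opt))   # variable unrelated to X
```

The aligned variable reproduces \(X\) exactly (loss 0), while the unrelated one receives zero quantifications and leaves the full \(\text{SSQ}(X)=1\) unexplained; in both cases \(X\) is unchanged.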
Gifi provides several options for handling missing
values:
So far, the following wrapper functions are implemented. Internally,
they all use the same gifiEngine() function to minimize the
loss above. They differ mainly in their
default settings for the number of copies and the number of
sets:
princals(): one variable per set, all variables one copy.
homals(): one variable per set, all variables \(p\) copies.
morals(): two sets, one set has a single variable with one copy (\(p = 1\)).
princals() is designed to fit ordinal or mixed PCA in a
user-friendly way, whereas homals() is designed for
multiple correspondence analysis. However, if the default settings for
princals() and homals() are changed
accordingly, they both give the same result. morals()
performs multiple (monotone) regression analysis within the Gifi system.
Applied vignettes are provided for all three functions.
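The default settings listed above can be written down as data: a partition of the variable indices into sets \(I_j\), the number of copies per variable, and the number of dimensions. The helper below is hypothetical and only encodes that list; for morals() we assume that the first variable is the response and that every variable has one copy. It also checks the partition property \(I_j\cap I_l=\emptyset\), \(\bigcup_j I_j=\mathcal{N}\) that the loss requires.

```python
# Hypothetical helper (not a Gifi function) encoding the default designs listed
# above: the sets I_j, the copies per variable, and the number of dimensions p.
# Assumption for morals(): variable 0 is the response, forming its own set, and
# every variable (response and predictors) has a single copy in p = 1 dimension.

def default_design(wrapper, n_vars, p):
    if wrapper == "princals":
        return [[i] for i in range(n_vars)], [1] * n_vars, p
    if wrapper == "homals":
        return [[i] for i in range(n_vars)], [p] * n_vars, p
    if wrapper == "morals":
        return [[0], list(range(1, n_vars))], [1] * n_vars, 1
    raise ValueError(f"unknown wrapper: {wrapper!r}")

def is_partition(sets, n_vars):
    """Check that the index sets are disjoint and together cover 0, ..., n_vars - 1."""
    flat = [i for s in sets for i in s]
    return len(flat) == len(set(flat)) and set(flat) == set(range(n_vars))

for name in ("princals", "homals", "morals"):
    sets, copies, p = default_design(name, n_vars=4, p=2)
    print(name, sets, copies, p, is_partition(sets, 4))
```

Written this way, princals() and homals() differ only in the copies entry, which is consistent with the statement that they give the same result once their defaults are aligned.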