RCLRMIX-class {rebmix}    R Documentation

Class "RCLRMIX"

Description

An object of class RCLRMIX, containing the results of clustering a finite mixture estimated by REBMIX.

Objects from the Class

Objects can be created by calls of the form new("RCLRMIX", ...).

Slots

x:

an object of class REBMIX.

pos:

the desired row number in x@summary of the solution to be clustered. The default value is 1.

Zt:

a factor of true cluster membership.

Zp:

a factor of predictive cluster membership.

c:

number of clusters.

p:

a vector of length c containing the prior probabilities p_{l} of latent class membership, l = 1, …, c, which sum to 1.

pi:

a list of length d of matrices of size c \times I_{i} containing class conditional probabilities. Let π_{il\tilde{i}} denote the class conditional probability that an observation in class l = 1, …, c produces the \tilde{i}th outcome on the ith variable. It is presumed here that all variables are categorical and conditionally independent, that each categorical variable i = 1, …, d has I_{i} possible outcomes, and that \bm{y}_{1}, …, \bm{y}_{n} stand for an observed d-dimensional dataset of size n of discrete vector observations \bm{y}_{j} = (y_{1j}, …, y_{ij}, …, y_{dj})^\top. The variables should follow the "Dirac" or "binomial" parametric families with a finite number of outcomes I_{i}.
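Under these assumptions the probability of an observation is the latent class mixture f(\bm{y}_{j}) = \sum_{l} p_{l} \prod_{i} π_{il y_{ij}}. The base R sketch below evaluates it for a single observation. The function lcm_density, the variable pi_list and the toy parameter values are hypothetical and are not part of rebmix.

```r
# Hypothetical helper, not a rebmix function: latent class mixture probability
#   f(y) = sum_l p[l] * prod_i pi_list[[i]][l, y[i]]
# for one observation y with outcomes coded 1, ..., I_i on each variable i.
lcm_density <- function(y, p, pi_list) {
  stopifnot(length(y) == length(pi_list), abs(sum(p) - 1) < 1e-12)
  # For each class l, multiply the class conditional probabilities of the
  # observed outcomes over the d conditionally independent variables.
  cond <- vapply(seq_along(p), function(l) {
    prod(vapply(seq_along(pi_list), function(i) pi_list[[i]][l, y[i]], numeric(1)))
  }, numeric(1))
  sum(p * cond)
}

# Toy model: c = 2 classes, d = 2 variables with I_1 = 2 and I_2 = 3 outcomes.
# Each row of a matrix (one class) sums to 1.
p <- c(0.6, 0.4)
pi_list <- list(matrix(c(0.9, 0.1,
                         0.2, 0.8), nrow = 2, byrow = TRUE),
                matrix(c(0.5, 0.3, 0.2,
                         0.1, 0.3, 0.6), nrow = 2, byrow = TRUE))
lcm_density(c(1, 3), p, pi_list)  # 0.6 * 0.9 * 0.2 + 0.4 * 0.2 * 0.6 = 0.156
```

Summing lcm_density over all I_{1} \times I_{2} outcome combinations returns 1, which is a quick consistency check on p and pi_list.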

P:

a data frame containing unique observations for which the true and predictive frequencies are calculated.
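A minimal base R sketch of how such a table of unique observations could be assembled, assuming the true (observed) frequencies are counts of identical rows and the predictive frequencies are n f(\bm{y}) for a mixture probability f supplied by the caller. The function unique_frequencies and its column names are hypothetical; rebmix computes P internally and may differ in detail.

```r
# Hypothetical helper, not part of rebmix: collapse a data frame of categorical
# observations Y to its unique rows, with observed and predictive frequencies.
unique_frequencies <- function(Y, f) {
  # Count identical rows: one output row per unique combination of values.
  P <- aggregate(list(observed = rep(1L, nrow(Y))), by = Y, FUN = sum)
  # Assumed predictive frequency: sample size times the mixture probability.
  P$predictive <- nrow(Y) * apply(as.matrix(P[names(Y)]), 1, f)
  P
}

Y <- data.frame(y1 = c(1, 1, 2, 2, 2), y2 = c(1, 1, 1, 2, 2))
unique_frequencies(Y, f = function(y) 0.25)  # toy f: uniform over 4 outcomes
```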

tau:

a matrix of size n \times c containing conditional probabilities that observations \bm{y}_{1}, …, \bm{y}_{n} arise from clusters 1, …, c.
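By Bayes' rule, tau_{jl} = p_{l} f_{l}(\bm{y}_{j}) / \sum_{k} p_{k} f_{k}(\bm{y}_{j}), where f_{l} denotes the density (or probability) of component l. The sketch below computes tau from a hypothetical n \times c matrix of component densities and, as an assumption about how a predictive membership such as Zp can be obtained, assigns each observation to the cluster with the largest tau. The names posterior_tau and dens are illustrative only.

```r
# Hypothetical helper, not a rebmix function: posterior probabilities
#   tau[j, l] = p[l] * dens[j, l] / sum_k p[k] * dens[j, k]
posterior_tau <- function(dens, p) {
  num <- sweep(dens, 2, p, `*`)  # multiply column l by p[l]
  num / rowSums(num)             # normalise each row to sum to 1
}

# Toy component densities for n = 3 observations and c = 2 clusters.
dens <- matrix(c(0.18, 0.12,
                 0.02, 0.08,
                 0.45, 0.05), ncol = 2, byrow = TRUE)
p <- c(0.6, 0.4)
tau <- posterior_tau(dens, p)

# Assumed maximum a posteriori rule for predictive cluster membership.
Zp <- factor(max.col(tau, ties.method = "first"))
Zp  # clusters 1, 2, 1
```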

prob:

a vector of length c containing the probabilities of correct clustering for s = 1, …, c clusters.

from:

a vector of length c - 1 containing the clusters that are merged into the clusters given in slot to.

to:

a vector of length c - 1 containing the clusters into which the clusters given in slot from are merged.
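Following Baudry et al. (2010), merging cluster from into cluster to amounts to adding the corresponding columns of tau. A minimal base R sketch with the hypothetical function name merge_tau; because the column of from is dropped, cluster indices above it shift down by one.

```r
# Hypothetical helper, not a rebmix function: merge cluster `from` into
# cluster `to` by summing their conditional probabilities, then drop `from`.
merge_tau <- function(tau, from, to) {
  stopifnot(from != to)
  tau[, to] <- tau[, to] + tau[, from]
  tau[, -from, drop = FALSE]
}

tau <- matrix(c(0.7, 0.2, 0.1,
                0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)
merge_tau(tau, from = 3, to = 1)  # rows become (0.8, 0.2) and (0.7, 0.3)
```

Each row of the merged matrix still sums to 1, so it remains a valid set of conditional probabilities for c - 1 clusters.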

EN:

a vector of length c - 1 containing the entropies of the combined clusters.

ED:

a vector of length c - 1 containing the decreases in entropy for the combined clusters.
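Baudry et al. (2010) measure the entropy of a clustering as EN = -\sum_{j} \sum_{l} tau_{jl} \log tau_{jl} and, at each step, merge the pair of clusters whose combination gives the largest entropy decrease ED. The base R sketch below shows one such greedy step. The names entropy and best_merge are hypothetical, and the convention that the higher-numbered cluster is merged into the lower-numbered one is an assumption, not necessarily the one used by rebmix.

```r
# Entropy of a matrix of conditional probabilities; 0 * log(0) is taken as 0.
entropy <- function(tau) {
  v <- tau[tau > 0]
  -sum(v * log(v))
}

# Hypothetical greedy step: try every pair of clusters (k, m) and keep the
# merge of m into k that gives the largest decrease in entropy.
best_merge <- function(tau) {
  EN0 <- entropy(tau)
  best <- list(from = NA_integer_, to = NA_integer_, EN = NA_real_, ED = -Inf)
  nc <- ncol(tau)
  for (k in seq_len(nc - 1)) {
    for (m in (k + 1):nc) {
      merged <- cbind(tau[, -c(k, m), drop = FALSE], tau[, k] + tau[, m])
      EN <- entropy(merged)
      if (EN0 - EN > best$ED) best <- list(from = m, to = k, EN = EN, ED = EN0 - EN)
    }
  }
  best
}

# Clusters 1 and 2 overlap completely, so merging them removes all entropy.
tau <- matrix(c(0.5, 0.5, 0,
                0.5, 0.5, 0,
                0,   0,   1), nrow = 3, byrow = TRUE)
best_merge(tau)  # from = 2, to = 1, EN = 0, ED = 2 * log(2)
```

Repeating this step c - 1 times, each time on the merged tau, yields sequences of length c - 1 analogous to the from, to, EN and ED slots.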

Author(s)

Marko Nagode

References

J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering. Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. http://dx.doi.org/10.1198/jcgs.2010.08111

Examples

devAskNewPage(ask = TRUE)

# Generate normal dataset.

n <- c(500, 200, 400)

Theta <- list(pdf1 = rep("normal", 2),
  theta1.1 = c(3, 10),
  theta2.1 = c(3, 0.3, 0.3, 2),
  pdf2 = rep("normal", 2),
  theta1.2 = c(8, 6),
  theta2.2 = c(5.7, -2.3, -2.3, 3.5),
  pdf3 = rep("normal", 2),
  theta1.3 = c(12, 11),
  theta2.3 = c(2, 1, 1, 2))

normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = "normal_1", n = n, Theta = Theta)

# Number of classes or nearest neighbours to be processed.

K <- c(as.integer(1 + log2(sum(n))), # Minimum v follows Sturges' rule.
  as.integer(10 * log10(sum(n)))) # Maximum v follows log10 rule.

# Estimate number of components, component weights and component parameters.

normalest <- REBMIX(model = "REBMVNORM",
  Dataset = normal@Dataset,
  Preprocessing = "histogram",
  cmax = 6,
  Criterion = "BIC",
  pdf = rep("normal", 2),
  K = K[1]:K[2])

summary(normalest)

# Plot finite mixture.

plot(normalest)

# Cluster dataset.

normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest, Zt = normal@Zt)

# Plot clusters.

plot(normalclu)

summary(normalclu)
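As a worked check of the range of K used in the example above, with n = c(500, 200, 400) and as.integer() truncating toward zero:

```latex
\sum_{l} n_{l} = 1100, \qquad
K_{\min} = \lfloor 1 + \log_{2} 1100 \rfloor = \lfloor 11.10 \rfloor = 11, \qquad
K_{\max} = \lfloor 10 \log_{10} 1100 \rfloor = \lfloor 30.41 \rfloor = 30
```

so REBMIX is run for K = 11, 12, …, 30.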

[Package rebmix version 2.10.2 Index]