hugeRR {bigRR} | R Documentation |
Function fits big ridge regression with special computational advantage for the cases when number of shrinkage parameters exceeds number of observations. The shrinkage parameter, lambda, can be pre-specified or estimated along with the model. Any subset of model parameter can be shrunk.
hugeRR(y, X, Z.name, Z.index, weight = NULL, family = gaussian(link = identity), lambda = NULL, only.estimates = FALSE, tol.err = 1e-6, tol.conv = 1e-8, save.cache = FALSE, ...)
y |
the vector of response variable. |
X |
the design matrix related to the parameters not to be shrunk (i.e. fixed effects in the mixed model framework). |
Z.name |
file name to be combined with |
Z.index |
file index/indices to be combined with |
family |
the distribution family of |
weight |
a vector of prior weights for each of the shrinkage parameters. |
lambda |
the shrinkage parameter determines the amount of shrinkage. Default is |
only.estimates |
logical; |
tol.err |
internal tolerance level for extremely small values; default value is 1e-6. |
tol.conv |
tolerance level in convergence; default value is 1e-8. |
save.cache |
logical; specify whether internal cache files should be saved for fast future repeating analyses. If |
... |
unused arguments. |
The function does the same job as the bigRR
function, but allows huge size of data (the Z
matrix) that cannot be loaded into computer memory as a whole.
Instead of specifying the entire design matrix for random effects (Z
in bigRR
), the Z
can be split as Z = cbind(Z1, Z2, ..., Zk)
, and each piece of Z
is stored in DatABEL
format with file names specified by the arguments Z.name
and Z.index
.
For example (see also Examples), if the genotype data for each chromosome is stored in DatABEL
format with file names chr1.fvd
& chr1.fvi
, ..., chr22.fvd
& chr22.fvi
, the input argument should be specified as Z.name = 'chr'
and Z.index = 1:22
.
Returns a list of object class bigRR
containing the following values: (see Examples for how to use the estimated parameters for a prediction purpose.)
phi |
estimated residual variance (Non-genetic variance component). |
lambda |
estimated random effect variance (Genetic variance component). which is proportional to the usual |
beta |
fixed effects estimates - subset of model parameters which is/are not shrunk, i.e. those associated with the |
u |
random effects estimates (genetic effects of each marker) - subset of model parameters which are shrunk, i.e. those associated with the |
leverage |
hat values for the random effects. |
hglm |
the internal fitted |
Call |
how the bigRR was called. |
Xia Shen
Shen X, Alam M, Fikse F and Ronnegard L (2013). A novel generalized ridge regression method for quantitative genetics. Genetics, 193, 1255-1268.
Ronnegard L, Shen X and Alam M (2010): hglm: A Package for Fitting Hierarchical Generalized Linear Models. The R Journal, 2(2), 20-28.
lm.ridge
in MASS library.
# --------------------------------------------- # # Arabidopsis example # # --------------------------------------------- # ## Not run: require(bigRR) data(Arabidopsis) X <- matrix(1, length(y), 1) # splitting the genotype data into two pieces and re-saving in DatABEL format # dimnames(Z) <- list(NULL, NULL) Z <- scale(Z) matrix2databel(Z[,1:100000], 'part1') matrix2databel(Z[,100001:ncol(Z)], 'part2') # fitting SNP-BLUP, i.e. a ridge regression on all the markers across the genome # SNP.BLUP.result <- hugeRR(y = y, X = X, Z.name = 'part', Z.index = 1:2, family = binomial(link = 'logit'), save.cache = TRUE) # re-run SNP-BLUP - a lot faster since cache data are stored SNP.BLUP.result <- hugeRR(y = y, X = X, Z.name = 'part', Z.index = 1:2, family = binomial(link = 'logit')) # fitting HEM, i.e. a generalized ridge regression with marker-specific shrinkage # HEM.result <- hugeRR_update(SNP.BLUP.result, Z.name = 'part', Z.index = 1:2, family = binomial(link = 'logit')) # plot and compare the estimated effects from both methods # split.screen(c(1, 2)) split.screen(c(2, 1), screen = 1) screen(3); plot(abs(SNP.BLUP.result$u), cex = .6, col = 'slateblue') screen(4); plot(abs(HEM.result$u), cex = .6, col = 'olivedrab') screen(2); plot(abs(SNP.BLUP.result$u), abs(HEM.result$u), cex = .6, pch = 19, col = 'darkmagenta') # create a random new genotypes for 10 individuals with the same number of markers # and predict the outcome using the fitted HEM # Z.new <- matrix(sample(c(-1, 1), 10*ncol(Z), TRUE), 10) y.predict <- as.numeric(HEM.result$beta + Z.new %*% HEM.result$u) # # NOTE: The above prediction may not be good due to the scaling in the HEM # fitting above, and alternatively, one can either remove the scaling # above or scale Z.new by row-binding it with the original Z matrix. ## End(Not run)