hugeRR_update {bigRR} | R Documentation |
This function updates the hugeRR-obtained bigRR object into a new object with heteroscedasticity assumption.
hugeRR_update(obj, Z.name, Z.index, family = gaussian(link = identity), tol.err = 1e-6, tol.conv = 1e-8, save.cache = FALSE)
obj |
A |
Z.name |
file name to be combined with |
Z.index |
file index/indices to be combined with |
family |
the distribution family of |
tol.err |
internal tolerance level for extremely small values; default value is 1e-6. |
tol.conv |
tolerance level in convergence; default value is 1e-8. |
save.cache |
logical; specify whether internal cache files should be saved for fast future repeating analyses. If |
The function does a similar job as the bigRR_update
function, but allows huge size of data (the Z
matrix) that cannot be loaded into computer memory as a whole.
Instead of specifying the entire design matrix for random effects (Z
in bigRR
), the Z
can be split as Z = cbind(Z1, Z2, ..., Zk)
, and each piece of Z
is stored in DatABEL
format with file names specified by the arguments Z.name
and Z.index
.
For example (see also Examples), if the genotype data for each chromosome is stored in DatABEL
format with file names chr1.fvd
& chr1.fvi
, ..., chr22.fvd
& chr22.fvi
, the input argument should be specified as Z.name = 'chr'
and Z.index = 1:22
.
Xia Shen
Shen X, Alam M, Fikse F and Ronnegard L (2013). A novel generalized ridge regression method for quantitative genetics. Genetics, 193, 1255-1268.
# --------------------------------------------- # # Arabidopsis example # # --------------------------------------------- # ## Not run: require(bigRR) data(Arabidopsis) X <- matrix(1, length(y), 1) # splitting the genotype data into two pieces and re-saving in DatABEL format # dimnames(Z) <- list(NULL, NULL) Z <- scale(Z) matrix2databel(Z[,1:100000], 'part1') matrix2databel(Z[,100001:ncol(Z)], 'part2') # fitting SNP-BLUP, i.e. a ridge regression on all the markers across the genome # SNP.BLUP.result <- hugeRR(y = y, X = X, Z.name = 'part', Z.index = 1:2, family = binomial(link = 'logit'), save.cache = TRUE) # re-run SNP-BLUP - a lot faster since cache data are stored SNP.BLUP.result <- hugeRR(y = y, X = X, Z.name = 'part', Z.index = 1:2, family = binomial(link = 'logit')) # fitting HEM, i.e. a generalized ridge regression with marker-specific shrinkage # HEM.result <- hugeRR_update(SNP.BLUP.result, Z.name = 'part', Z.index = 1:2, family = binomial(link = 'logit')) # plot and compare the estimated effects from both methods # split.screen(c(1, 2)) split.screen(c(2, 1), screen = 1) screen(3); plot(abs(SNP.BLUP.result$u), cex = .6, col = 'slateblue') screen(4); plot(abs(HEM.result$u), cex = .6, col = 'olivedrab') screen(2); plot(abs(SNP.BLUP.result$u), abs(HEM.result$u), cex = .6, pch = 19, col = 'darkmagenta') # create a random new genotypes for 10 individuals with the same number of markers # and predict the outcome using the fitted HEM # Z.new <- matrix(sample(c(-1, 1), 10*ncol(Z), TRUE), 10) y.predict <- as.numeric(HEM.result$beta + Z.new %*% HEM.result$u) # # NOTE: The above prediction may not be good due to the scaling in the HEM # fitting above, and alternatively, one can either remove the scaling # above or scale Z.new by row-binding it with the original Z matrix. ## End(Not run)