nebula {nebula}    R Documentation

Association analysis of a multi-subject single-cell data set using a fast negative binomial mixed model

Description

Association analysis of a multi-subject single-cell data set using a fast negative binomial mixed model

Usage

nebula(
  count,
  id,
  pred = NULL,
  offset = NULL,
  min = c(1e-04, 1e-04),
  max = c(10, 1000),
  model = "NBGMM",
  method = "LN",
  cutoff_cell = 20,
  kappa = 200,
  opt = "lbfgs",
  verbose = TRUE,
  cpc = 0.005
)

Arguments

count

A raw count matrix of the single-cell data. The rows are the genes, and the columns are the cells. The matrix can be a matrix object or a sparse dgCMatrix object.

id

A vector of subject IDs. The length should be the same as the number of columns of the count matrix.
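As a minimal sketch of how count and id line up, the toy objects below build a gene-by-cell matrix and a matching vector of subject IDs, then check the length requirement stated above. All names and values are made up for illustration and are not part of the package.

```r
# Toy data: 3 genes (rows) by 6 cells (columns); the values are made up
set.seed(1)
count <- matrix(rpois(18, lambda = 2), nrow = 3, ncol = 6,
                dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))

# One subject ID per cell: two hypothetical subjects with three cells each
id <- rep(c("subjA", "subjB"), each = 3)

# The requirement described above: one ID per column of the count matrix
stopifnot(length(id) == ncol(count))

# Number of cells contributed by each subject
cells_per_subject <- table(id)
```

Checking the lengths before calling nebula() makes a mismatch between cells and subject labels visible early.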

pred

A design matrix of the predictors. The rows are the cells and the columns are the predictors. If not specified, an intercept column will be generated by default.
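One way to build such a design matrix is base R's model.matrix(), as the Examples section also does. The sketch below uses a hypothetical data frame of cell-level covariates (the names x1 and group are assumptions made for illustration) and shows that an intercept column is included.

```r
# Hypothetical covariates for 6 cells; names and values are made up
cell_covariates <- data.frame(
  x1 = c(0.2, -1.1, 0.5, 1.3, -0.4, 0.9),
  group = factor(c("control", "control", "control", "case", "case", "case"),
                 levels = c("control", "case"))
)

# Rows are cells, columns are predictors; the formula adds an intercept
pred <- model.matrix(~ x1 + group, data = cell_covariates)

# Columns: "(Intercept)", "x1", "groupcase"
colnames(pred)
```

The number of rows of pred should equal the number of columns of the count matrix, since both index cells.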

offset

A vector of scaling factors. The values must be strictly positive. If not specified, a vector of all ones will be generated by default.
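A common choice of scaling factor in single-cell analyses is the total count per cell (library size). This page does not prescribe that choice, so the sketch below is only an illustration under that assumption, using a made-up count matrix.

```r
# Toy count matrix: 4 genes (rows) by 5 cells (columns); values are made up
count <- matrix(c(0, 2, 1, 5,
                  3, 0, 0, 1,
                  1, 1, 4, 0,
                  0, 0, 2, 2,
                  6, 1, 0, 3), nrow = 4)

# Assumption: use the library size (total count per cell) as the scaling factor
offset <- colSums(count)

# The offset must have one strictly positive value per cell
stopifnot(length(offset) == ncol(count), all(offset > 0))
```

A cell with zero total count would violate the strict positivity requirement and would have to be removed or given a different scaling factor.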

min

Minimum values for the overdispersion parameters σ^2 and φ. Must be positive. The default is c(1e-4,1e-4).

max

Maximum values for the overdispersion parameters σ^2 and φ. Must be positive. The default is c(10,1000).

model

'NBGMM', 'PMM' or 'NBLMM'. 'NBGMM' fits a negative binomial gamma mixed model. 'PMM' fits a Poisson gamma mixed model. 'NBLMM' fits a negative binomial lognormal mixed model (the same model as that in the lme4 package). The default is 'NBGMM'.

method

'LN' or 'HL'. 'LN' uses NEBULA-LN and 'HL' uses NEBULA-HL. This argument is only valid when model='NBGMM'. The default is 'LN'.

cutoff_cell

The data will be refit using NEBULA-HL to estimate both overdispersions if the product of the cells per subject and the estimated cell-level overdispersion parameter φ is smaller than cutoff_cell. The default is 20.
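The refit rule can be written as a one-line check. The helper below is hypothetical and not part of the nebula package; how the package summarizes "cells per subject" internally is not stated here, so a single number is assumed.

```r
# Hypothetical helper (not a nebula function): TRUE when the rule described
# above would trigger a refit with NEBULA-HL
needs_hl_refit <- function(cells_per_subject, phi, cutoff_cell = 20) {
  cells_per_subject * phi < cutoff_cell
}

# 100 cells per subject with phi = 0.1: product is 10, below the default 20
refit_small_phi <- needs_hl_refit(cells_per_subject = 100, phi = 0.1)

# 100 cells per subject with phi = 0.5: product is 50, not below 20
refit_large_phi <- needs_hl_refit(cells_per_subject = 100, phi = 0.5)
```

Raising cutoff_cell makes the refit more likely, and lowering it makes the refit less likely.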

kappa

Please see the vignettes for more details. The default is 200.

opt

'lbfgs' or 'trust'. Specifies the optimization algorithm used in NEBULA-LN. The default is 'lbfgs'. If it is 'trust', a trust region algorithm based on the Hessian matrix will be used for optimization.

verbose

An optional logical scalar indicating whether to print additional messages. The default is TRUE.

cpc

A non-negative threshold for filtering low-expressed genes. Genes with counts per cell smaller than the specified value will not be analyzed. The default is 0.005.
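To see which genes such a threshold would drop, the sketch below computes counts per cell for a made-up matrix. It assumes that "counts per cell" means a gene's total count divided by the number of cells, which is an interpretation rather than something this page spells out.

```r
# Toy data: 3 genes across 400 cells; values are made up
n_cells <- 400
count <- rbind(
  rare     = c(1, rep(0, n_cells - 1)),   # 1 count in total
  moderate = c(rep(1, 4), rep(0, n_cells - 4)),  # 4 counts in total
  common   = rep(2, n_cells)              # 800 counts in total
)

# Assumed definition: total count of the gene divided by the number of cells
counts_per_cell <- rowSums(count) / ncol(count)

# Genes with counts per cell smaller than cpc are not analyzed
cpc <- 0.005
keep <- counts_per_cell >= cpc
names(keep)[keep]
```

Here the gene named rare has 1/400 = 0.0025 counts per cell and falls below the threshold, while the other two are retained.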

Value

summary: The estimated coefficient, standard error and p-value for each predictor.

overdispersion: The estimated subject-level and cell-level overdispersions σ^2 and φ^{-1}.

convergence: More information about the convergence of the algorithm for each gene.

algorithm: The algorithm used for analyzing the gene.

Examples

library(nebula)
data(sample_data)
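# Build the design matrix (intercept plus X1, X2 and cc) from the covariates
pred = model.matrix(~X1+X2+cc,data=sample_data$pred)
# Fit the default model ('NBGMM' with method 'LN') to the sample data
re = nebula(count=sample_data$count,id=sample_data$sid,pred=pred)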


[Package nebula version 1.0.1 Index]