distforest {disttree} R Documentation

Distributional Regression Forests

Description

Forests based on maximum-likelihood estimation of parameters for distributions from the GAMLSS family (for generalized additive models for location, scale, and shape).

Usage

distforest(formula, data, na.action = na.pass, cluster, family = NO(), bd = NULL,
           type.tree = "ctree", decorrelate = "none", offset,
           censtype = "none", censpoint = NULL, weights = NULL,
           control = partykit::ctree_control(teststat = "quad", testtype = "Univ", 
           mincriterion = 0, ...),
           ocontrol = list(), type.hessian = c("checklist", "analytic", "numeric"),
           ntree = 500L, fit = TRUE, perturb = list(replace = FALSE, fraction = 0.632), 
           fitted.OOB = TRUE,
           cores = NULL, applyfun = NULL,
           mtry = ceiling(sqrt(nvar)),
           ...)        

Arguments

formula

A symbolic description of the model to be fit. This should be of type y ~ x1 + x2 where y should be the response variable and x1 and x2 are used as partitioning variables.

data

An optional data frame containing the variables in the model.

na.action

A function which indicates what should happen when the data contain NAs.

cluster

An optional vector (typically numeric or factor) with a cluster ID to be employed for clustered covariances in the parameter stability tests.

family

Specification of the response distribution: either a gamlss.family object, a list-generating function, or a family list.

bd

Binomial denominator: an additional parameter needed for binomial gamlss.family objects.

type.tree

Specification of the type of tree learner, either "mob" or "ctree".

decorrelate

Specification of the type of decorrelation for the empirical estimating functions (or scores): either "none", "opg" (for the outer product of gradients), or "vcov" (for the variance-covariance matrix, assuming this is an estimate of the Fisher information).
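As an illustration of what decorrelating scores by the outer product of gradients can look like, here is a minimal base R sketch. The toy score matrix and the helper inv_sqrt are hypothetical, and the exact scaling used internally by distforest is not stated on this page, so this shows the general idea only.

```r
set.seed(123)
n <- 200

# Toy score matrix with two correlated columns (illustrative data, not from a fitted model)
z <- matrix(rnorm(n * 2), ncol = 2)
scores <- cbind(z[, 1], 0.8 * z[, 1] + 0.6 * z[, 2])
scores <- scale(scores, center = TRUE, scale = FALSE)

# Hypothetical helper: symmetric inverse square root of a positive definite matrix
inv_sqrt <- function(m) {
  e <- eigen(m, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Outer product of gradients (OPG), averaged over observations
opg <- crossprod(scores) / n

# Decorrelated scores: their averaged cross-product is the identity matrix
scores_decor <- scores %*% inv_sqrt(opg)
round(crossprod(scores_decor) / n, 10)
```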

offset

FIX ME.

censtype

Type of censoring for a censored response: one of 'none', 'left' or 'right'.

censpoint

Numeric value. The censoring point for a censored response.

weights

Optional numeric vector of case weights.

control

Control arguments passed to mob or ctree.

ocontrol

List with control parameters passed to optim in distfit.

type.hessian

One of 'checklist', 'analytic' or 'numeric', determining how the Hessian matrix is calculated in the fitting process in distfit. For 'checklist', it is checked whether a function 'hdist' is given in the family list. If so, 'type.hessian' is set to 'analytic', otherwise to 'numeric'.
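The 'checklist' rule can be sketched in a few lines of base R. The helper resolve_hessian_type and the toy family lists below are hypothetical and not part of disttree; only the element name hdist is taken from the description above.

```r
# Hypothetical helper illustrating the 'checklist' rule; not a disttree function
resolve_hessian_type <- function(family,
                                 type.hessian = c("checklist", "analytic", "numeric")) {
  type.hessian <- match.arg(type.hessian)
  if (type.hessian == "checklist") {
    # Use the analytic Hessian only if the family list supplies a function 'hdist'
    type.hessian <- if (is.function(family$hdist)) "analytic" else "numeric"
  }
  type.hessian
}

# Toy family lists (assumptions for illustration only)
fam_with_hessian <- list(hdist = function(y, eta) matrix(0, 2, 2))
fam_without_hessian <- list()

resolve_hessian_type(fam_with_hessian)
resolve_hessian_type(fam_without_hessian)
```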

ntree

Number of trees to grow for the forest.

fit

Logical. If TRUE, fitted and predicted values and predicted parameters are calculated for the learning data (together with the log-likelihood).

perturb

A list with arguments replace and fraction determining the type of resampling: replace = TRUE refers to the n-out-of-n bootstrap and replace = FALSE to sample splitting. fraction is the proportion of observations to draw without replacement.
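The two resampling schemes can be illustrated with base R on the cars data. This is a sketch of the idea only: the variable names are hypothetical, and rounding the subsample size with floor is an assumption rather than documented behaviour.

```r
set.seed(1)
n <- nrow(cars)

# replace = TRUE: n-out-of-n bootstrap (observations may appear more than once)
boot_idx <- sample.int(n, size = n, replace = TRUE)

# replace = FALSE: draw a fraction of the observations without replacement
fraction <- 0.632
sub_idx <- sample.int(n, size = floor(fraction * n), replace = FALSE)

# Observations not drawn for a tree are "out of bag" for that tree
oob_idx <- setdiff(seq_len(n), sub_idx)

length(sub_idx)
length(oob_idx)
```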

fitted.OOB

Logical. If fitted.OOB = TRUE, the weights for each observation of the learning data are predicted by predict.cforest with the argument OOB = TRUE (only relevant if fit = TRUE).

cores

Numeric. If set to an integer, applyfun is set to mclapply with the desired number of cores.

applyfun

An optional lapply-style function with arguments function(X, FUN, ...). It is used for computing the variable selection criterion. The default is to use the basic lapply function unless the cores argument is specified (see above).
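A minimal sketch of an lapply-style function with the required signature is shown here. The name my_applyfun and the choice of two cores are hypothetical; parallel::mclapply and its mc.cores argument are part of base R's parallel package.

```r
library(parallel)

# Hypothetical lapply-style wrapper with the signature function(X, FUN, ...)
my_applyfun <- function(X, FUN, ...) {
  # mc.cores > 1 is not supported on Windows, so fall back to a single core there
  ncores <- if (.Platform$OS.type == "windows") 1L else 2L
  parallel::mclapply(X, FUN, ..., mc.cores = ncores)
}

# Behaves like lapply: returns a list with one element per input
res <- my_applyfun(1:4, function(i) i^2)
unlist(res)
```

A function of this shape could then be supplied as applyfun = my_applyfun.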

mtry

Number of input variables randomly sampled as candidates at each node for random-forest-like algorithms. Bagging, as a special case of a random forest without random input variable sampling, can be performed by setting mtry either equal to Inf or manually equal to the number of input variables.
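To make the default concrete, the following base R sketch evaluates ceiling(sqrt(nvar)) for a hypothetical model with nine input variables and contrasts it with the bagging case. Treating nvar as the number of input variables is an assumption based on the description above.

```r
# Hypothetical number of input variables
nvar <- 9

# Default from the Usage section: ceiling(sqrt(nvar))
mtry_default <- ceiling(sqrt(nvar))

# At each node, a random subset of this size would be considered for splitting
set.seed(42)
candidates <- sample.int(nvar, size = mtry_default)

# Bagging: mtry = Inf (or nvar) means every input variable is a candidate
mtry_bagging <- min(Inf, nvar)

mtry_default
mtry_bagging
```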

...

Further arguments passed to optim in distfit.

Details

Distributional regression forests are an application of model-based recursive partitioning (implemented in mob, ctree and cforest) to parametric model fits based on the GAMLSS family of distributions.

Value

An object of S3 class distforest inheriting from class cforest.

See Also

mob, ctree, cforest, distfit

Examples

library("disttree")
## fit a distributional forest to the cars data using the default family NO()
df <- distforest(dist ~ speed, data = cars)
predict(df)

[Package disttree version 0.1-0 Index]