rootogram {countreg}R Documentation

Rootograms for Assessing Goodness of Fit of Probability Models

Description

Rootograms graphically compare (square roots) of empirical frequencies with fitted frequencies from a probability model.

Usage

rootogram(object, ...)

## Default S3 method:
rootogram(object, fitted, breaks = NULL,
  style = c("hanging", "standing", "suspended"),
  scale = c("sqrt", "raw"), plot = TRUE,
  width = NULL, xlab = NULL, ylab = NULL, main = NULL, ...)

## S3 method for class 'rootogram'
plot(x,
  xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, main = NULL,
  border = "black", fill = "lightgray", col = "#B61A51",
  lwd = 2, pch = 19, lty = 1, max = NULL, type = NULL, axes = TRUE, ...)

## S3 method for class 'rootogram'
autoplot(object, colour = c("black", "#B61A51"),
  fill = "darkgray", size = c(1.2, 4), ...)

Arguments

object

an object. For the default method this needs to be a vector/table of observed frequencies. Currently, there are also methods for numeric, glm, gam, gamlss, hurdle, and zeroinfl objects.

fitted

an specification of the fitted frequencies. For the default method this needs to be a vector/table. For the numeric method (see below) this can also be a character string. For the regression object methods, this is not needed.

breaks

numeric. Breaks for the histogram intervals.

style

character specifying the syle of rootogram (see below).

scale

character specifying whether raw frequencies or their square roots (default) should be drawn.

plot

logical. Should the plot method be called to draw the computed rootogram?

width

numeric. Widths of the histogram bars.

xlab, ylab, main

graphical parameters.

x

object of class "rootogram".

xlim, ylim, border, fill, col, lwd, pch, lty, type, axes

graphical parameters. These may pertain either to the whole plot or just the histogram or just the fitted line.

max

maximum count displayed.

colour, size

graphical parameters passed to geom_line and geom_point, respectively.

...

further graphical parameters passed to the plotting function.

Details

Rootograms graphically compare frequencies of empirical distributions and fitted probability models. For the observed distribution the histogram is drawn on a square root scale (hence the name) and superimposed with a line for the fitted frequencies. The histogram can be "standing" on the x-axis (as usual), or "hanging" from the fitted curve, or a "suspended" histogram of deviations can be drawn.

rootogram is the generic function for generating rootograms from data or fitted model objects. The workhorse function is the default method (that computes all necessary coordinates based on observed and fitted frequencies and the breaks for the histogram intervals) and the associating plot method that carries out the actual drawing (using base graphics).

There is a wide range of further rootogram methods that all take the following approach: based on a fitted probability model observed and expected frequencies are computed and then the default method is called. Currently, there are methods for glm, gam, gamlss, hurdle, zeroinfl, and zerotrunc models.

Furthermore, there is a numeric method that uses link[MASS]{fitdistr} to obtain a fitted (by maximum likelihood) probability model for a univariate variable. For this method, fitted can either be a character string or a density function that is passed to fitdistr. In the latter case, a start list also has to be supplied.

In addition to the plot method for rootogram objects, there are also two methods that combine two (or more) rootograms: c/rbind creates a set of rootograms that can then be plotted in one go. The + method adds up the observed and fitted frequencies from two rootograms (if these use the same intervals).

The autoplot method creates a ggplot version of the rootogram.

Value

An object of class "rootogram" inheriting from "data.frame" with the following variables:

observed

observed frequencies,

expected

fitted frequencies,

x

histogram interval midpoints on the x-axis,

y

bottom coordinate of the histogram bars,

width

widths of the histogram bars,

height

height of the histogram bars,

line

y-coordinates of the fitted curve.

Additionally, style, scale, xlab, ylab, and main are stored as attributes.

Note

Note that there is also a rootogram function in the vcd package that is similar to the numeric method provided here. However, it is much more limited in scope, hence a function has been created here.

References

Friendly M (2000), Visualizing Categorical Data. SAS Institute, Cary.

Kleiber C, Zeileis A (2016). “Visualizing Count Data Regressions Using Rootograms.” The American Statistician, 70(3), 296–303. doi: 10.1080/00031305.2016.1173590.

Tukey JW (1977). Exploratory Data Analysis. Addison-Wesley, Reading.

See Also

glm, glm.nb, hurdle, zeroinfl

Examples

## different interfaces

## number of deaths by horsekicks in Prussian army (Von Bortkiewicz 1898)
deaths <- rep(0:4, c(109, 65, 22, 3, 1))

## default method: fitted values "by hand"
rootogram(table(deaths), fitted = length(deaths) * dpois(0:4, mean(deaths)))

## numeric method: fitted values via fitdistr()
rootogram(deaths, fitted = "poisson")
rootogram(deaths, fitted = dpois, start = list(lambda = 1),
  breaks = 0:5 - 0.5, width = 0.9)

## glm method: fitted values via glm()
m <- glm(deaths ~ 1, family = poisson)
rootogram(m)

## inspect output (without plotting)
r <- rootogram(m, plot = FALSE)
r

## create ggplot2 version
if(require("ggplot2")) autoplot(r)

#-------------------------------------------------------------------------------

## different styles

## artificial data from negative binomial (mu = 3, theta = 2)
## and Poisson (mu = 3) distribution
set.seed(1090)
y <- rnbinom(100, mu = 3, size = 2)
x <- rpois(100, lambda = 3)

## correctly specified Poisson model fit (mu = 3.34)
par(mfrow = c(2, 3))
rootogram(x, "poisson", style = "standing",  ylim = c(-2.2, 4.8), main = "Standing")
rootogram(x, "poisson", style = "hanging",   ylim = c(-2.2, 4.8), main = "Hanging")
rootogram(x, "poisson", style = "suspended", ylim = c(-2.2, 4.8), main = "Suspended")

## misspecified Poisson model fit (mu = 3.32)
rootogram(y, "poisson", style = "standing",  ylim = c(-2.2, 4.8), main = "Standing")
rootogram(y, "poisson", style = "hanging",   ylim = c(-2.2, 4.8), main = "Hanging")
rootogram(y, "poisson", style = "suspended", ylim = c(-2.2, 4.8), main = "Suspended")
par(mfrow = c(1, 1))

#-------------------------------------------------------------------------------

## artificial data from a t_4 distribution
set.seed(1090)
y <- rt(1000, 4)

## incorrect normal fit (tails too light) and correct t fit
par(mfrow = c(1, 2))
rootogram(y, fitted = "normal", breaks = 40, xlim = c(-6, 6), ylim = c(-2, 14))
rootogram(y, fitted = "t",      breaks = 40, xlim = c(-6, 6), ylim = c(-2, 14))
par(mfrow = c(1, 1))

#-------------------------------------------------------------------------------

## linear regression with normal/Gaussian response: anorexia data
an <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)
rootogram(an, ylim = c(-1, 4))
abline(h = c(-1, 1), col = "#1E55CE", lty = 2, lwd = 2)

#-------------------------------------------------------------------------------

## count data regression models: crab satellites
data("CrabSatellites", package = "countreg")
cs_p   <-    glm(satellites ~     width + color, data = CrabSatellites, family = poisson)
cs_nb  <- glm.nb(satellites ~     width + color, data = CrabSatellites)
cs_hp  <- hurdle(satellites ~ 1 | width + color, data = CrabSatellites, dist = "poisson")
cs_hnb <- hurdle(satellites ~ 1 | width + color, data = CrabSatellites, dist = "negbin")

## rootograms
par(mfrow = c(2, 2))
rootogram(cs_p, max = 15, main = "Poisson")
rootogram(cs_nb, max = 15, main = "Negative Binomial")
rootogram(cs_hp, max = 15, main = "Hurdle Poisson")
rootogram(cs_hnb, max = 15, main = "Hurdle Negative Binomial")
par(mfrow = c(1, 1))


[Package countreg version 0.2-1 Index]