AUC {modEvA} | R Documentation |
This function calculates the Area Under the Curve (AUC) of the receiver operating characteristic (ROC) plot, for either a model object of class "glm" or two matching vectors of observed (binary, 1 for occurrence vs. 0 for non-occurrence) and predicted (continuous, e.g. occurrence probability) values. The AUC measures the overall discrimination power of the predictions: it is the probability that a randomly chosen occurrence site has a higher predicted value than a randomly chosen non-occurrence site.
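The probabilistic reading of the AUC can be checked by hand on a small example. The sketch below uses base R only and made-up `obs` and `pred` vectors (both hypothetical, not package data); it is an illustration of the definition, not necessarily how the function computes the value internally.

```r
# Hypothetical data: 1 = occurrence, 0 = non-occurrence
obs  <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred <- c(0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.7, 0.2)

pres <- pred[obs == 1]  # predictions at occurrence sites
abse <- pred[obs == 0]  # predictions at non-occurrence sites

# Compare every occurrence with every non-occurrence; ties count as half
cmp <- outer(pres, abse, FUN = function(p, a) (p > a) + 0.5 * (p == a))
auc_pairwise <- mean(cmp)

# The same quantity from the rank-sum (Mann-Whitney) formulation
r  <- rank(c(pres, abse))
n1 <- length(pres)
n0 <- length(abse)
auc_rank <- (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(c(pairwise = auc_pairwise, rank = auc_rank))
```

Both formulations give the same value; the pairwise version is easier to read, while the rank version scales better to long vectors.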
AUC(model = NULL, obs = NULL, pred = NULL, simplif = FALSE, interval = 0.01, FPR.limits = c(0, 1), plot = TRUE, diag = TRUE, diag.col = "grey", diag.lty = 1, roc.col = "black", roc.lty = 1, roc.lwd = 2, plot.values = TRUE, plot.digits = 3, plot.preds = FALSE, grid = FALSE, xlab = c("False positive rate", "(1-specificity)"), ylab = c("True positive rate", "(sensitivity)"), main = "ROC curve", ...)
model |
a model object of class "glm". |
obs |
a vector of observed presences (1) and absences (0) or another binary response variable. This argument is ignored if model is provided. |
pred |
a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike. Must be of the same length and in the same order as obs. |
simplif |
logical, whether to use a faster version that returns only the AUC value (and the ROC plot if plot = TRUE). |
FPR.limits |
(NOT YET IMPLEMENTED) numerical vector of length 2 indicating the limits of false positive rate between which to calculate a partial AUC. The default is c(0, 1), for considering the whole AUC. |
interval |
interval of threshold values at which to calculate the true and false positive and negative rates. Defaults to 0.01. This argument is ignored if simplif = TRUE. |
plot |
logical, whether or not to plot the ROC curve. Defaults to TRUE. |
diag |
logical, whether or not to add the reference diagonal (if plot = TRUE). Defaults to TRUE. |
diag.col |
line colour for the reference diagonal. |
diag.lty |
line type for the reference diagonal. |
roc.col |
line colour for the ROC curve. |
roc.lty |
line type for the ROC curve. |
roc.lwd |
line width for the ROC curve. |
plot.values |
logical, whether or not to show in the plot the values associated with the curve (e.g., the AUC). Defaults to TRUE. |
plot.digits |
integer number indicating the number of digits to which the values in the plot should be rounded. |
plot.preds |
logical, whether or not to plot the proportion of analysed model predictions (through proportionally sized circles) at each threshold. Experimental. Defaults to FALSE. |
grid |
logical, whether or not to add a grid to the plot, marking the analysed thresholds. Defaults to FALSE. |
xlab |
label for the x axis. |
ylab |
label for the y axis. |
main |
title for the plot. |
... |
further arguments to be passed to the plot function. |
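To make the role of the interval argument concrete, here is a minimal base R sketch of a threshold-based ROC calculation. It assumes a site is classified as a predicted presence when its prediction is at or above the threshold, and it integrates the curve with the trapezoid rule; both choices, and the `obs` and `pred` vectors, are assumptions made for illustration rather than a description of the package internals.

```r
# Hypothetical data: 1 = occurrence, 0 = non-occurrence
obs  <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred <- c(0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.7, 0.2)

interval   <- 0.01
thresholds <- seq(0, 1, by = interval)

# True and false positive rates at each threshold
tpr <- sapply(thresholds, function(t) sum(pred >= t & obs == 1) / sum(obs == 1))
fpr <- sapply(thresholds, function(t) sum(pred >= t & obs == 0) / sum(obs == 0))

# Order the points along the x axis and integrate with the trapezoid rule
o <- order(fpr, tpr)
x <- fpr[o]
y <- tpr[o]
auc_trapz <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

print(auc_trapz)
```

A coarser interval can skip thresholds that separate neighbouring predictions, so the resulting curve (and its area) becomes an approximation.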
Mind that the AUC has been widely criticized (e.g. Lobo et al. 2008, Jimenez-Valverde et al. 2013), but is still among the most widely used metrics in model evaluation. It is highly correlated with species prevalence, so this value is also provided by the AUC function (if simplif = FALSE, the default) for reference. Although there are functions to calculate the AUC in other R packages (e.g. ROCR, PresenceAbsence, verification, Epi), the AUC function is more compatible with the remaining functions in modEvA and can be applied not only to a set of observed versus predicted values, but also directly to a model object of class "glm".
If simplif = TRUE, the function returns only the AUC value (a numeric value between 0 and 1). Otherwise (the default), it returns a list with the following components:
thresholds |
a data frame of the true and false positives, the sensitivity and specificity of the predictions, and the number of predicted values at each analysed threshold. |
N |
the total number of observations. |
prevalence |
the proportion of occurrences in the data (which correlates with the AUC). |
AUC |
the value of the AUC. |
AUCratio |
the ratio of the obtained AUC value to the null expectation (0.5). |
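As a rough illustration of how the scalar components relate to the data, the following base R sketch derives them for a made-up pair of vectors. The list `res` only mirrors the component names described above; it is a hypothetical stand-in, not the object the function actually returns, and the pairwise AUC used here is valid as written only because the toy predictions contain no ties.

```r
# Hypothetical data: 1 = occurrence, 0 = non-occurrence
obs  <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred <- c(0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.7, 0.2)

N          <- length(obs)           # total number of observations
prevalence <- sum(obs == 1) / N     # proportion of occurrences

# Share of occurrence/non-occurrence pairs ranked correctly (no ties here)
auc <- mean(outer(pred[obs == 1], pred[obs == 0], ">"))

# Hypothetical list mirroring the documented component names
res <- list(N = N, prevalence = prevalence, AUC = auc, AUCratio = auc / 0.5)
str(res)
```

An AUCratio of 1 corresponds to the null expectation, and values approaching 2 indicate near-perfect discrimination.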
A. Marcia Barbosa
Lobo, J.M., Jimenez-Valverde, A. & Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17: 145-151
Jimenez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography 22: 508-516
# load the package and the sample models:
library(modEvA)
data(rotif.mods)

# choose a particular model to play with:
mod <- rotif.mods$models[[1]]

AUC(model = mod, simplif = TRUE)
AUC(model = mod)
AUC(model = mod, grid = TRUE, plot.preds = TRUE)

# you can also use AUC with vectors of observed and predicted values
# instead of with a model object:
presabs <- mod$y
prediction <- mod$fitted.values
AUC(obs = presabs, pred = prediction)