cv.strk {meteo}R Documentation

k-fold cross-validation for spatio-temporal regression kriging

Description

k-fold cross-validation function for spatio-temporal regression kriging based on pred.strk. The cross-validation is made for STFDF-class, STSDF-class, or for data.frame objects. Currently, only spatial (leave-location-out) cross-validation is implemented. Temporal and spatio-temporal cross-validation will be implemented in the future.

Usage

cv.strk(data,
        zcol=1,
        data.staid.x.y.time = c(1,2,3,4),
        obs,
        obs.staid.time = c(1,2),
        stations,
        stations.staid.x.y = c(1,2,3),
        zero.tol=0,
        reg.coef,
        vgm.model,
        sp.nmax=20,
        time.nmax=2,
        type = "LLO",
        k = 5,
        seed = 42,
        folds,
        fold.column,
        refit = TRUE,
        output.format = "STFDF",
        parallel.processing = FALSE,
        pp.type = "snowfall",
        cpus=detectCores()-1,
        progress=TRUE,
        ...)

Arguments

data

STFDF-class, STSDF-class or data.frame; Contains dependent variable (observations) and covariates in space and time used to perform cross-validation for spatio-temporal regression kriging. If data.frame object, it should have next columns: station ID (staid), longitude (x), latitude (y), time of the observation (time), observation value (obs) and covariates (cov1, cov2, ...). Covariate names should be the same as in the reg.coef (see below). If covariates are missing, then cross-validation for spatio-temporal ordinary kriging is performed.

zcol

numeric or character; Column name or number showing the position of dependent variable (observations) in data. Default is 1.

data.staid.x.y.time

numeric or character vector; Positions or names of the station ID (staid), longitude (x), latitude (y) and time columns in data if data is data.frame. Default is c(1,2,3,4).

obs

data.frame; Contains dependent variable (observations) and covariates in space and time. It should have next columns: station ID (staid), time of the observation (time), observation value (obs) and covariates (cov1, cov2, ...). This object is used together with stations (see below) to create STFDF-class object (if data object is missing) which is then used to perform cross-validation for spatio-temporal regression kriging. If covariates are missing, cross-validation for spatio-temporal ordinary kriging is performed.

obs.staid.time

numeric or character vector; Positions or names of the station ID (staid) and time columns in obs. Default is c(1,2).

stations

data.frame; It should have next columns: station ID (staid), longitude (x) and latitude (y) of the stations. This object is used together with obs (see above) if data object is missing.

stations.staid.x.y

numeric or character vector; Positions or names of the station ID (staid), longitude (x) and latitude (y) columns in stations. Default is c(1,2,3).

zero.tol

numeric; A distance value below (or equal to) which locations are considered as duplicates. Default is 0. See rm.dupl. Duplicates are removed to avoid singular covariance matrices in kriging.

reg.coef

named numeric vector; Named linear regression coefficients. Names of the coefficients (e.g. "Intercept", "temp_geo", "modis", "dem", "twi") will be used to match appropriate covariates from data (or obs). Coefficients for metorological variables (temperature, precipitation, etc.) can be taken from data(tregcoef) or can be specified by the user.

vgm.model

StVariogramModel list; Spatio-temporal variogram of regression residuals (or observations if spatio-temporal ordinary kriging). See vgmST. Spatio-temporal variogram model on residuals for metorological variables (temperature, precipitation, etc.) can be taken from data(tvgms) or can be specified by the user as a vgmST object.

sp.nmax

numeric; A number of spatially nearest observations that should be used for fold kriging predictions. If /codetiling is TRUE (see below), then is a number of spatially nearest observations that should be used for each tile. Deafult is 20.

time.nmax

numeric; A number of temporally nearest observations that should be used for fold kriging predictions Deafult is 2.

type

character; Type of cross-validation: leave-location-out ("LLO"), leave-time-out ("LTO"), and leave-location-time-out ("LLTO"). Default is "LLO". "LTO" and "LLTO" are not implemented yet. Will be in the future.

k

numeric; Number of random folds that will be created with CreateSpacetimeFolds function if folds or fold.column parameters are missing. Default is 5.

seed

numeric; Random seed that will be used to generate folds with CreateSpacetimeFolds function.

folds

numeric or character vector; Showing folds of data observations. Used if fold.column parameter is not specified.

fold.column

numeric or character; Column name or number showing the position of variable in data that represents folds.

refit

logical; If refit of linear regression trend and spatio-teporal variogram should be performed. Spatio-teporal variogram is fit using vgm.model as desired spatio-temporal model for fit.StVariogram function. Default is TRUE.

output.format

character; Format of the output, "STFDF" (default), "STSDF" or "data.frame" (data.frame).

parallel.processing

logical; If parallel processing is performed. Default is FALSE.

pp.type

character; Type (R package) of parallel processing, "snowfall" (default) or "doParallel".

cpus

numeric; Number of processing units. Default is detectCores()-1.

progress

logical; If progress bar is shown. Default is TRUE.

...

Further arguments passed to krigeST or pred.strk.

Value

A STFDF-class, STSDF-class or data.frame obejct (depends on output.format argument), with columns:

obs

Observations.

pred

Predictions from cross-validation.

folds

Folds used for cross-validation.

For accuracy metrics see acc.metric.fun function.

Author(s)

Aleksandar Sekulic asekulic@grf.bg.ac.rs, Milan Kilibarda kili@grf.bg.ac.rs

References

Kilibarda, M., T. Hengl, G. B. M. Heuvelink, B. Graeler, E. Pebesma, M. Percec Tadic, and B. Bajat (2014), Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution, J. Geophys. Res. Atmos., 119, 2294-2313, doi:10.1002/2013JD020803.

See Also

acc.metric.fun pred.strk tregcoef tvgms regdata meteo2STFDF tgeom2STFDF

Examples

library(sp)
library(spacetime)
library(gstat)
library(plyr)
library(CAST)
library(doParallel)
library(ranger)
# prepare data
# load observation - data.frame of mean temperatures
data(dtempc)
data(stations)

serbia= point.in.polygon(stations$lon, stations$lat, c(18,22.5,22.5,18), c(40,40,46,46))
st= stations[ serbia!=0, ]
dtempc <- dtempc[dtempc$staid %in% st$staid, ]
dtempc <- dtempc[complete.cases(dtempc),]

# create STFDF
stfdf <- meteo2STFDF(dtempc,st)
# Adding CRS
stfdf@sp@proj4string <- CRS('+proj=longlat +datum=WGS84')

# load covariates for mean temperatures
data(regdata)
# str(regdata)
regdata@sp@proj4string <- CRS('+proj=longlat +datum=WGS84')

# load precalculated variograms
data(tvgms)
data(tregcoef)

# Overlay observations with covariates
time <- index(stfdf@time)
covariates.df <- as.data.frame(regdata)
names_covar <- names(tregcoef[[1]])[-1]
for (covar in names_covar){
  nrowsp <- length(stfdf@sp)
  regdata@sp=as(regdata@sp,'SpatialPixelsDataFrame')
  ov <- sapply(time, function(i) 
    if (covar %in% names(regdata@data)) {
      if (as.Date(i) %in% as.Date(index(regdata@time))) {
        over(stfdf@sp, as(regdata[, i, covar], 'SpatialPixelsDataFrame'))[, covar]
      } else {
        rep(NA, length(stfdf@sp))
      }
    } else {
      over(stfdf@sp, as(regdata@sp[covar], 'SpatialPixelsDataFrame'))[, covar]
    }
  )
  # ov <- do.call('cbind', ov)
  ov <- as.vector(ov)
  if (all(is.na(ov))) {
    stop(paste('There is no overlay of data with covariates!', sep = ""))
  }
  stfdf@data[covar] <- ov
}

# remove stations out of covariates
for (covar in names_covar){
  # count NAs per stations
  numNA <- apply(matrix(stfdf@data[,covar],
                        nrow=nrowsp,byrow=FALSE), MARGIN=1,
                 FUN=function(x) sum(is.na(x)))
  # Remove stations out of covariates
  rem <- numNA != length(time)
  stfdf <-  stfdf[rem,drop=FALSE]
}

# Remove dates out of covariates
rm.days <- c()
for (t in 1:length(time)) {
  if(sum(complete.cases(stfdf[, t]@data)) == 0) {
    rm.days <- c(rm.days, t)
  }
}
if(!is.null(rm.days)){
  stfdf <- stfdf[,-rm.days]
}

# Cross-validation for mean temperature for days "2011-07-05" and "2011-07-06" 
# global model is used for regression and variogram

results <- cv.strk(data = stfdf,
                   zcol=1, # "tempc"
                   reg.coef = tregcoef[[1]],
                   vgm.model = tvgms[[1]],
                   sp.nmax = 20,
                   time.nmax = 2,
                   type = "LLO",
                   k = 5,
                   seed = 42,
                   refit = FALSE,
                   progress=TRUE
)

stplot(results)
summary(results)
# accuracy
acc.metric.fun(results@data$obs, results@data$pred, "RMSE")

# Cross-validation example with data.frames, parallel processing, and refit
# global model is used for regression, variogram, and refit

library(snowfall)
library(doParallel)
# create data.frames
stfdf.df <- as.data.frame(stfdf)

results <- cv.strk(data = stfdf.df,
                   zcol="tempc",
                   data.staid.x.y.time = c("staid","lon","lat","time"),
                   # obs = stfdf.df, # if used, comment data argument
                   # obs.staid.time = c("staid","time"),
                   # stations = stfdf.df,
                   # stations.staid.x.y = c("staid","lon","lat"),
                   zero.tol=0,
                   reg.coef = tregcoef[[1]],
                   vgm.model = tvgms[[1]],
                   sp.nmax = 20,
                   time.nmax = 2,
                   type = "LLO",
                   k = 5,
                   seed = 42,
                   refit = TRUE,
                   parallel.processing = TRUE,
                   pp.type = "snowfall", # "doParallel"
                   cpus=detectCores()-1,
                   progress=TRUE
)

stplot(results)
summary(results)
# accuracy
acc.metric.fun(results@data$obs, results@data$pred, "RMSE")


[Package meteo version 1.0-1 Index]