pred.rfsi {meteo}R Documentation

Random Forest Spatial Interpolation (RFSI) prediction

Description

Function for prediction based on Random Forest Spatial Interpolation (RFSI) model (Sekulić et al. 2020).

Usage

pred.rfsi(model,
          data,
          zcol=1,
          data.staid.x.y.time = c(1,2,3,4),
          obs,
          obs.staid.time = c(1,2),
          stations,
          stations.staid.x.y = c(1,2,3),
          newdata,
          newdata.staid.x.y.time = c(1,2,3),
          zero.tol = 0,
          s.crs = NA,
          newdata.s.crs=NA,
          t.crs = NA,
          output.format = "data.frame",
          cpus = detectCores()-1,
          progress = TRUE,
          ...)

Arguments

model

ranger; An RFSI model made by rfsi function.

data

STFDF-class, STSDF-class or data.frame; Contains dependent variable (observations) used for prediction from an RFSI model. If data.frame object, it should have next columns: station ID (staid), longitude (x), latitude (y), time of the observation (time), and observation value (obs).

zcol

numeric or character; Column name or number showing the position of dependent variable (observations) in data. Default is 1.

data.staid.x.y.time

numeric or character vector; Positions or names of the station ID (staid), longitude (x), latitude (y) and time columns in data if data is data.frame. Default is c(1,2,3,4).

obs

data.frame; Contains dependent variable (observations). It should have next columns: station ID (staid), time of the observation (time), and observation value (obs)). This object is used together with stations (see below) to create STFDF-class object (if data object is missing) which is then used for prediction from an RFSI model.

obs.staid.time

numeric or character vector; Positions or names of the station ID (staid) and time columns in obs. Default is c(1,2).

stations

data.frame; It should have next columns: station ID (staid), longitude (x) and latitude (y) of the stations. This object is used together with obs (see above) if data object is missing.

stations.staid.x.y

numeric or character vector; Positions or names of the station ID (staid), longitude (x) and latitude (y) columns in stations. Default is c(1,2,3).

newdata

STFDF-class, STSDF-class or data.frame; Contains prediction locations and covariates. If data.frame object, it should have next columns: prediction location ID (staid), longitude (x), latitude (y), time of the prediction (time) and covariates (cov1, cov2, ...). Covariate names should be the same as in the model.

newdata.staid.x.y.time

numeric or character vector; Positions or names of the prediction location ID (staid), longitude (x), latitude (y) and time columns in newdata. Default is c(1,2,3).

zero.tol

numeric; A distance value below (or equal to) which locations are considered as duplicates. Default is 0. See rm.dupl.

s.crs

Source CRS-class of observations (data). If NA, read from data

newdata.s.crs

Source CRS-class of (newdata). If NA, read from data

t.crs

Target CRS-class for observations (data) and (newdata) reprojection. Note that observations should be in projection for finding nearest observations based on Eucleadean distances (see function near.obs) and should be the same as one used for making an RFSI model.

output.format

character; Format of the output, "data.frame" (default), "SpatialPointsDataFrame", "SpatialPixelsDataFrame", "STFDF" or "STSDF".

cpus

numeric; Number of processing units. Default is detectCores()-1.

progress

logical; If progress bar is shown. Default is TRUE.

...

Further arguments passed to ranger.

Value

A data.frame, SpatialPointsDataFrame-class, SpatialPixelsDataFrame-class, STFDF-class or STSDF-class object (depends on output.format argument) with prediction (pred column).

Author(s)

Aleksandar Sekulic asekulic@grf.bg.ac.rs

References

Sekulić, A., Kilibarda, M., Heuvelink, G. B., Nikolić, M. & Bajat, B. Random Forest Spatial Interpolation. Remote. Sens. 12, 1687, https://doi.org/10.3390/rs12101687 (2020).

See Also

near.obs rfsi tune.rfsi cv.rfsi

Examples

library(sp)
library(spacetime)
library(gstat)
library(plyr)
library(CAST)
library(doParallel)
library(ranger)
# prepare data
# load observation - data.frame of mean temperatures
data(dtempc)
data(stations)

serbia= point.in.polygon(stations$lon, stations$lat, c(18,22.5,22.5,18), c(40,40,46,46))
st= stations[ serbia!=0, ]
dtempc <- dtempc[dtempc$staid %in% st$staid, ]
dtempc <- dtempc[complete.cases(dtempc),]

# create STFDF
stfdf <- meteo2STFDF(dtempc,st)
# Adding CRS
stfdf@sp@proj4string <- CRS('+proj=longlat +datum=WGS84')

# load covariates for mean temperatures
data(regdata)
data(tregcoef)
# str(regdata)
regdata@sp@proj4string <- CRS('+proj=longlat +datum=WGS84')

# Overlay observations with covariates
time <- index(stfdf@time)
covariates.df <- as.data.frame(regdata)
names_covar <- names(tregcoef[[1]])[-1]
for (covar in names_covar){
  nrowsp <- length(stfdf@sp)
  regdata@sp=as(regdata@sp,'SpatialPixelsDataFrame')
  ov <- sapply(time, function(i) 
    if (covar %in% names(regdata@data)) {
      if (as.Date(i) %in% as.Date(index(regdata@time))) {
        over(stfdf@sp, as(regdata[, i, covar], 'SpatialPixelsDataFrame'))[, covar]
      } else {
        rep(NA, length(stfdf@sp))
      }
    } else {
      over(stfdf@sp, as(regdata@sp[covar], 'SpatialPixelsDataFrame'))[, covar]
    }
  )
  # ov <- do.call('cbind', ov)
  ov <- as.vector(ov)
  if (all(is.na(ov))) {
    stop(paste('There is no overlay of data with covariates!', sep = ""))
  }
  stfdf@data[covar] <- ov
}

# remove stations out of covariates
for (covar in names_covar){
  # count NAs per stations
  numNA <- apply(matrix(stfdf@data[,covar],
                        nrow=nrowsp,byrow=FALSE), MARGIN=1,
                 FUN=function(x) sum(is.na(x)))
  # Remove stations out of covariates
  rem <- numNA != length(time)
  stfdf <-  stfdf[rem,drop=FALSE]
}

# Remove dates out of covariates
rm.days <- c()
for (t in 1:length(time)) {
  if(sum(complete.cases(stfdf[, t]@data)) == 0) {
    rm.days <- c(rm.days, t)
  }
}
if(!is.null(rm.days)){
  stfdf <- stfdf[,-rm.days]
}

# Make an RFSI model
formula = 'tempc ~ temp_geo + modis + dem + twi'  # without nearest obs
t.crs=CRS("+proj=utm +zone=34 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")

rfsi_model <- rfsi(formula=formula,
                   data= stfdf,
                   zero.tol=0,
                   n.obs=5, # nearest obs
                   s.crs=NA, # read from data
                   t.crs=t.crs,
                   cpus=2, #detectCores()-1,
                   progress=TRUE,
                   # ranger parameters
                   importance = "impurity",
                   seed = 42,
                   num.trees = 250,
                   mtry = 5,
                   splitrule = "variance",
                   min.node.size = 5,
                   sample.fraction = 0.95,
                   quantreg = FALSE)

# Make prediction from the RFSI model
newdata = regdata #[,1,drop=FALSE]
# newdata = regdata.df
# newdata.staid.x.y.time = c(1,2,3,4) # if data.frame
# spdf <- newdata[,1]
# spdf@data <- cbind(spdf@data, newdata@sp@data)
output.format = "STFDF" # STSDF # data.frame
zcol=1
model = rfsi_model
newdata.s.crs=NA

rfsi_prediction <- pred.rfsi(model = rfsi_model,
                             data = stfdf,
                             zcol=1,
                             newdata = newdata, # spdf,
                             # newdata.staid.x.y.time = newdata.staid.x.y.time,
                             output.format = "STFDF", # "SpatialPixelsDataFrame",
                             zero.tol=0,
                             s.crs=s.crs,
                             newdata.s.crs=newdata.s.crs,
                             t.crs=t.crs,
                             cpus=2, # detectCores()-1,
                             progress=TRUE
)

stplot(rfsi_prediction)
# plot(rfsi_prediction['pred'])


[Package meteo version 1.0-1 Index]