This package adds resampling methods for the {mlr3} package framework suited for spatial, temporal and spatiotemporal data. These methods can help to reduce the influence of autocorrelation on performance estimates when performing cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.
After loading the package via
library("mlr3spatiotempcv"), the spatiotemporal resampling
methods and example tasks provided by {mlr3spatiotempcv} are available
to the user alongside the default {mlr3} resampling methods and
tasks.
To make use of spatial resampling methods, an {mlr3} task that is aware of its spatial characteristics needs to be created. Two child classes exist in {mlr3spatiotempcv} for this purpose:
TaskClassifST$new()
TaskRegrST$new()
To create one of these, one can either pass an sf object
as the “backend” directly:
# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"))
# create mlr3 task
task = TaskClassifST$new("ecuador_sf",
  backend = data_sf, target = "slides", positive = "TRUE"
)
or use a plain data.frame. In this case, the constructor
of TaskClassifST needs a few more arguments:
data = mlr3::as_data_backend(ecuador)
task = TaskClassifST$new("ecuador",
  backend = data, target = "slides",
  positive = "TRUE", extra_args = list(coordinate_names = c("x", "y"))
)
Now this task can be used like a normal {mlr3} task in any kind of modeling scenario. Have a look at the mlr3book section on “Spatiotemporal Analysis” to see how to apply a spatiotemporal resampling method to such a task.
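As a minimal sketch of that workflow, the example below applies a spatial resampling method to the bundled `ecuador` task. The choice of `"spcv_coords"`, the five folds, the `classif.rpart` learner and the `classif.ce` measure are illustrative assumptions, not recommendations made by this article.

```r
library(mlr3)
library(mlr3spatiotempcv)

# example task shipped with {mlr3spatiotempcv}
task = tsk("ecuador")

# spatial CV that partitions observations by clustering their coordinates;
# the number of folds is an arbitrary choice for illustration
resampling = rsmp("spcv_coords", folds = 5)
resampling$instantiate(task)

# row ids of the first train/test split
train_ids = resampling$train_set(1)
test_ids = resampling$test_set(1)

# estimate performance with a simple learner (assumed here: rpart)
learner = lrn("classif.rpart")
rr = resample(task, learner, resampling)
rr$aggregate(msr("classif.ce"))
```

Because the partitioning relies on a clustering step, the folds can differ between runs unless a seed is set beforehand.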
In {mlr3}, dictionaries provide an overview of the available methods. The following sections show which dictionaries gain new entries when {mlr3spatiotempcv} is loaded.
Additional task types:
TaskClassifST
TaskRegrST
mlr_reflections$task_types
#>       type          package          task        learner        prediction
#> 1: classif             mlr3   TaskClassif LearnerClassif PredictionClassif
#> 2: classif mlr3spatiotempcv TaskClassifST LearnerClassif PredictionClassif
#> 3:    regr             mlr3      TaskRegr    LearnerRegr    PredictionRegr
#> 4:    regr mlr3spatiotempcv    TaskRegrST    LearnerRegr    PredictionRegr
#>           measure
#> 1: MeasureClassif
#> 2: MeasureClassif
#> 3:    MeasureRegr
#> 4:    MeasureRegr
Additional column roles:
coordinate
space
time
mlr_reflections$task_col_roles
#> $regr
#> [1] "feature" "target" "name" "order" "stratum"
#> [6] "group" "weight" "coordinate" "space" "time"
#>
#> $classif
#> [1] "feature" "target" "name" "order" "stratum"
#> [6] "group" "weight" "coordinate" "space" "time"
#>
#> $classif_st
#> [1] "coordinate" "space" "time"
#>
#> $regr_st
#> [1] "coordinate" "space" "time"
Additional resampling methods:
mlr_resamplings_spcv_block
mlr_resamplings_spcv_buffer
mlr_resamplings_spcv_coords
mlr_resamplings_spcv_disc
mlr_resamplings_spcv_tiles
mlr_resamplings_spcv_env
mlr_resamplings_sptcv_cluto
mlr_resamplings_sptcv_cstf
and their respective repeated versions. See
as.data.table(mlr_resamplings) for the full dictionary.
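A short sketch of how the dictionary can be queried after loading the package. The pattern `"spcv|sptcv"` is an assumption based on the method names listed above and is used only to filter the keys.

```r
library(mlr3)
library(mlr3spatiotempcv)

# full dictionary of resampling methods as a data.table
resamplings = as.data.table(mlr_resamplings)

# keep only the spatial ("spcv") and spatiotemporal ("sptcv") keys,
# which also matches their "repeated_" variants
st_keys = grep("spcv|sptcv", resamplings$key, value = TRUE)
st_keys
```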
Additional example tasks:
tsk("ecuador") (spatial, classif)
tsk("cookfarm_mlr3") (spatiotemp, regr)
The following table lists all spatiotemporal methods implemented in
{mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific
references. All methods besides "spcv_buffer" also have a
corresponding “repeated” method.
| Category | (Package) Method Name | Reference | mlr3 Notation |
|---|---|---|---|
| Buffering, spatial | (blockCV) Spatial Buffering | Valavi et al. (2018) | mlr_resamplings_spcv_buffer |
| Buffering, spatial | (sperrorest) Spatial Disc | Brenning (2012) | mlr_resamplings_spcv_disc |
| Blocking, spatial | (blockCV) Spatial Blocking | Valavi et al. (2018) | mlr_resamplings_spcv_block |
| Blocking, spatial | (sperrorest) Spatial Tiles | Valavi et al. (2018) | mlr_resamplings_spcv_tiles |
| Clustering, spatial | (sperrorest) Spatial CV | Brenning (2012) | mlr_resamplings_spcv_coords |
| Clustering, feature-space | (blockCV) Environmental Blocking | Valavi et al. (2018) | mlr_resamplings_spcv_env |
| Grouping, predefined indices | (mlr3) Predefined partitions | | mlr_resamplings_custom_cv |
| Grouping, spatiotemporal | (mlr3) via col_roles "group" | | mlr_resamplings_cv, Task$set_col_roles(<variable>, "group") |
| Grouping, spatiotemporal | (CAST) Leave-Location-and-Time-Out | Meyer et al. (2018) | mlr_resamplings_sptcv_cstf, Task$set_col_roles(<variable>, "space\|time") |
| Clustering, spatiotemporal | (skmeans) Spatiotemporal Clustering | Zhao and Karypis (2002) | mlr_resamplings_sptcv_cluto |
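The column-role notation in the table can be sketched as follows for the Leave-Location-and-Time-Out method. This assumes that the `cookfarm_mlr3` task contains the columns `"SOURCEID"` (location) and `"Date"` (time) and that the installed package version reads the `"space"` and `"time"` column roles; the five folds are an arbitrary choice.

```r
library(mlr3)
library(mlr3spatiotempcv)

# spatiotemporal example task shipped with {mlr3spatiotempcv}
task = tsk("cookfarm_mlr3")

# assumed column names: "SOURCEID" identifies locations, "Date" the time points
task$set_col_roles("SOURCEID", roles = "space")
task$set_col_roles("Date", roles = "time")

# with both roles set, the test folds are formed from held-out
# locations and held-out time points
rcv = rsmp("sptcv_cstf", folds = 5)
rcv$instantiate(task)

# row ids of the first train/test split
train_ids = rcv$train_set(1)
test_ids = rcv$test_set(1)
```

Setting only one of the two roles would correspond to leaving out locations or time points alone.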