% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cv_buffer.R
\name{cv_buffer}
\alias{cv_buffer}
\title{Use buffer around records to separate train and test folds (a.k.a. buffered/spatial leave-one-out)}
\usage{
cv_buffer(
  x,
  column = NULL,
  size,
  presence_bg = FALSE,
  add_bg = FALSE,
  progress = TRUE,
  report = TRUE
)
}
\arguments{
\item{x}{a simple features (sf) or SpatialPoints object of spatial sample data (e.g., species data or ground truth sample for image classification).}

\item{column}{character; the name of the column in which the response variable (e.g. species data as a binary
response, i.e. 0s and 1s) is stored. This is required when \code{presence_bg = TRUE}, otherwise optional.}

\item{size}{numeric; the distance by which training and testing data are separated.
This distance should be in \strong{metres}. A suitable range can be explored with \code{\link{cv_spatial_autocor}}.}

\item{presence_bg}{logical; whether to treat data as species presence-background data. For all other data
types (presence-absence, continuous, count or multi-class responses), this option should be \code{FALSE}.}

\item{add_bg}{logical; add background points to the test set when \code{presence_bg = TRUE}. We do not
recommend this, following Radosavljevic & Anderson (2014). Keep it \code{FALSE} unless you mean to add
the background points to the testing points.}

\item{progress}{logical; whether to show a progress bar.}

\item{report}{logical; whether to print a summary of the records in each fold; for very big
datasets, set to \code{FALSE} for faster calculation.}
}
\value{
An S3 object: a list including:
    \itemize{
    \item{folds_list - a list containing the folds. Each fold has two vectors with the training (first) and testing (second) indices}
    \item{k - number of the folds}
    \item{size - the defined range of spatial autocorrelation}
    \item{column - the name of the column if provided}
    \item{presence_bg - whether this was treated as presence-background data}
    \item{records - a table with the number of points in each category of training and testing}
    }
}
\description{
This function generates spatially separated train and test folds by considering buffers of
the specified distance (\code{size} parameter) around each observation point.
This approach is a form of \emph{leave-one-out} cross-validation. Each fold is generated by excluding
nearby observations around each testing point within the specified distance (ideally the range of
spatial autocorrelation, see \code{\link{cv_spatial_autocor}}). In this method, the testing set never
directly abuts a training sample (e.g. presence or absence; 0s and 1s). For more information see the details section.
}
\details{
When working with presence-background (presence and pseudo-absence) species distribution
data (specified by setting \code{presence_bg = TRUE}), only presence records are used
for specifying the folds (recommended). Consider a target presence point. A buffer of the specified
range (\code{size}) is defined around this target point. By default, the testing fold comprises only the target presence point (all background
points within the buffer are also added when \code{add_bg = TRUE}).
Any non-target presence points inside the buffer are excluded.
All points (presence and background) outside the buffer are used for the training set.
The method cycles through all the \emph{presence} data, so the number of folds is equal to
the number of presence points in the dataset.

For presence-absence data (and all other types of data), folds are created based on all records, both
presences and absences. As above, a target observation (presence or absence) forms a test point, all
presence and absence points other than the target point within the buffer are ignored, and the training
set comprises all presences and absences outside the buffer. Apart from the folds, the number
of \emph{training-presence}, \emph{training-absence}, \emph{testing-presence} and \emph{testing-absence}
records is stored and returned in the \code{records} table. If \code{column = NULL} and \code{presence_bg = FALSE},
the procedure is the same as for presence-absence data. All other data types (continuous, count or multi-class responses)
should also be handled with \code{presence_bg = FALSE}.
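
The fold construction described above can be sketched in base R. This is an illustrative toy only
and not the package's implementation: the coordinates, the buffer distance and all variable names
below are hypothetical, and plain Euclidean distances on planar coordinates are assumed.

\preformatted{
# toy planar coordinates in metres (hypothetical data)
xy <- cbind(x = c(0, 100, 250, 900, 1500),
            y = c(0,   0,   0,   0,    0))
size <- 300  # hypothetical buffer distance in metres

# pairwise Euclidean distances between all points
d <- as.matrix(dist(xy))

# one fold per target point: the target is the only test point, and
# only points farther away than `size` are kept for training
folds <- lapply(seq_len(nrow(xy)), function(i) {
  list(train = unname(which(d[i, ] > size)), test = i)
})

folds[[1]]$train  # points 4 and 5 lie outside the buffer of point 1
}

Points 2 and 3 fall inside the buffer of point 1, so they are in neither the training nor the
testing set of the first fold.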
}
\examples{
\donttest{
library(blockCV)

# import presence-absence species data
points <- read.csv(system.file("extdata/", "species.csv", package = "blockCV"))
# make an sf object from data.frame
pa_data <- sf::st_as_sf(points, coords = c("x", "y"), crs = 7845)

bloo <- cv_buffer(x = pa_data,
                  column = "occ",
                  size = 350000, # size in metres no matter the CRS
                  presence_bg = FALSE)
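
# A brief look at the returned list, using only the elements named in the
# Value section; the exact printed output depends on the data and is not shown.
bloo$k                     # number of folds
str(bloo$folds_list[[1]])  # training (first) and testing (second) indices of fold 1
head(bloo$records)         # number of training and testing points per fold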

}
}
\references{
Radosavljevic, A., & Anderson, R. P. (2014). Making better Maxent models of species
distributions: Complexity, overfitting and evaluation. Journal of Biogeography, 41, 629–643. https://doi.org/10.1111/jbi.12227
}
\seealso{
\code{\link{cv_nndm}}, \code{\link{cv_spatial}}, and \code{\link{cv_spatial_autocor}}
}
