% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/target_encoding_lab.R
\name{target_encoding_lab}
\alias{target_encoding_lab}
\title{Convert categorical predictors to numeric via target encoding}
\usage{
target_encoding_lab(
  df = NULL,
  response = NULL,
  predictors = NULL,
  encoding_method = "loo",
  smoothing = 0,
  overwrite = FALSE,
  quiet = FALSE,
  ...
)
}
\arguments{
\item{df}{(required; dataframe, tibble, or sf) A dataframe with responses
(optional) and predictors. Must have at least 10 rows for pairwise
correlation analysis, and \code{10 * (length(predictors) - 1)} for VIF.
Default: NULL.}

\item{response}{(optional, character string) Name of a numeric response variable in \code{df}. Default: NULL.}

\item{predictors}{(optional; character vector or NULL) Names of the
predictors in \code{df}. If NULL, all columns except \code{responses} and
constant/near-zero-variance columns are used. Default: NULL.}

\item{encoding_method}{(optional; character vector or NULL). Name of the target encoding methods. One or several of: "mean", "rank", "loo". If NULL, target encoding is ignored, and \code{df} is returned with no modification. Default: "loo"}

\item{smoothing}{(optional; integer vector) Argument of the method "mean". Groups smaller than this number have their means pulled towards the mean of the response across all cases. Default: 0}

\item{overwrite}{(optional; logical) If TRUE, the original predictors in \code{df} are overwritten with their encoded versions, but only one encoding method, smoothing, white noise, and seed are allowed. Otherwise, encoded predictors with their descriptive names are added to \code{df}. Default: FALSE}

\item{quiet}{(optional; logical) If FALSE, messages are printed. Default: FALSE.}

\item{...}{(optional) Internal args (e.g. \code{function_name} for
\code{\link{validate_arg_function_name}}, a precomputed correlation matrix
\code{m}, or cross-validation args for \code{\link{preference_order}}).}
}
\value{
dataframe
}
\description{
Target encoding maps the values of categorical variables (of class \code{character} or \code{factor}) to numeric using another numeric variable as reference. The encoding methods implemented here are:

\itemize{
\item "mean" (implemented in \code{\link[=target_encoding_mean]{target_encoding_mean()}}): Maps each category to the average of reference numeric variable across the category cases. Variables encoded with this method are identified with the suffix "__encoded_mean". It has a method to control overfitting implemented via the argument \code{smoothing}. The integer value of this argument indicates a threshold in number of rows. Categories sized above this threshold are encoded with the group mean, while groups below it are encoded with a weighted mean of the group's mean and the global mean. This method is named "mean smoothing" in the relevant literature.
\item "rank" (implemented in \code{\link[=target_encoding_rank]{target_encoding_rank()}}): Returns the rank of the group as a integer, being 1 he group with the lower mean of the reference variable. Variables encoded with this method are identified with the suffix "__encoded_rank".
\item "loo" (implemented in \code{\link[=target_encoding_loo]{target_encoding_loo()}}): Known as the "leave-one-out method" in the literature, it encodes each categorical value with the mean of the response variable across all other group cases. This method controls overfitting better than "mean". Variables encoded with this method are identified with the suffix "__encoded_loo".
}

Accepts a parallelization setup via \code{future::plan()} and a progress bar via \code{progressr::handlers()} (see examples).
}
\examples{

data(vi_smol)

#applying all methods for a continuous response
df <- target_encoding_lab(
  df = vi_smol,
  response = "vi_numeric",
  predictors = "koppen_zone",
  encoding_method = c(
    "mean",
    "loo",
    "rank"
  )
)

#identify encoded predictors
predictors.encoded <- grep(
  pattern = "*__encoded*",
  x = colnames(df),
  value = TRUE
)

head(df[, predictors.encoded])


}
\references{
\itemize{
\item Micci-Barreca, D. (2001) A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems. SIGKDD Explor. Newsl. 3, 1, 27-32. doi: 10.1145/507533.507538
}
}
\seealso{
Other target_encoding: 
\code{\link{target_encoding_loo}()}
}
\author{
Blas M. Benito, PhD
}
\concept{target_encoding}
