% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/SETRED.R
\name{setred}
\alias{setred}
\title{General Interface for SETRED model}
\usage{
setred(
  dist = "Euclidean",
  learner,
  theta = 0.1,
  max.iter = 50,
  perc.full = 0.7,
  D = NULL
)
}
\arguments{
\item{dist}{A distance function or the name of a distance available
in the \code{proxy} package, used to compute the distance matrix
when \code{D} is \code{NULL}. Default is "Euclidean".}

\item{learner}{A model from the \code{parsnip} package used to train a supervised
base classifier on a set of instances. This model needs to provide probability
predictions (or optionally a distance matrix) and its corresponding classes.}

\item{theta}{Rejection threshold to test the critical region. Default is 0.1.}

\item{max.iter}{Maximum number of iterations to execute the self-labeling process.
Default is 50.}

\item{perc.full}{A number between 0 and 1. If the proportion
of newly labeled examples reaches this value, the self-training process is stopped.
Default is 0.7.}

\item{D}{A distance matrix between all the training instances. This matrix is used to
construct the neighborhood graph. Default is \code{NULL}, in which case the
method computes the matrix using the \code{dist} argument.}
}
\value{
(When the model is fitted) A list object of class "setred" containing:
\describe{
\item{model}{The final base classifier trained using the enlarged labeled set.}
\item{instances.index}{The indexes of the training instances used to
train the \code{model}. These indexes include the initial labeled instances
and the newly labeled instances.
These indexes are relative to the \code{x} argument.}
\item{classes}{The levels of \code{y} factor.}
\item{pred}{The function provided in the \code{pred} argument.}
\item{pred.pars}{The list provided in the \code{pred.pars} argument.}
}
}
\description{
SETRED (SElf-TRaining with EDiting) is a variant of the self-training
classification method (as implemented in the function \code{\link{selfTraining}}) with a different addition mechanism.
The SETRED classifier is initially trained with a
reduced set of labeled examples. Then, it is iteratively retrained with its own most
confident predictions over the unlabeled examples. SETRED uses an amending scheme
to avoid the introduction of noisy examples into the enlarged labeled set. For each
iteration, the mislabeled examples are identified using the local information provided
by the neighborhood graph.
}
\details{
SETRED initiates the self-labeling process by training a model from the original
labeled set. In each iteration, the \code{learner} function detects unlabeled
examples for which it makes the most confident prediction and labels those examples
according to the \code{pred} function. The identification of mislabeled examples is
performed using a neighborhood graph created from the distance matrix.
Within a neighborhood, most examples share the same label. If an example is located
in a neighborhood with too many neighbors from different classes, it should
be considered problematic. The value of the \code{theta} argument controls the confidence
of the candidates selected to enlarge the labeled set. The lower this value is, the more
restrictive the selection of examples considered good.
For more information about the self-labeling process and the rest of the parameters, please
see \code{\link{selfTraining}}.
}
\examples{
library(tidyverse)
library(tidymodels)
library(caret)
library(SSLR)

data(wine)

set.seed(1)
train.index <- createDataPartition(wine$Wine, p = .7, list = FALSE)
train <- wine[ train.index,]
test  <- wine[-train.index,]

cls <- which(colnames(wine) == "Wine")

#\% LABELED
labeled.index <- createDataPartition(train$Wine, p = .2, list = FALSE)
train[-labeled.index,cls] <- NA

#We need a model with probability predictions from parsnip
#https://tidymodels.github.io/parsnip/articles/articles/Models.html
#It should be with mode = classification

#For example, with Random Forest
rf <-  rand_forest(trees = 100, mode = "classification") \%>\%
  set_engine("randomForest")


m <- setred(learner = rf,
            theta = 0.1,
            max.iter = 2,
            perc.full = 0.7) \%>\% fit(Wine ~ ., data = train)


#Accuracy
predict(m,test) \%>\%
  bind_cols(test) \%>\%
  metrics(truth = "Wine", estimate = .pred_class)



#Another example, with dist matrix

distance <- as.matrix(proxy::dist(train[,-cls], method = "Euclidean",
                                  by_rows = TRUE, diag = TRUE, upper = TRUE))

m <- setred(learner = rf,
            theta = 0.1,
            max.iter = 2,
            perc.full = 0.7,
            D = distance) \%>\% fit(Wine ~ ., data = train)

#Accuracy
predict(m,test) \%>\%
  bind_cols(test) \%>\%
  metrics(truth = "Wine", estimate = .pred_class)
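
#Illustrative sketch only (base R, not part of the SSLR API).
#It shows the idea described in Details: an example whose neighbors mostly
#belong to other classes is a likely mislabeling candidate.
#Assumptions: a plain k-nearest-neighbor vote stands in for the neighborhood
#graph and the statistical test controlled by theta. The helper name
#neighbor.disagreement and the value k = 5 are hypothetical, and the built-in
#iris data is used so that the sketch runs without the packages above.
neighbor.disagreement <- function(D, labels, k = 5) {
  sapply(seq_len(nrow(D)), function(i) {
    #k closest instances to i, excluding i itself
    nn <- order(D[i, ])
    nn <- nn[nn != i][seq_len(k)]
    #fraction of those neighbors whose label differs from the label of i
    mean(labels[nn] != labels[i])
  })
}

D.iris <- as.matrix(stats::dist(scale(iris[, 1:4])))
disagreement <- neighbor.disagreement(D.iris, iris$Species)

#Instances whose neighbors mostly belong to another class
which(disagreement > 0.5)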
}
\references{
Ming Li and ZhiHua Zhou.\cr
\emph{Setred: Self-training with editing.}\cr
In Advances in Knowledge Discovery and Data Mining, volume 3518 of Lecture Notes in
Computer Science, pages 611-621. Springer Berlin Heidelberg, 2005.
ISBN 978-3-540-26076-9. doi: 10.1007/11430919_71.
}
