% Generated by roxygen2 (4.1.0): do not edit by hand
% Please edit documentation in R/rf.modelSel.R
\name{rf.modelSel}
\alias{rf.modelSel}
\title{Random Forest Model Selection}
\usage{
rf.modelSel(xdata, ydata, imp.scale = "mir", r = c(0.25, 0.5, 0.75),
  final = FALSE, plot.imp = TRUE, seed = NULL, parsimony = NULL, ...)
}
\arguments{
\item{xdata}{Predictor (x) data for the model}

\item{ydata}{Response (y) data for the model}

\item{imp.scale}{Type of scaling for importance values ("mir" or "se"); default is "mir"}

\item{r}{Vector of importance percentiles to test, e.g., c(0.1, 0.2, 0.5, 0.7, 0.9)}

\item{final}{Run final model with selected variables (TRUE/FALSE)}

\item{plot.imp}{Plot variable importance (TRUE/FALSE)}

\item{seed}{Sets the random seed in the R global environment. Setting a seed is highly recommended.}

\item{parsimony}{Threshold for competing model (0-1)}

\item{...}{Arguments to pass to randomForest (e.g., ntree=1000, replace=TRUE, proximity=TRUE)}
}
\value{
A list class object with the following components:

rf.final - Final selected model, if final=TRUE (randomForest model object)

selvars - Final selected variables (vector)

test - Validation parameters used on model selection (data.frame)

importance - Importance values for selected model (data.frame)

parameters - Variables used in each tested model (list)
}
\description{
Implements the Murphy et al. (2010) Random Forests model selection approach.
}
\note{
To run classification, make sure that ydata is a factor; otherwise the model runs in regression mode.

The "mir" option performs a row standardization, and the "se" option normalizes using the "standard errors" of the permutation-based importance measure. Both options result in a 0-1 range, but "se" sums to 1.

The scaled importance values are calculated as: mir = i/max(i) and se = (i / se) / ( sum(i) / se).

For regression, the model selection criteria are: largest percent variation explained, smallest MSE, and fewest parameters.

For classification, the criteria are: smallest OOB error, smallest maximum within-class error, and fewest parameters.
}
\examples{
# Classification on iris data
library(randomForest)
data(iris)
iris$Species <- as.factor(iris$Species)
( rf.class <- rf.modelSel(iris[,1:4], iris[,"Species"], seed=1234, imp.scale="mir") )
( rf.class <- rf.modelSel(iris[,1:4], iris[,"Species"], seed=1234, imp.scale="mir",
                          parsimony=0.03) )

# Refit using the variables from the third tested model
vars <- rf.class$parameters[[3]]
( rf.fit <- randomForest(x=iris[,vars], y=iris[,"Species"]) )

# Regression on airquality data
data(airquality)
airquality <- na.omit(airquality)
( rf.regress <- rf.modelSel(airquality[,2:6], airquality[,1], imp.scale="se") )
( rf.regress <- rf.modelSel(airquality[,2:6], airquality[,1], imp.scale="se",
                            parsimony=0.03) )

# Refit using the variables from the third tested model
vars <- rf.regress$parameters[[3]]
( rf.fit <- randomForest(x=airquality[,vars], y=airquality[,1]) )
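
# Sketch of the "mir" scaling from the note, mir = i/max(i), followed by
# percentile thresholds like those supplied in r. The values in imp are
# made up for illustration, and keeping variables at or above each quantile
# is an assumption about how the thresholds are applied, not the package source.
imp <- c(v1 = 2, v2 = 5, v3 = 8, v4 = 10)
( mir <- imp / max(imp) )
for (p in c(0.25, 0.5, 0.75)) {
  keep <- names(mir)[mir >= quantile(mir, probs = p)]
  message("percentile ", p, " retains: ", paste(keep, collapse = ", "))
}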
}
\author{
Jeffrey S. Evans <jeffrey_evans@tnc.org>
}
\references{
Evans, J.S. and S.A. Cushman (2009) Gradient Modeling of Conifer Species Using Random Forest. Landscape Ecology 24(5):673-683.

Murphy, M.A., J.S. Evans, and A.S. Storfer (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252-261.

Evans, J.S., M.A. Murphy, Z.A. Holden, and S.A. Cushman (2011) Modeling species distribution and change using Random Forests. Chapter 8 in Predictive Modeling in Landscape Ecology, eds. C.A. Drew, F. Huettmann, and Y. Wiersma. Springer.
}

