% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/construct_randomforest.R
\name{rf_pred}
\alias{rf_pred}
\title{Predict targeted time series by a random forest model}
\usage{
rf_pred(
  df,
  colname_label,
  vctr_colname_feature = NULL,
  min_nodesize,
  m_try,
  subsample,
  do_outlier_detection = TRUE,
  frac_train = 0.75,
  n_tree = 500,
  ran_seed = 12345,
  coef_iqr = 1.5,
  label_err = -9999
)
}
\arguments{
\item{df}{A data frame containing the label (explained variable) and feature
(explanatory variables) time series used as model input. Any column may
contain missing values.}

\item{colname_label}{A character string giving the name of the column that
holds the label time series.}

\item{vctr_colname_feature}{A vector of characters indicating the name of
the feature time series columns used in constructing a random forest model.
If `NULL` (default), all columns excluding the label column specified as
`colname_label` in the input data frame are used as feature columns.}

\item{min_nodesize}{A positive integer indicating the minimal node size (the
minimum number of data points included in each leaf node). This
hyperparameter should be optimized beforehand by out-of-bag evaluation.}

\item{m_try}{A positive integer indicating the number of features to be used
in splitting each node. This hyperparameter should be optimized beforehand
by out-of-bag evaluation.}

\item{subsample}{A numerical value between 0 and 1, indicating the fraction
of input training data points to be sampled in constructing the random
forest. This hyperparameter should be optimized beforehand by
out-of-bag evaluation.}

\item{do_outlier_detection}{A logical value. If `TRUE` (default), the
function predicts the time series in order to detect outliers; if `FALSE`,
it estimates the time series in order to fill gaps.}

\item{frac_train}{A numerical value between 0 and 1, defining the fraction
of data points to be categorized as training data for the random forest
model construction. The other data points are classified as test data.
Default is 0.75.}

\item{n_tree}{An integer representing the number of trees in the random
forest. Default is 500.}

\item{ran_seed}{An integer representing the random seed for the random
forest model construction. Default is 12345.}

\item{coef_iqr}{A positive value defining the multiplier of the
interquartile range (IQR). A value is detected as a random forest outlier if
it is less than Q1 (first quartile) - `coef_iqr` * IQR or greater than Q3
(third quartile) + `coef_iqr` * IQR. Default is 1.5.}

\item{label_err}{A numeric value used to represent missing values in the
input data. Default is -9999.}
}
\value{
A list with two elements. The first element `mse` is the mean squared error
between predicted and original values in the test data set. The second
element `stats` is a data frame, and its contents differ depending on
`do_outlier_detection`.

If `do_outlier_detection` is `TRUE`, the data frame contains the following
columns:

* The first column, `cleaned`, gives the cleaned time series after replacing
 the detected outliers with the value specified by `label_err`.

* The second column, `flag_out`, gives a flag variable time series
 indicating the status of the cleaned time series (0: the input data point
 is not originally missing and not detected as an outlier; 1: the input data
 point is not originally missing but detected as an outlier; 2: the input
 data point is originally missing).

* The third column, `med`, gives the ensemble median time series calculated
 from estimated values at each time point for each tree in the constructed
 random forest.

* The fourth column, `q1`, gives the ensemble Q1 (first quartile) time
 series calculated from estimated values at each time point for each tree in
 the constructed random forest.

* The fifth column, `q3`, gives the ensemble Q3 (third quartile) time series
 calculated from estimated values at each time point for each tree in the
 constructed random forest.

If `do_outlier_detection` is `FALSE`, the data frame contains the following
columns:

* The first column, `gapfilled`, gives the gap-filled time series, where
 missing values are replaced with the predicted values from the random
 forest model.

* The second column, `avg_predicted`, gives the ensemble mean time series
 calculated from estimated values at each time point for each tree in the
 constructed random forest.

* The third column, `sd_predicted`, gives the ensemble standard deviation
 time series calculated from estimated values at each time point for each
 tree in the constructed random forest.
}
\description{
`rf_pred()` constructs a random forest model using optimal
 hyperparameters previously determined by out-of-bag evaluation to estimate
 the targeted time series.
}
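\details{
The IQR-based outlier rule controlled by `coef_iqr` can be illustrated with a
minimal base R sketch. It is not the package's implementation. The objects
`pred_tree` and `observed` are hypothetical, and the sketch assumes that Q1
and Q3 are taken across the per-tree predictions at each time point.

\preformatted{
# Hypothetical per-tree predictions: rows are time points, columns are trees.
pred_tree <- matrix(
  c(1.0, 1.2, 0.9, 1.1,
    2.0, 2.1, 1.9, 2.2,
    3.0, 3.1, 2.9, 3.2),
  nrow = 3, byrow = TRUE
)

# Hypothetical observed label series; the third value is missing.
label_err <- -9999
observed <- c(1.05, 9.0, label_err)
coef_iqr <- 1.5

# Ensemble quartiles and median across trees at each time point.
q1 <- apply(pred_tree, 1, quantile, probs = 0.25, names = FALSE)
q3 <- apply(pred_tree, 1, quantile, probs = 0.75, names = FALSE)
med <- apply(pred_tree, 1, median)
iqr <- q3 - q1

# Flag values outside [Q1 - coef_iqr * IQR, Q3 + coef_iqr * IQR].
is_missing <- observed == label_err
is_outlier <- !is_missing &
  (observed < q1 - coef_iqr * iqr | observed > q3 + coef_iqr * iqr)

# 0: kept, 1: detected as an outlier, 2: originally missing.
flag_out <- ifelse(is_missing, 2L, ifelse(is_outlier, 1L, 0L))

# Replace detected outliers with label_err.
cleaned <- ifelse(is_outlier, label_err, observed)
}

In this sketch the second observation lies far above the upper bound and is
flagged as 1, while the third is flagged as 2 because it was already missing.
}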
\author{
Yoshiaki Hata
}
\keyword{internal}
