Exploring Random Forests with ggRandomForests

John Ehrlinger

2026-06-11

A fitted random forest carries a lot of information, but getting at it usually means digging through list structures that were never meant to be plotted directly. ggRandomForests does that digging for you: it pulls tidy data objects out of a randomForestSRC or randomForest fit, and those objects drop straight into the ggplot2 workflows you already know. A second engine, varPro, powers a parallel family of functions for release-rule importance and related diagnostics; that family is covered in the companion vignette referenced at the end. This vignette walks through the three objects you will reach for most often (gg_error, gg_variable, and gg_vimp), plus a small helper for cutting a predictor into evenly populated groups.

Error trajectories with gg_error()

library(randomForest)
randomForest 4.7-1.2
Type rfNews() to see new features/changes/bug fixes.
set.seed(42)
rf_iris <- randomForest(Species ~ ., data = iris, ntree = 200, keep.forest = TRUE)
err_df <- ggRandomForests::gg_error(rf_iris, training = TRUE)
head(err_df)
<gg_error>  from randomForest  |  family: classification  |  ntree: 200  |  n: 150

A forest’s error rate settles down as trees are added, and the gg_error() object lets you watch that happen. It holds the cumulative out-of-bag (OOB) error rate for each outcome column, indexed by the ntree counter. Ask for training = TRUE and the function reconstructs the original model frame and adds the in-bag error trajectory (train) as well, so you can see both curves at once:

plot(err_df)
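
To make the contents of the object concrete, the sketch below builds a comparable table by hand from the err.rate matrix that randomForest stores on a classification fit, where row i holds the OOB error for the first i trees. It illustrates the underlying data and is not necessarily the code gg_error() runs; the names rf_demo and oob_df are made up for this example.

```r
library(randomForest)

set.seed(42)
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 200)

# err.rate has one row per tree count: an overall "OOB" column
# followed by one column per class.
oob_df <- data.frame(ntree = seq_len(rf_demo$ntree), rf_demo$err.rate,
                     check.names = FALSE)
head(oob_df, 3)

# Base-graphics view of the overall OOB curve (hypothetical stand-in
# for the ggplot2 figure that plot(err_df) draws).
plot(oob_df$ntree, oob_df$OOB, type = "l",
     xlab = "Number of trees", ylab = "OOB error rate")
```

The training-set curve is not stored on the fit, which is presumably why training = TRUE has to rebuild the model frame before it can add the train column.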

Marginal dependence via gg_variable()

set.seed(99)
boston <- MASS::Boston
rf_boston <- randomForest(medv ~ ., data = boston, ntree = 150)
var_df <- ggRandomForests::gg_variable(rf_boston)
str(var_df[, c("lstat", "yhat")])
Classes 'gg_variable', 'regression' and 'data.frame':   506 obs. of  2 variables:
 $ lstat: num  4.98 9.14 4.03 2.94 5.33 ...
 $ yhat : num  27.7 23.3 34.6 36.4 33.6 ...

gg_variable() recovers the training data straight from the model call, so it still works when the forest was fit inside a helper function or against a subset() expression, that is, in cases where the data is not sitting in the global environment. The object you get back keeps the raw predictors alongside the prediction: a single yhat column for regression, or one yhat.<class> column per class for classification. To plot one predictor, name it with xvar:

plot(var_df, xvar = "lstat")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

For survival forests, use the time argument to request predictions at several time horizons. To get predictions that are not out-of-bag, set oob = FALSE.
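
For a regression forest, the shape of the object described above can be approximated by hand, which may help when checking what a plot is based on. The sketch below assumes OOB predictions are the quantity of interest and uses only randomForest and base R; boston_demo, rf_demo, and manual_df are illustrative names, and gg_variable() may assemble its result differently.

```r
library(randomForest)

set.seed(99)
boston_demo <- MASS::Boston
rf_demo <- randomForest(medv ~ ., data = boston_demo, ntree = 150)

# predict() with no newdata returns the OOB prediction for each training row.
predictors <- boston_demo[, setdiff(names(boston_demo), "medv")]
manual_df <- cbind(predictors, yhat = predict(rf_demo))
str(manual_df[, c("lstat", "yhat")])

# Prediction against one predictor, with a lowess trend line.
plot(manual_df$lstat, manual_df$yhat,
     xlab = "lstat", ylab = "Predicted medv")
lines(lowess(manual_df$lstat, manual_df$yhat), col = "red")
```

Keeping every predictor in the same data frame as yhat is what makes it cheap to switch the plotted variable without refitting or re-predicting.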

Variable importance with gg_vimp()

vimp_df <- ggRandomForests::gg_vimp(rf_boston)
head(vimp_df)
<gg_vimp>  from randomForest  |  family: regression  |  ntree: 150  |  n: 506  |  variables: 6
plot(vimp_df)

gg_vimp() measures permutation importance: each predictor is permuted in turn, and the drop in OOB accuracy gives its score. This contrasts with the gg_varpro family, which uses release-rule importance from the varPro engine. Variable importance is not always stored on the fitted object. If a randomForest fit is missing its importance scores, gg_vimp() will try to compute them for you. When even that is not possible (the forest was grown with importance = FALSE and the predictors are no longer reachable), the function warns and returns NA in place of the scores, so a plot still draws rather than failing outright.
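
One way to make sure the scores are stored is to grow the forest with importance = TRUE. The sketch below does that with randomForest and pulls the permutation-based measure into a sorted data frame by hand; it shows the kind of table an importance plot rests on, not the internals of gg_vimp(). The names rf_imp and imp_df are illustrative.

```r
library(randomForest)

set.seed(99)
rf_imp <- randomForest(medv ~ ., data = MASS::Boston, ntree = 150,
                       importance = TRUE)

# type = 1 selects the permutation measure (%IncMSE for regression);
# type = 2 would give the node-impurity measure instead.
imp <- importance(rf_imp, type = 1)

imp_df <- data.frame(vars = rownames(imp), vimp = imp[, 1], row.names = NULL)
imp_df <- imp_df[order(imp_df$vimp, decreasing = TRUE), ]
head(imp_df)
```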

Balanced conditioning cuts with quantile_pts()

rm_breaks <- ggRandomForests::quantile_pts(boston$rm, groups = 6, intervals = TRUE)
rm_groups <- cut(boston$rm, breaks = rm_breaks)
table(rm_groups)
rm_groups
(3.56,5.76] (5.76,5.99] (5.99,6.21] (6.21,6.44] (6.44,6.85] (6.85,8.78] 
         85          84          84          85          84          84 

When you build a coplot, you want each conditioning group to hold a roughly equal share of the data, because equal-width bins leave the sparse tails nearly empty. quantile_pts() wraps stats::quantile() to give you break points that do exactly that, and they pass straight to cut() for the grouping or facet labels.
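
The same idea can be sketched in base R, which shows what balanced breaks amount to. This is an approximation under stated assumptions, not the body of quantile_pts(): it places breaks at evenly spaced probabilities and relies on include.lowest = TRUE to keep the minimum, whereas the package may handle the lowest break differently. The names n_groups, base_breaks, and base_groups are illustrative.

```r
x <- MASS::Boston$rm
n_groups <- 6

# Breaks at evenly spaced probabilities give groups of near-equal size.
base_breaks <- quantile(x, probs = seq(0, 1, length.out = n_groups + 1),
                        names = FALSE)

# include.lowest = TRUE keeps the minimum value inside the first interval.
base_groups <- cut(x, breaks = base_breaks, include.lowest = TRUE)
table(base_groups)
```

Heavily tied predictors can produce duplicate quantiles, in which case cut() will refuse the breaks; a continuous variable such as rm avoids that problem.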

Next steps