Make the most out of your models
modelbased is a package to help with model-based estimations: it makes it easy to compute marginal means, contrast analyses, and model predictions.
The modelbased package is available on CRAN, while its latest development version is available on R-universe (from rOpenSci).
| Type | Source | Command |
|---|---|---|
| Release | CRAN | `install.packages("modelbased")` |
| Development | R-universe | `install.packages("modelbased", repos = "https://easystats.r-universe.dev")` |
Once you have installed the package, you can then load it using:

```r
library("modelbased")
```
> [!TIP]
> Instead of `library(modelbased)`, use `library(easystats)`, which will make all features of the easystats ecosystem available. You can also use `easystats::install_latest()` to stay updated.
Access the package documentation, and check out these vignettes:
The core idea behind the modelbased package is that statistical models often contain a lot more insight than what you get from simply looking at the model parameters. In many cases (models with multiple interactions, non-linear effects, non-standard families, or complex random effect structures), the parameters can be hard to interpret. This is where the modelbased package comes in.
To give a very simple example, imagine that you are interested in the effect of 3 conditions A, B and C on a variable Y. A simple linear model `Y ~ Condition` will give you 3 parameters: the intercept (the average value of Y in condition A), and the relative effects of conditions B and C. But what you would also like to get is the average value of Y in each of the other conditions. Many people will compute these averages "by hand" (i.e., the empirical averages) by directly averaging their observed data in these groups. But did you know that the estimated averages (which can be much more relevant, e.g., if you adjust for other variables in the model) are contained in your model, and that you can get them easily by running `estimate_means()`?
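To make this concrete, here is a minimal sketch with simulated data (the condition labels and the variable name `Y` mirror the hypothetical example above; the group means are arbitrary):

```r
library(modelbased)

# Hypothetical data: three conditions with different true means
set.seed(123)
dat <- data.frame(
  Condition = rep(c("A", "B", "C"), each = 30),
  Y = rnorm(90, mean = rep(c(10, 12, 15), each = 30))
)

model <- lm(Y ~ Condition, data = dat)

# The parameters: intercept (mean of A) plus *relative* effects of B and C
coef(model)

# estimate_means() instead returns the estimated average of Y in *each* condition
estimate_means(model, by = "Condition")
```

Because this model has no other covariates, the estimated means coincide with the empirical group averages; once you add covariates, they become adjusted means.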
The modelbased package is built around 4 main functions:

- `estimate_means()`: Estimates the average values at each factor level
- `estimate_contrasts()`: Estimates and tests contrasts between different factor levels
- `estimate_slopes()`: Estimates the slopes of numeric predictors at different factor levels or alongside a numeric predictor
- `estimate_prediction()`: Makes predictions using the model

These functions are based on important statistical concepts, like data grids, predictions and marginal effects, and leverage other packages like emmeans and marginaleffects. We recommend reading about all of that to get a deeper understanding of the hidden power of your models.
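As a quick sketch of how the four functions fit together, they can all be applied to the same model (here a simple model on the built-in `iris` data; the vignettes below explore each output in detail):

```r
library(modelbased)

# A model with a factor and a numeric predictor interacting
model <- lm(Sepal.Length ~ Species * Petal.Width, data = iris)

estimate_means(model, by = "Species")                         # average Sepal.Length per species
estimate_contrasts(model, contrast = "Species")               # pairwise differences between species
estimate_slopes(model, trend = "Petal.Width", by = "Species") # slope of Petal.Width within each species
estimate_prediction(model)                                    # predictions for the observed data
```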
Check out the function documentation and this vignette for a detailed walkthrough on marginal means.
```r
library(modelbased)
library(ggplot2)

# 1. The model
model <- lm(Sepal.Width ~ Species, data = iris)

# 2. Obtain estimated means
means <- estimate_means(model, by = "Species")
means
## Estimated Marginal Means
##
## Species    | Mean |   SE |       95% CI | t(147)
## ------------------------------------------------
## setosa     | 3.43 | 0.05 | [3.33, 3.52] |  71.36
## versicolor | 2.77 | 0.05 | [2.68, 2.86] |  57.66
## virginica  | 2.97 | 0.05 | [2.88, 3.07] |  61.91
##
## Variable predicted: Sepal.Width
## Predictors modulated: Species

# 3. Custom plot
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  # Add base data
  geom_violin(aes(fill = Species), color = "white") +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5, size = 3) +
  # Add pointrange and line for means
  geom_line(data = means, aes(y = Mean, group = 1), linewidth = 1) +
  geom_pointrange(
    data = means,
    aes(y = Mean, ymin = CI_low, ymax = CI_high),
    size = 1,
    color = "white"
  ) +
  # Improve colors
  scale_fill_manual(values = c("pink", "lightblue", "lightgreen")) +
  theme_minimal()
```
Check out this vignette for a detailed walkthrough on contrast analysis.
```r
# 1. The model
model <- lm(Sepal.Width ~ Species, data = iris)

# 2. Estimate marginal contrasts
contrasts <- estimate_contrasts(model, contrast = "Species")
contrasts
## Marginal Contrasts Analysis
##
## Level1     | Level2     | Difference |   SE |         95% CI | t(147) |      p
## ------------------------------------------------------------------------------
## versicolor | setosa     |      -0.66 | 0.07 | [-0.79, -0.52] |  -9.69 | < .001
## virginica  | setosa     |      -0.45 | 0.07 | [-0.59, -0.32] |  -6.68 | < .001
## virginica  | versicolor |       0.20 | 0.07 | [ 0.07,  0.34] |   3.00 | 0.003
##
## Variable predicted: Sepal.Width
## Predictors contrasted: Species
## p-values are uncorrected.
```
```r
model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris)

difference <- estimate_contrasts(
  model,
  contrast = "Species",
  by = "Petal.Length",
  length = 3
)

# no line break for table
print(difference, table_width = Inf)
## Marginal Contrasts Analysis
##
## Level1     | Level2     | Petal.Length | Difference |   SE |         95% CI | t(144) |      p
## ---------------------------------------------------------------------------------------------
## versicolor | setosa     |         1.00 |      -1.70 | 0.34 | [-2.37, -1.02] |  -4.97 | < .001
## virginica  | setosa     |         1.00 |      -1.34 | 0.40 | [-2.13, -0.56] |  -3.38 | < .001
## virginica  | versicolor |         1.00 |       0.36 | 0.49 | [-0.61,  1.33] |   0.73 | 0.468
## versicolor | setosa     |         3.95 |      -1.74 | 0.65 | [-3.03, -0.45] |  -2.67 | 0.008
## virginica  | setosa     |         3.95 |      -1.79 | 0.66 | [-3.11, -0.48] |  -2.70 | 0.008
## virginica  | versicolor |         3.95 |      -0.06 | 0.15 | [-0.35,  0.24] |  -0.37 | 0.710
## versicolor | setosa     |         6.90 |      -1.78 | 1.44 | [-4.62,  1.06] |  -1.24 | 0.218
## virginica  | setosa     |         6.90 |      -2.25 | 1.42 | [-5.06,  0.56] |  -1.58 | 0.116
## virginica  | versicolor |         6.90 |      -0.47 | 0.28 | [-1.03,  0.09] |  -1.65 | 0.101
##
## Variable predicted: Sepal.Width
## Predictors contrasted: Species
## p-values are uncorrected.
```
```r
# Recompute contrasts with a higher precision (for a smoother plot)
contrasts <- estimate_contrasts(
  model,
  contrast = "Species",
  by = "Petal.Length",
  length = 20,
  # we use emmeans here because marginaleffects doesn't
  # generate more than 25 rows for pairwise comparisons
  backend = "emmeans"
)

# Add a Contrast column by concatenating the two levels
contrasts$Contrast <- paste(contrasts$Level1, "-", contrasts$Level2)

# Plot
ggplot(contrasts, aes(x = Petal.Length, y = Difference)) +
  # Add line and CI band
  geom_line(aes(color = Contrast)) +
  geom_ribbon(aes(ymin = CI_low, ymax = CI_high, fill = Contrast), alpha = 0.2) +
  # Add line at 0, indicating no difference
  geom_hline(yintercept = 0, linetype = "dashed") +
  # Theme
  theme_modern()
```
Check out this vignette for a detailed walkthrough on visualisation matrices.
```r
# 1. Fit model
model <- lm(Sepal.Length ~ Petal.Length * Petal.Width, data = iris)

# 2. Create a visualisation matrix with expected Z-score values of Petal.Width
vizdata <- insight::get_datagrid(model, by = c("Petal.Length", "Petal.Width = c(-1, 0, 1)"))

# 3. Revert from expected SD to actual values
vizdata <- unstandardize(vizdata, select = "Petal.Width")

# 4. Add predicted relationship from the model
vizdata <- modelbased::estimate_expectation(vizdata)

# 5. Express Petal.Width as z-score ("-1 SD", "+2 SD", etc.)
vizdata$Petal.Width <- effectsize::format_standardize(vizdata$Petal.Width, reference = iris$Petal.Width)

# 6. Plot
ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  # Add points from original dataset (only shapes 21-25 have a fill aesthetic)
  geom_point(aes(fill = Petal.Width), size = 5, shape = 21) +
  # Add relationship lines
  geom_line(data = vizdata, aes(y = Predicted, color = Petal.Width), linewidth = 1) +
  # Improve colors / themes
  scale_color_viridis_d(direction = -1) +
  scale_fill_viridis_c(guide = "none") +
  theme_minimal()
```
Check out this vignette for a detailed walkthrough on predictions.
```r
# Fit model 1 and predict the response variable
model1 <- lm(Petal.Length ~ Sepal.Length, data = iris)
pred1 <- estimate_expectation(model1)
pred1$Petal.Length <- iris$Petal.Length # Add true response

# Print first 5 lines of output
head(pred1, n = 5)
## Model-based Predictions
##
## Sepal.Length | Predicted |   SE |       95% CI | Residuals | Petal.Length
## -------------------------------------------------------------------------
##         5.10 |      2.38 | 0.10 | [2.19, 2.57] |     -0.98 |         1.40
##         4.90 |      2.00 | 0.11 | [1.79, 2.22] |     -0.60 |         1.40
##         4.70 |      1.63 | 0.12 | [1.39, 1.87] |     -0.33 |         1.30
##         4.60 |      1.45 | 0.13 | [1.19, 1.70] |      0.05 |         1.50
##         5.00 |      2.19 | 0.10 | [1.99, 2.39] |     -0.79 |         1.40
##
## Variable predicted: Petal.Length

# Same for model 2
model2 <- lm(Petal.Length ~ Sepal.Length * Species, data = iris)
pred2 <- estimate_expectation(model2)
pred2$Petal.Length <- iris$Petal.Length

# Initialize plot for model 1
ggplot(data = pred1, aes(x = Petal.Length, y = Predicted)) +
  # with identity line (diagonal) representing perfect predictions
  geom_abline(linetype = "dashed") +
  # Add the actual predicted points of the models
  geom_point(aes(color = "Model 1")) +
  geom_point(data = pred2, aes(color = "Model 2")) +
  # Aesthetics changes
  labs(y = "Petal.Length (predicted)", color = NULL) +
  scale_color_manual(values = c("Model 1" = "blue", "Model 2" = "red")) +
  theme_modern()
```
Group-level effects of a mixed model can be extracted with `estimate_grouplevel()`. See this vignette for more information.
```r
library(lme4)

model <- lmer(mpg ~ drat + (1 + drat | cyl), data = mtcars)

random <- estimate_grouplevel(model)
random
## Group | Level | Parameter   | Coefficient |   SE |         95% CI
## -----------------------------------------------------------------
## cyl   | 4     | (Intercept) |       -3.45 | 0.56 | [-4.55, -2.36]
## cyl   | 4     | drat        |        2.24 | 0.36 | [ 1.53,  2.95]
## cyl   | 6     | (Intercept) |        0.13 | 0.84 | [-1.52,  1.78]
## cyl   | 6     | drat        |       -0.09 | 0.54 | [-1.15,  0.98]
## cyl   | 8     | (Intercept) |        3.32 | 0.73 | [ 1.89,  4.74]
## cyl   | 8     | drat        |       -2.15 | 0.47 | [-3.07, -1.23]

plot(random) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  theme_minimal()
```
The derivative of a non-linear relationship can be computed with `estimate_slopes()`. The two plots below represent the modeled (non-linear) effect estimated by the model, i.e., the relationship between the outcome and the predictor, as well as the "trend" (or slope) of that relationship at any given point. You can see that wherever the slope is negative, the derivative is below 0, and vice versa, with some regions of the effect being significant (i.e., positive or negative with enough confidence) while the others denote regions where the relationship is rather flat.
Check out this vignette for a detailed walkthrough on marginal effects.
```r
library(patchwork)

# Fit a non-linear Generalized Additive Model (GAM)
model <- mgcv::gam(Sepal.Width ~ s(Petal.Length), data = iris)

# 1. Compute derivatives
deriv <- estimate_slopes(model,
  trend = "Petal.Length",
  by = "Petal.Length",
  length = 200
)

# 2. Visualize predictions and derivative
plot(estimate_relation(model, length = 200)) /
  plot(deriv) +
  geom_hline(yintercept = 0, linetype = "dashed")
```
You can also run `describe_nonlinear()` on a predicted relationship, which will return the different parts of increase and decrease.

```r
model <- lm(Sepal.Width ~ poly(Petal.Length, 2), data = iris)

# 1. Visualize
vizdata <- estimate_relation(model, length = 30)

ggplot(vizdata, aes(x = Petal.Length, y = Predicted)) +
  geom_ribbon(aes(ymin = CI_low, ymax = CI_high), alpha = 0.3) +
  geom_line() +
  # Add original data points
  geom_point(data = iris, aes(x = Petal.Length, y = Sepal.Width)) +
  # Aesthetics
  theme_modern()

# 2. Describe smooth line
describe_nonlinear(vizdata, x = "Petal.Length")
## Start |  End | Length | Change | Slope |   R2
## ---------------------------------------------
##  1.00 | 4.05 |   0.50 |  -0.84 | -0.28 | 0.05
##  4.05 | 6.90 |   0.47 |   0.66 |  0.23 | 0.05
```
See this vignette for a walkthrough on how to do that.
Also referred to as Johnson-Neyman intervals, this plot shows how the effect (the “slope”) of one variable varies depending on another variable. It is useful in the case of complex interactions between continuous variables.
For instance, the plot below shows that the effect of `hp` (the y-axis) is significantly negative only when `wt` is low (`< ~4`).
```r
model <- lm(mpg ~ hp * wt, data = mtcars)

slopes <- estimate_slopes(model, trend = "hp", by = "wt", length = 200)

plot(slopes)
```
Aside from plotting the coefficient of each random effect (as done here), we can also visualize the predictions of the model for each of these levels, which can be useful for diagnostics or to see how they contribute to the fixed effects. We will do that by making predictions with `estimate_relation()` and setting `include_random` to `TRUE`.
Let’s model the reaction time with the number of days of sleep deprivation as fixed effect and the participants as random intercept.
```r
library(lme4)

model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

preds <- estimate_relation(model, include_random = TRUE)

plot(preds, ribbon = list(alpha = 0)) # Make CI ribbon transparent for clarity
```
As we can see, each participant has a different “intercept” (starting point on the y-axis), but all their slopes are the same: this is because the only slope is the “general” one estimated across all participants by the fixed effect. Let’s address that and allow the slope to vary for each participant too.
```r
model <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

preds <- estimate_relation(model, include_random = TRUE)

plot(preds, ribbon = list(alpha = 0.1))
```
As we can see, the effect is now different for all participants. Let’s plot, on top of that, the “fixed” effect estimated across all these individual effects.
```r
fixed_pred <- estimate_relation(model) # This time, include_random is FALSE (default)

plot(preds, ribbon = list(alpha = 0)) + # Previous plot
  geom_ribbon(data = fixed_pred, aes(x = Days, ymin = CI_low, ymax = CI_high), alpha = 0.4) +
  geom_line(data = fixed_pred, aes(x = Days, y = Predicted), linewidth = 2)
```
Please note that the modelbased project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.