TL;DR: you can jump straight into the visuals and application with cheem::run_app(), but we suggest you read the introduction to get situated with the context first.
Non-linear models regularly produce more accurate predictions than their linear counterparts. However, the number and complexity of their terms make them harder to interpret. Our ability to understand how features (variables or predictors) influence predictions is important to a wide range of audiences. Attempts to bring interpretability to such complex models are an important aspect of eXplainable Artificial Intelligence (XAI).
Local explanations are one such tool used in XAI. They attempt to approximate the feature importance in the vicinity of one instance (observation). That is, they give an approximation of linear terms at the position of one in-sample or out-of-sample observation.
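To make this concrete with a small sketch (not cheem code): for a least-squares linear model with independent features, the SHAP attribution of feature j at observation i reduces to the linear term beta_j * (x_ij - mean(x_j)), and the attributions plus the average prediction recover the prediction.

## Minimal illustration of a local explanation as linear terms at one observation
set.seed(1)
x1  <- rnorm(100)
x2  <- rnorm(100)
y   <- 2 * x1 - 3 * x2 + rnorm(100, sd = 0.1)
fit <- lm(y ~ x1 + x2)
b   <- coef(fit)
i   <- 5  ## one instance of interest
## Local attributions of the two features at observation i
attr_i <- c(b["x1"] * (x1[i] - mean(x1)),
            b["x2"] * (x2[i] - mean(x2)))
## Local accuracy: average prediction + attributions reconstruct the prediction
baseline <- mean(predict(fit))
all.equal(unname(baseline + sum(attr_i)), unname(predict(fit)[i]))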
If an analyst can explore how a model arrives at poor predictions, they may gain insight into issues with the data or identify models that are more robust to misclassifications or extreme residuals. An analyst may also want to explore the support of the feature contributions: where an explanation makes sense and where it may be completely unreliable. We propose conducting this sort of analysis with interactive graphics, and provide it in the R package cheem.
This framework is broadly applicable to any model and compatible local explanation. We illustrate with an xgboost::xgboost() model (xgb) and tree SHAP local explanations from shapviz::shapviz(). The model attempts to predict house sale price from 11 predictors for 338 sale events in one neighborhood of the 2018 Ames data.
The first things we need are the predictions and a local explanation (or other embedded space). Here we fit an xgb model, extract its predictions, and find the SHAP values of each observation.
## Download if not installed
if(!require(cheem)) install.packages("cheem", dependencies = TRUE)
if(!require(treeshap)) install.packages("treeshap", dependencies = TRUE)
if(!require(shapviz)) install.packages("shapviz", dependencies = TRUE)
## Load onto session
library(cheem)
library(xgboost)
library(shapviz)
## Setup
X    <- amesHousing2018_NorthAmes[, 1:9]
Y    <- amesHousing2018_NorthAmes$SalePrice
clas <- amesHousing2018_NorthAmes$SubclassMS

## Model and predict
ames_train    <- data.matrix(X) %>% xgb.DMatrix(label = Y)
ames_xgb_fit  <- xgboost(data = ames_train, max.depth = 3, nrounds = 25)
ames_xgb_pred <- predict(ames_xgb_fit, newdata = ames_train)
ames_xgb_pred %>% head()

## SHAP values
shp <- shapviz(ames_xgb_fit, X_pred = ames_train, X = X)
## Keep just the [n, p] local explanations
ames_xgb_shap <- shp$S
ames_xgb_shap %>% head()
Note that the choice of the model, predictions, and local explanation (or other embedding) is up to the analyst and is not facilitated by cheem. Now let's prepare for the visualization of these spaces with a cheem::cheem_ls() call before we start our analysis.
## Preprocessing for cheem analysis
ames_chm <- cheem_ls(X, Y,
                     class   = clas,
                     attr_df = ames_xgb_shap,
                     pred    = ames_xgb_pred,
                     label   = "Ames, xgb, shap")
names(ames_chm)
We have extracted the tree SHAP values, a feature importance measure in the vicinity of each observation. Next we need to identify an instance of interest to explore; we do so with the linked brushing available in the global view. Then we will vary the contributions from different features to test the support of an explanation in a radial tour.
To get a more complete view, let's look at approximations of the data space, the attribution space, and the model fits side-by-side, with linked brushing made possible by plotly and crosstalk. We have identified an observation with a large Mahalanobis distance (in data space) and its closest neighbor in attribution space.
prim <- 1
comp <- 17
global_view(ames_chm, primary_obs = prim, comparison_obs = comp,
height_px = 240, width_px = 720,
as_ggplot = TRUE, color = "log_maha.data")
From this global view we want to identify a primary instance (PI) and optionally a comparison instance (CI) to explore. Misclassified observations or those with high residuals are good targets for further exploration. One point sticks out in this case. Instance 243 (shown as *) is a Gentoo (purple) penguin, while the model predicts it to be a Chinstrap penguin. Penguin 169 (shown as x) is reasonably close by and correctly predicted as Gentoo. In practice we used linked brushing and misclassification information to guide our search.
There is a lot to unpack here. The normalized distributions of the feature attributions from all instances are shown as parallel coordinate lines. The PI and CI selected above are shown as a dashed and a dotted line, respectively. The first thing we notice is that the attribution of the PI is close to its (incorrect) prediction of Chinstrap (orange) in terms of bill length (bl) and flipper length (fl). In terms of bill depth and body mass (bd and bm) it is more like its observed species, Gentoo (purple). We select flipper length as the feature to manipulate.
## Normalized attribution basis of the PI
bas <- sug_basis(ames_xgb_shap, rownum = prim)
## Default feature to manipulate:
#### the feature with largest separation between PI and CI attribution
mv  <- sug_manip_var(
  ames_xgb_shap, primary_obs = prim, comparison_obs = comp)
## Make the radial tour
ggt <- radial_cheem_tour(
  ames_chm, basis = bas, manip_var = mv,
  primary_obs = prim, comparison_obs = comp, angle = .15)
## Animate it
animate_gganimate(ggt, fps = 6)
#height = 2, width = 4.5, units = "in", res = 150
## Or as a plotly html widget
#animate_plotly(ggt, fps = 6)
Starting from the attribution projection, this instance already looks more like its observed Gentoo than its predicted Chinstrap. However, by frame 8 the basis has a full contribution from flipper length, and the instance does look more like the predicted Chinstrap. Looking at the parallel coordinate lines on the basis visual, we can see that flipper length has a large gap between the PI and CI; let's check the original variables to digest this.
library(ggplot2)
prim <- 1
ggplot(penguins_na.rm, aes(x = bill_length_mm,
y = flipper_length_mm,
colour = species,
shape = species)) +
geom_point() +
## Highlight PI, *
geom_point(data = penguins_na.rm[prim, ],
shape = 8, size = 5, alpha = 0.8) +
## Theme, scaling, color, and labels
theme_bw() +
theme(aspect.ratio = 1) +
scale_color_brewer(palette = "Dark2") +
labs(y = "Flipper length [mm]", x = "Bill length [mm]",
color = "Observed species", shape = "Observed species")
This profile shows the two features that most distinguish the PI from the CI. The instance is nested in between the Chinstrap penguins. That makes it particularly hard for a random forest model to classify, as a decision tree can only partition on one value at a time (horizontal and vertical lines here).
We provide an interactive shiny application. Interactive features are made possible with plotly, crosstalk, and DT. We have preprocessed simulated and modern datasets for you to explore this analysis with. Alternatively, bring your own data by saving the return of cheem_ls() as an rds file. Follow along with the example in ?cheem_ls.
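For example, a minimal sketch of preparing your own data for the app (the file name is illustrative):

## Save the preprocessed cheem list (returned by cheem_ls) as an rds file;
## the file name here is illustrative
saveRDS(ames_chm, "./ames_cheem_ls.rds")
## Launch the shiny application, then point it to the saved .rds file
run_app()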
Interpretability of black-box models is important to maintain. Local explanations extend this interpretability by approximating the feature importance in the vicinity of one instance. We propose post-hoc analysis of these local explanations: first exploring them in a global, all-instances context, and then exploring the support of a local explanation to see where it seems plausible or unreliable.
cheem is agnostic to the model and local explanation used, but it does require both. Above we illustrated with one model and local explanation; below we demonstrate producing attribution spaces from other models and packages.
shapviz is being actively maintained and is hosted on CRAN. It is compatible with H2O, lgb, and xgb models.
https://github.com/ModelOriented/shapviz
if(!require(shapviz)) install.packages("shapviz")
if(!require(xgboost)) install.packages("xgboost")
library(shapviz)
library(xgboost)
set.seed(3653)
## Setup
X    <- spinifex::penguins_na.rm[, 1:4]
Y    <- spinifex::penguins_na.rm$species
clas <- spinifex::penguins_na.rm$species

## Model and predict
peng_train    <- data.matrix(X) %>%
  xgb.DMatrix(label = Y)
peng_xgb_fit  <- xgboost(data = peng_train, max.depth = 3, nrounds = 25)
peng_xgb_pred <- predict(peng_xgb_fit, newdata = peng_train)

## SHAP
peng_xgb_shap <- shapviz(peng_xgb_fit, X_pred = peng_train, X = X)
## Keep just the [n, p] local explanations
peng_xgb_shap <- peng_xgb_shap$S
treeshap is available on CRAN. It is compatible with many tree-based models, including gbm, lgb, rf, ranger, and xgb models.
https://github.com/ModelOriented/treeshap
if(!require(treeshap)) install.packages("treeshap")
if(!require(randomForest)) install.packages("randomForest")
library(treeshap)
library(randomForest)
## Setup
X    <- spinifex::wine[, -(1:2)]
Y    <- spinifex::wine$Alcohol
clas <- spinifex::wine$Type

## Fit randomForest::randomForest
wine_rf_fit <- randomForest::randomForest(
  X, Y, ntree = 125,
  mtry = ifelse(is_discrete(Y), sqrt(ncol(X)), ncol(X) / 3),
  nodesize = max(ifelse(is_discrete(Y), 1, 5), nrow(X) / 500))
wine_rf_pred <- predict(wine_rf_fit)

## treeshap::treeshap()
wine_rf_tshap <- wine_rf_fit %>%
  treeshap::randomForest.unify(X) %>%
  treeshap::treeshap(X, interactions = FALSE, verbose = FALSE)
## Keep just the [n, p] local explanations
wine_rf_tshap <- wine_rf_tshap$shaps
DALEX is a popular and versatile XAI package available on CRAN. It is compatible with many models, but it uses the original, slower variant of SHAP local explanation. Expect long run times for sizable data or complex models.
https://ema.drwhy.ai/shapley.html#SHAPRcode
if(!require(DALEX)) install.packages("DALEX")
library(DALEX)
## Setup
X    <- dragons[, c(1:4, 6)]
Y    <- dragons$life_length
clas <- dragons$colour

## Model and predict
drag_lm_fit  <- lm(data = data.frame(Y, X), Y ~ .)
drag_lm_pred <- predict(drag_lm_fit)

## SHAP via DALEX, versatile but slow
drag_lm_exp <- explain(drag_lm_fit, data = X, y = Y,
                       label = "Dragons, LM, SHAP")
## DALEX::predict_parts_shap is flexible, but slow and one row at a time
drag_lm_shap <- matrix(NA, nrow(X), ncol(X))
sapply(1:nrow(X), function(i){
  pps <- predict_parts_shap(drag_lm_exp, new_observation = X[i, ])
  ## Keep just the [n, p] local explanations
  drag_lm_shap[i, ] <<- tapply(
    pps$contribution, pps$variable, mean, na.rm = TRUE) %>% as.vector()
})
drag_lm_shap <- as.data.frame(drag_lm_shap)
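However the attributions are produced, the resulting [n, p] matrix and the predictions are passed to cheem_ls() just as in the Ames example above; a minimal sketch continuing from the dragons objects:

## Preprocess for the cheem analysis, mirroring the Ames example
drag_chm <- cheem_ls(X, Y,
                     class   = clas,
                     attr_df = drag_lm_shap,
                     pred    = drag_lm_pred,
                     label   = "Dragons, lm, SHAP")
names(drag_chm)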