bullseye is an R package which calculates measures of
association and other scores for pairs of variables in a dataset and
helps in visualising these measures in different layouts. The package
also calculates and visualises the pairwise measures for different
levels of a grouping variable.
This vignette gives an overview of how these pairwise variable measures are visualised. Calculation details are given in the accompanying vignette.
# install.packages("palmerpenguins")
library(bullseye)
library(dplyr)
library(ggplot2)
peng <-
  rename(palmerpenguins::penguins,
         bill_length = bill_length_mm,
         bill_depth = bill_depth_mm,
         flipper_length = flipper_length_mm,
         body_mass = body_mass_g)
The usual starting point is a visualisation of the correlations between numeric variables:
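As a minimal sketch of this starting point, the block below first computes the pairwise correlations with base R, then, if bullseye is installed, passes the same data to pair_cor and plots the result. It uses the built-in iris data so that it runs on its own; peng can be substituted. The plot_pairwise call with default arguments is an assumption about the package interface, so check the package help pages before relying on it.

```r
# Numeric columns of a built-in dataset (a stand-in for peng)
num <- iris[vapply(iris, is.numeric, logical(1))]

# Base R view of what is being plotted: one correlation per variable pair
r <- cor(num, use = "pairwise.complete.obs")
idx <- which(upper.tri(r), arr.ind = TRUE)
pairs_long <- data.frame(
  x = rownames(r)[idx[, "row"]],
  y = colnames(r)[idx[, "col"]],
  score = r[idx]
)
pairs_long

# Assumed bullseye usage: pair_cor() computes the scores and
# plot_pairwise() returns a plot object
if (requireNamespace("bullseye", quietly = TRUE)) {
  sc <- bullseye::pair_cor(num)
  p <- bullseye::plot_pairwise(sc)  # print(p) to draw the plot
}
```

The long data frame has one row per unordered pair of variables, which is the kind of structure the plotting functions work from.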
If you wish to also include factor variables, use an alternative to pair_cor which accepts numeric and factor variables, e.g. pair_cancor. To see the available methods which handle all variable types, use
filter(pair_methods, nn & ff & fn)
#> # A tibble: 3 × 7
#>   name        nn    ff    fn    from                range ordinal
#>   <chr>       <lgl> <lgl> <lgl> <chr>               <chr> <lgl>
#> 1 pair_ace    TRUE  TRUE  TRUE  acepack::ace        [0,1] FALSE
#> 2 pair_cancor TRUE  TRUE  TRUE  cancor              [0,1] FALSE
#> 3 pair_nmi    TRUE  TRUE  TRUE  linkspotter::maxNMI [0,1] FALSE
Alternatively, if you wish to show different association measures for different variable types, for example correlation for numeric variables and cancor for non-numeric variables, plot the result of pairwise_scores:
Adding interactive=TRUE means tooltips are available.
By default, variables in this plot are re-ordered to emphasise pairs with maximum absolute scores. This re-ordering uses hierarchical clustering to place high-scoring pairs adjacently, and also to push high-scoring pairs to the top-left of the display.
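The idea of ordering variables by hierarchical clustering can be sketched in base R. This is only a generic illustration, not the package's own algorithm: it turns absolute correlations into distances, clusters the variables, and uses the dendrogram leaf order, without the extra step of pushing high scores to the top-left. The iris data and average linkage are choices made here for the example.

```r
# Correlations between the numeric columns of a built-in dataset
num <- iris[vapply(iris, is.numeric, logical(1))]
r <- cor(num)

# Treat a large |score| as a small distance, then cluster the variables
d <- as.dist(1 - abs(r))
hc <- hclust(d, method = "average")

# The dendrogram leaf order places strongly associated variables together
ord <- hc$order
round(r[ord, ord], 2)
```

In the re-ordered matrix, variables that are strongly associated sit next to each other, so the largest absolute scores fall close to the diagonal.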
The pairwise structure has multiple association scores when each (x,y) pair appears multiple times in the pairwise structure.
In the bullseye plot shown here, each pie wedge represents a conditional correlation, and the overall or ungrouped correlation is shown in the pie centre. As there are multiple scores for each (x,y) pair, the ordering algorithm is based on the maximum of these scores.
An alternative ordering algorithm gives emphasis to pairs with the largest difference in the scores:
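To make the two ordering criteria concrete, the sketch below computes, in base R, one correlation per variable pair for the whole data and for each level of a grouping variable, then summarises each pair by its maximum absolute score and by the spread of its scores. The summaries are illustrative only and may not match how the package defines them. The guarded bullseye call at the end assumes that pairwise_scores accepts a by argument naming the grouping variable and that plot_pairwise accepts var_order = "seriate_max_diff"; neither is confirmed by the text above.

```r
# All pairs of numeric variables in a built-in dataset
num_vars <- names(iris)[vapply(iris, is.numeric, logical(1))]
pairs <- t(combn(num_vars, 2))

# One correlation per (pair, group): overall plus each level of Species
groups <- c(list(all = iris), split(iris, iris$Species))
scores <- sapply(groups, function(g) {
  apply(pairs, 1, function(p) cor(g[[p[1]]], g[[p[2]]]))
})
rownames(scores) <- paste(pairs[, 1], pairs[, 2], sep = ":")

# Two summaries an ordering could be driven by
max_abs <- apply(abs(scores), 1, max)
max_diff <- apply(scores, 1, function(s) max(s) - min(s))

head(sort(max_abs, decreasing = TRUE), 3)   # strongest pairs
head(sort(max_diff, decreasing = TRUE), 3)  # pairs whose scores vary most

# Assumed bullseye usage (argument names are assumptions, see above)
if (requireNamespace("bullseye", quietly = TRUE)) {
  sc <- bullseye::pairwise_scores(iris, by = "Species")
  p <- bullseye::plot_pairwise(sc, var_order = "seriate_max_diff")
}
```

Pairs with a large spread are those where the overall correlation and the within-group correlations disagree.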