Visualising pairwise scores using bullseye.

bullseye is an R package which calculates measures of association and other scores for pairs of variables in a dataset and helps in visualising these measures in different layouts. The package also calculates and visualises the pairwise measures for different levels of a grouping variable.

This vignette gives an overview of how these pairwise variable measures are visualised. Calculation details are given in the accompanying vignette.

# install.packages("palmerpenguins")

library(bullseye)
library(dplyr)
library(ggplot2)
# Drop the unit suffixes from the measurement columns for shorter plot labels
peng <-
  rename(palmerpenguins::penguins,
         bill_length = bill_length_mm,
         bill_depth = bill_depth_mm,
         flipper_length = flipper_length_mm,
         body_mass = body_mass_g)

Visualising associations

The usual starting point is a visualisation of the correlations between the numeric variables:

plot(pair_cor(peng))

If you also wish to include factor variables, use an alternative to pair_cor which accepts both numeric and factor variables, e.g. pair_cancor. To see the available methods which handle all variable types, use:

filter(pair_methods, nn & ff & fn)
#> # A tibble: 3 × 7
#>   name        nn    ff    fn    from                range ordinal
#>   <chr>       <lgl> <lgl> <lgl> <chr>               <chr> <lgl>  
#> 1 pair_ace    TRUE  TRUE  TRUE  acepack::ace        [0,1] FALSE  
#> 2 pair_cancor TRUE  TRUE  TRUE  cancor              [0,1] FALSE  
#> 3 pair_nmi    TRUE  TRUE  TRUE  linkspotter::maxNMI [0,1] FALSE

Alternatively, if you wish to show different association measures for different variable types, such as correlation for numeric variables and cancor for non-numeric variables, plot the result of pairwise_scores:

plot(pairwise_scores(peng), interactive=TRUE)

Setting interactive=TRUE makes tooltips available in the plot.

By default, variables in this plot are re-ordered to emphasise pairs with the largest absolute scores. The re-ordering uses hierarchical clustering to place high-scoring pairs next to each other, and also pushes them towards the top-left of the display.
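The adjacency part of this re-ordering can be illustrated with base R alone. The sketch below is not bullseye's implementation: it assumes the built-in iris data in place of peng, turns absolute correlations into distances, and takes the leaf order from hclust as the variable order. The choice of average linkage is also an assumption, and the top-left placement step is not covered.

```r
# Illustrative only: order variables so strongly associated ones sit together.
num <- iris[, 1:4]                      # numeric columns of a built-in dataset

score_mat <- abs(cor(num))              # symmetric matrix of absolute correlations
d <- as.dist(1 - score_mat)             # high score = small distance
hc <- hclust(d, method = "average")     # linkage choice is an assumption

ord <- hc$order                         # leaf order of the dendrogram
reordered <- score_mat[ord, ord]        # same scores, variables re-arranged

colnames(reordered)
round(reordered, 2)
```

Because clusters stay contiguous in the dendrogram's leaf order, the pair with the highest score always ends up in neighbouring rows and columns.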

Visualising multiple scores

A pairwise structure holds multiple association scores when each (x,y) pair appears in it more than once, for example once for each level of a grouping variable.

scores <- pairwise_scores(peng, by="species")
plot(scores, interactive=TRUE) 

In the bullseye plot shown here, each pie wedge represents a conditional correlation, that is, the score for one level of species. The overall (ungrouped) correlation is shown in the pie centre. As there are multiple scores for each (x,y) pair, the ordering algorithm is based on the maximum of these scores.
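To make the idea of several scores per pair concrete, the following base R sketch builds a long table by hand. It is an illustration under stated assumptions, not how pairwise_scores works internally: it uses the built-in iris data rather than peng, the helper pair_by_group and the label "all" for the ungrouped score are hypothetical, and taking the maximum of the absolute values is an assumption carried over from the default ordering described earlier.

```r
# Hypothetical helper: correlation of one variable pair within each level of
# a grouping variable, plus the overall (ungrouped) correlation.
pair_by_group <- function(data, x, y, by) {
  groups <- split(data, data[[by]])
  within <- vapply(groups, function(g) cor(g[[x]], g[[y]]), numeric(1))
  data.frame(
    x = x,
    y = y,
    group = c(names(within), "all"),    # "all" marks the ungrouped score
    value = c(unname(within), cor(data[[x]], data[[y]]))
  )
}

vars <- names(iris)[1:4]
var_pairs <- combn(vars, 2, simplify = FALSE)
scores_long <- do.call(rbind, lapply(var_pairs, function(p) {
  pair_by_group(iris, p[1], p[2], by = "Species")
}))

# Each (x, y) pair appears four times: three species plus the overall score.
table(paste(scores_long$x, scores_long$y))

# One summary value per pair: the largest absolute score across its rows.
max_per_pair <- aggregate(value ~ x + y, data = scores_long,
                          FUN = function(v) max(abs(v)))
max_per_pair
```

The per-group rows correspond to the pie wedges and the "all" row to the pie centre.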

An alternative ordering algorithm gives emphasis to pairs with the largest difference in the scores:

plot(scores, var_order="seriate_max_diff", interactive=TRUE)
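The quantity this ordering emphasises can be sketched in base R. The code below does not reproduce what "seriate_max_diff" does internally. It assumes the built-in iris data in place of peng, and it assumes that the difference for a pair means the largest group score minus the smallest. It computes that difference next to the maximum absolute score so the two criteria can be compared.

```r
vars <- names(iris)[1:4]
var_pairs <- t(combn(vars, 2))                 # one row per variable pair
by_species <- split(iris[vars], iris$Species)  # one data frame per group

# Correlation of every pair within every species:
# rows are variable pairs, columns are species.
cond <- sapply(by_species, function(g) {
  mapply(function(x, y) cor(g[[x]], g[[y]]),
         var_pairs[, 1], var_pairs[, 2], USE.NAMES = FALSE)
})

summary_tbl <- data.frame(
  x = var_pairs[, 1],
  y = var_pairs[, 2],
  max_abs  = apply(abs(cond), 1, max),                    # strongest group score
  max_diff = apply(cond, 1, function(v) diff(range(v)))   # spread across groups
)

# Pairs whose association changes most between groups come first.
summary_tbl[order(-summary_tbl$max_diff), ]
```

A pair can rank low on max_abs yet high on max_diff when its association is moderate in every group but varies markedly from one group to another.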