The FBMS
package provides functions for Flexible Bayesian Model Selection and Model Averaging.
Below are provided examples on how to run the algorithm and what the results tell us, we begin by loading the package and a supplied dataset
library(FBMS)
#> Loading required package: fastglm
#> Loading required package: bigmemory
#> Loading required package: GenSA
#> Loading required package: parallel
library(GenSA)
library(fastglm)
data("breastcancer")
<- breastcancer[,c(ncol(breastcancer),2:(ncol(breastcancer)-1))] bc
We need some nonlinear transformations for the algorithm to use, the package offers a selection of these, but you are also able to define your own. Here we create a list of the ones to use, all but one are supplied by the package.
<- function(x) x^3
to3
<- c("sigmoid","sin_deg","exp_dbl","p0","troot","to3") transforms
By calling two functions in the package, a list of probabilities for various parts of the algorithm, as well as a list of parameters are created. The list of probabilities needs the list of transformations to be able to create the vector of probabilities for the different transformations
<- gen.probs.gmjmcmc(transforms)
probs <- gen.params.gmjmcmc(bc) params
We can use one of the supplied likelihood functions, but here we demonstrate how to create our own, it takes four arguments, the dependent \(y\) variable, the matrix \(X\) containing all independent variables, the model as a logical vector specifying the columns of \(X\), and a list of complexity measures for the features involved in the model
<- function (y, x, model, complex, params) {
loglik.example <- 20/223
r suppressWarnings({mod <- fastglm(as.matrix(x[,model]), y, family=binomial())})
<- (-(mod$deviance -2*log(r)*sum(complex$width)))/2
ret return(list(crit=ret, coefs=mod$coefficients))
}
To be able to calculate the alphas when using for example strategy 3 as per Hubin et al., we need a function for the log likelihood, in this example we will use the function supplied by the package called logistic.loglik.alpha
. With that function as a starting point, you can also create your own function for this. We also adjust our parameter list to use the third strategy.
$feat$alpha <- 3 params
We are now ready to run the algorithm, in this vignette we will only run very few iterations for demonstration purposes, but the only thing that needs to be changed are the number or populations to visit T
, the number of iterations per population N
and the number of iterations for the final population N.final
set.seed(1234)
<- gmjmcmc(bc, loglik.example, logistic.loglik.alpha, transforms, P=3, N=30, N.final=60, probs, params)
result #> Data coerced to matrix type.
#> Intercept added to data.
#> Population 1 begin.
: -33.7601529708647
New best crit#>
: -31.3487134729661
New best crit#>
: -30.9876784635977
New best crit#>
: -28.9372739751026
New best crit#>
: -26.5258344770878
New best crit#>
: -24.1143949791927
New best crit#>
: -21.7029554813142
New best crit#>
: -19.2915159834322
New best crit#>
: -16.8800764855705
New best crit#>
: -16.8800764855416
New best crit#>
: -16.8800764855347
New best crit#>
#> Population 1 done.
#>
: -16.8800764855347
Current best crit#> Feature importance:
#> ! | radius_mean
#> ###########!#################| texture_mean
#> ###########!#################| perimeter_mean
#> ! | area_mean
#> ! | smoothness_mean
#> ###########!#################| compactness_mean
#> ! | concavity_mean
#> ! | concave.points_mean
#> ! ##| symmetry_mean
#> ! | fractal_dimension_mean
#> ! | radius_se
#> ! | texture_se
#> ! ###| perimeter_se
#> ###########!#################| area_se
#> ! | smoothness_se
#> ! | compactness_se
#> ! | concavity_se
#> ! | concave.points_se
#> ! | symmetry_se
#> ! | fractal_dimension_se
#> ! | radius_worst
#> ! | texture_worst
#> ! | perimeter_worst
#> ! | area_worst
#> ! | smoothness_worst
#> ! | compactness_worst
#> ! ##| concavity_worst
#> ############!#################| concave.points_worst
#> #########!#################| symmetry_worst
#> ! | fractal_dimension_worst
#> Replaced feature concavity_mean with (compactness_mean*compactness_mean)
#> Replaced feature texture_se with (concavity_mean*area_worst)
#> Replaced feature symmetry_se with exp_dbl(symmetry_se)
#> Replaced feature radius_worst with to3(symmetry_worst)
#> Replaced feature smoothness_worst with sin_deg(-146+12.84*concave.points_worst+-7.51*fractal_dimension_worst+-25.51*symmetry_worst+0.19*area_mean+23.98*concavity_mean+1.96*texture_se)
#> Replaced feature concavity_worst with exp_dbl(smoothness_mean)
#> Added feature p0(2.11+-0.38*symmetry_worst+-0.07*texture_mean+-2.33*concave.points_worst+-0.05*exp_dbl(smoothness_mean)+-0.82*fractal_dimension_mean+-1.28*compactness_mean+-3.59*compactness_worst)
#> Added feature (perimeter_mean*compactness_mean)
#> Added feature (area_worst*area_se)
#> Added feature (sin_deg(350.74+-554.06*concave.points_worst+-531.58*fractal_dimension_worst+-356.89*symmetry_worst+-489.51*area_mean+-408.97*concavity_mean+-318.42*texture_se)*compactness_mean)
#> Added feature ((area_worst*area_se)*to3(symmetry_worst))
#> Added feature (concave.points_worst*symmetry_worst)
#> Added feature p0(symmetry_worst)
#> Added feature (smoothness_worst*concave.points_worst)
#> Added feature (smoothness_se*area_se)
#> Added feature sigmoid(compactness_mean)
#> Added feature to3(perimeter_mean)
#> Added feature (concavity_mean*symmetry_worst)
#> Added feature troot(area_se)
#> Population 2 begin.
: -96.4575799165155
New best crit#>
: -94.0461404185737
New best crit#>
: -47.9676443754457
New best crit#>
: -43.2993839991298
New best crit#>
: -41.9021668179924
New best crit#>
: -37.1878156666056
New best crit#>
: -36.1725870809054
New best crit#>
: -32.3687735695188
New best crit#>
: -31.3377227748698
New best crit#>
#> Population 2 done.
#>
: -31.3377227748698
Current best crit#> Feature importance:
#> ! #| radius_mean
#> ! | texture_mean
#> ###########!#################| perimeter_mean
#> ! #| area_mean
#> ! | smoothness_mean
#> ! | compactness_mean
#> ! | (compactness_mean*compactness_mean)
#> ###########!#################| concave.points_mean
#> ! #| symmetry_mean
#> ###########!#################| fractal_dimension_mean
#> ! #| radius_se
#> ! | (concavity_mean*area_worst)
#> ###########!#################| perimeter_se
#> ! #| area_se
#> ! #| smoothness_se
#> ! | compactness_se
#> ! | concavity_se
#> ! | concave.points_se
#> ! | exp_dbl(symmetry_se)
#> ###########!#################| fractal_dimension_se
#> ! | to3(symmetry_worst)
#> ! | texture_worst
#> ! #| perimeter_worst
#> ###########!#################| area_worst
#> ! | sin_deg(-146+12.84*concave.points_worst+-7.51*fractal_dimension_worst+-25.51*symmetry_worst+0.19*area_mean+23.98*concavity_mean+1.96*texture_se)
#> ###########!#################| compactness_worst
#> ! | exp_dbl(smoothness_mean)
#> ! #| concave.points_worst
#> ! | symmetry_worst
#> ! | fractal_dimension_worst
#> ! | p0(2.11+-0.38*symmetry_worst+-0.07*texture_mean+-2.33*concave.points_worst+-0.05*exp_dbl(smoothness_mean)+-0.82*fractal_dimension_mean+-1.28*compactness_mean+-3.59*compactness_worst)
#> ! | (perimeter_mean*compactness_mean)
#> ! | (area_worst*area_se)
#> ! | (sin_deg(350.74+-554.06*concave.points_worst+-531.58*fractal_dimension_worst+-356.89*symmetry_worst+-489.51*area_mean+-408.97*concavity_mean+-318.42*texture_se)*compactness_mean)
#> ! | ((area_worst*area_se)*to3(symmetry_worst))
#> ! | (concave.points_worst*symmetry_worst)
#> ###########!#################| p0(symmetry_worst)
#> ! | (smoothness_worst*concave.points_worst)
#> ! | (smoothness_se*area_se)
#> ! | sigmoid(compactness_mean)
#> ! | to3(perimeter_mean)
#> ! | (concavity_mean*symmetry_worst)
#> ! | troot(area_se)
#> Replaced feature symmetry_mean with to3(-2.39+3.25*concave.points_mean+0*area_worst)
#> Replaced feature compactness_se with p0((compactness_mean*compactness_mean))
#> Replaced feature concavity_se with to3(perimeter_se)
#> Replaced feature to3(symmetry_worst) with (perimeter_se*fractal_dimension_mean)
#> Replaced feature sin_deg(-146+12.84*concave.points_worst+-7.51*fractal_dimension_worst+-25.51*symmetry_worst+0.19*area_mean+23.98*concavity_mean+1.96*texture_se) with troot(4.11+-0.03*perimeter_mean+1.25*p0(symmetry_worst)+2.87*fractal_dimension_mean)
#> Replaced feature (sin_deg(350.74+-554.06*concave.points_worst+-531.58*fractal_dimension_worst+-356.89*symmetry_worst+-489.51*area_mean+-408.97*concavity_mean+-318.42*texture_se)*compactness_mean) with (p0(symmetry_worst)*compactness_se)
#> Replaced feature sigmoid(compactness_mean) with p0(2.46+-4.19*compactness_se+-0.08*perimeter_mean+0.08*perimeter_se+0.01*area_worst)
#> Replaced feature to3(perimeter_mean) with (p0(symmetry_worst)*area_mean)
#> Replaced feature (concavity_mean*symmetry_worst) with radius_worst
#> Population 3 begin.
|= |
|== |
: -200.149478326352
New best crit#>
|=== |
: -195.326599330544
New best crit#>
|==== |
: -183.269401841014
New best crit#>
|===== |
: -36.1715924687893
New best crit#>
|====== |
|======= |
|======== |
|========= |
: -26.5258344771775
New best crit#>
|========== |
|=========== |
|============ |
|============= |
: -21.7029554815947
New best crit#>
|============== |
: -19.2915159836881
New best crit#>
|=============== |
|================ |
|================= |
|================== |
|=================== |
|==================== |
|===================== |
|====================== |
|======================= |
|======================== |
|========================= |
|========================== |
|=========================== |
|============================ |
|============================= |
|============================== |
|=============================== |
|================================ |
|================================= |
|================================== |
|=================================== |
|==================================== |
|===================================== |
|====================================== |
|======================================= |
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
|========================================|
#> Population 3 done.
#>
: -19.2915159836881
Current best crit#> Feature importance:
#> ! | radius_mean
#> ! | texture_mean
#> ! | perimeter_mean
#> ! | area_mean
#> ! | smoothness_mean
#> ! | compactness_mean
#> ! | (compactness_mean*compactness_mean)
#> ! | concave.points_mean
#> ! | to3(-2.39+3.25*concave.points_mean+0*area_worst)
#> ! | fractal_dimension_mean
#> ! ##| radius_se
#> ! | (concavity_mean*area_worst)
#> ! | perimeter_se
#> ! | area_se
#> ! | smoothness_se
#> ! | p0((compactness_mean*compactness_mean))
#> ! | to3(perimeter_se)
#> ! | concave.points_se
#> ! | exp_dbl(symmetry_se)
#> ! | fractal_dimension_se
#> ! | (perimeter_se*fractal_dimension_mean)
#> ###########!#################| texture_worst
#> ! | perimeter_worst
#> ! | area_worst
#> ! | troot(4.11+-0.03*perimeter_mean+1.25*p0(symmetry_worst)+2.87*fractal_dimension_mean)
#> ! | compactness_worst
#> ! | exp_dbl(smoothness_mean)
#> ! | concave.points_worst
#> ###########!#################| symmetry_worst
#> ! | fractal_dimension_worst
#> ! | p0(2.11+-0.38*symmetry_worst+-0.07*texture_mean+-2.33*concave.points_worst+-0.05*exp_dbl(smoothness_mean)+-0.82*fractal_dimension_mean+-1.28*compactness_mean+-3.59*compactness_worst)
#> ! | (perimeter_mean*compactness_mean)
#> ! | (area_worst*area_se)
#> ! | (p0(symmetry_worst)*compactness_se)
#> ! | ((area_worst*area_se)*to3(symmetry_worst))
#> ###########!#################| (concave.points_worst*symmetry_worst)
#> ! | p0(symmetry_worst)
#> ! | (smoothness_worst*concave.points_worst)
#> ! | (smoothness_se*area_se)
#> ! | p0(2.46+-4.19*compactness_se+-0.08*perimeter_mean+0.08*perimeter_se+0.01*area_worst)
#> ! | (p0(symmetry_worst)*area_mean)
#> ! | radius_worst
#> ###########!#################| troot(area_se)
We can then summarize the results using the supplied function and get a plot of the importance of the parameters in the last population of features
plot(result)