The SuperLearner package is used to develop prediction models. To get the best performance out of each algorithm, it is often necessary to create a customized wrapper for the underlying library. This section explains the steps for creating a customized wrapper for the XGBoost package, which supports several hyperparameters for fine-tuning the training process. There are two ways to create a wrapper for XGBoost (or any other supported package):

- making a wrapper around the existing wrapper (`SL.xgboost`), or
- creating a wrapper from scratch.

In this note, we explain the first approach, which is the one used in developing this package (see the sketch after the table). The SuperLearner package explicitly supports some of the XGBoost hyperparameters; the following table describes these parameters:
SuperLearner | XGBoost | GPSmatching | Description |
---|---|---|---|
`ntrees` | `nrounds` | `xgb_nrounds` | Maximum number of boosting iterations. |
`shrinkage` | `eta` | `xgb_eta` | Learning rate, in [0, 1]. A low `eta` value makes the model more robust to overfitting, but slows down computation. |
`max_depth` | `max_depth` | `xgb_max_depth` | Maximum depth of a tree. |
`minobspernode` | `min_child_weight` | `xgb_min_child_weight` | Minimum sum of instance weights (hessian) needed in a child. |
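As a minimal sketch of the first approach, a customized learner can simply call `SL.xgboost` with the desired hyperparameters fixed. The learner name `SL.xgboost.custom` and the chosen values below are illustrative, not part of this package:

```r
library(SuperLearner)

# Minimal sketch of approach 1: wrap the existing SL.xgboost wrapper and
# override its hyperparameters. SL.xgboost.custom is an illustrative name.
SL.xgboost.custom <- function(Y, X, newX, family, obsWeights, ...) {
  SuperLearner::SL.xgboost(Y = Y, X = X, newX = newX,
                           family = family, obsWeights = obsWeights,
                           ntrees = 200,       # nrounds in XGBoost
                           shrinkage = 0.05,   # eta in XGBoost
                           max_depth = 4,
                           minobspernode = 10, # min_child_weight in XGBoost
                           ...)
}
```

Because the new learner follows the SuperLearner naming convention, it can be listed directly in `SL.library` when calling `SuperLearner()`.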
We use the `xgb_` prefix to distinguish the hyperparameters of different libraries. Users can pass the hyperparameters through the `params` list. Each hyperparameter can be a list of one or more elements. At each iteration, the program randomly picks one element out of those provided for each hyperparameter. This process improves the chance of developing a balanced pseudo population after several trials. We recommend providing a long list of candidate values in order to get a better picture of the performance of the pseudo-population generation process. For reproducible research, use the configuration that provides an acceptable answer.
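For illustration, the following call passes several candidate values per hyperparameter. The `gen_pseudo_pop` function is mentioned below; its full signature, the other argument names, and the toy data are assumptions made here for the sketch, so consult the package documentation for the exact interface:

```r
# Toy data (placeholders for illustration only)
n <- 500
c_mat <- data.frame(cf1 = rnorm(n), cf2 = rnorm(n))  # confounders
w <- rnorm(n, mean = c_mat$cf1)                      # continuous exposure
Y <- rnorm(n, mean = w + c_mat$cf2)                  # outcome

# Hedged sketch: each element of params holds candidate values; the
# program samples one value per hyperparameter on every trial. Argument
# names other than params and sl_lib are assumed for illustration.
pseudo_pop <- gen_pseudo_pop(Y, w, c_mat,
                             ci_appr = "matching",
                             pred_model = "sl",
                             sl_lib = c("m_xgboost"),
                             params = list(xgb_nrounds = c(50, 100, 200),
                                           xgb_eta = c(0.05, 0.1, 0.3),
                                           xgb_max_depth = c(3, 4, 6),
                                           xgb_min_child_weight = c(1, 5, 10)),
                             nthread = 1)
```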
In order to use the XGBoost package, users need to pass `m_xgboost` in the `sl_lib` list; `m` stands for the modified version. Internally, for the XGBoost package, we keep only one library in memory (and in the global environment), `m_xgboost_internal`. Before conducting any processing that involves developing prediction models (e.g., in the `estimate_gps` and `gen_pseudo_pop` functions), developers need to call the `gen_wrap_sl_lib` function. It makes sure that an updated wrapper is generated and located in memory.
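Conceptually, the regenerated wrapper is just another customized learner placed in the global environment. The following sketch, using the hypothetical helper name `make_internal_wrapper`, shows the idea; it is not the package's actual implementation of `gen_wrap_sl_lib`:

```r
# Conceptual sketch (not the package's actual code): pick one value per
# hyperparameter, build a learner around SL.xgboost, and place it in the
# global environment so that SuperLearner can find it by name.
make_internal_wrapper <- function(params) {
  picked <- lapply(params, function(x) x[[sample(length(x), 1)]])
  learner <- function(Y, X, newX, family, obsWeights, ...) {
    SuperLearner::SL.xgboost(Y = Y, X = X, newX = newX, family = family,
                             obsWeights = obsWeights,
                             ntrees = picked$xgb_nrounds,
                             shrinkage = picked$xgb_eta,
                             max_depth = picked$xgb_max_depth,
                             minobspernode = picked$xgb_min_child_weight,
                             ...)
  }
  assign("m_xgboost_internal", learner, envir = .GlobalEnv)
}
```

Because SuperLearner looks learners up by name, redefining `m_xgboost_internal` in the global environment before each fit lets every trial run with a freshly sampled hyperparameter set.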