| Type: | Package |
| Title: | Guarded Resampling Workflows for Safe and Automated Machine Learning in R |
| Version: | 0.7.7 |
| Description: | Provides a guarded resampling workflow for training and evaluating machine-learning models. When the guarded resampling path is used, preprocessing and model fitting are re-estimated within each resampling split to reduce leakage risk. Supports multiple resampling schemes, integrates with established engines in the 'tidymodels' ecosystem, and aims to improve evaluation reliability by coordinating preprocessing, fitting, and evaluation within supported workflows. Offers a lightweight AutoML-style workflow by automating model training, resampling, and tuning across multiple algorithms, while keeping evaluation design explicit and user-controlled. |
| Encoding: | UTF-8 |
| License: | MIT + file LICENSE |
| URL: | https://selcukorkmaz.github.io/fastml-tutorial/, https://github.com/selcukorkmaz/fastml |
| BugReports: | https://github.com/selcukorkmaz/fastml/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | stats, recipes, dplyr, ggplot2, reshape2, rsample, parsnip, tune, workflows, yardstick, tibble, rlang, dials, RColorBrewer, baguette, discrim, doFuture, foreach, finetune, future, plsmod, probably, viridisLite, DALEX, magrittr, pROC, janitor, stringr, broom, tidyr, purrr, survival, flexsurv, rstpm2, iml, lime, survRM2, iBreakDown, xgboost, pdp, modelStudio, fairmodels |
| Suggests: | testthat (≥ 3.0.0), withr, C50, ranger, aorsf, censored, crayon, kernlab, klaR, kknn, keras, lightgbm, rstanarm, mixOmics, patchwork, GGally, glmnet, themis, DT, UpSetR, VIM, dbscan, ggpubr, gridExtra, htmlwidgets, kableExtra, moments, naniar, plotly, scales, skimr, sparsediscrim, knitr, rmarkdown, pec |
| RoxygenNote: | 7.3.3 |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-27 22:12:54 UTC; selcuk |
| Author: | Selcuk Korkmaz |
| Maintainer: | Selcuk Korkmaz <selcukorkmaz@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-27 22:50:18 UTC |
Environment for Tracking Warned Defaults
Description
Internal environment to track which default warnings have been shown in the current session to avoid duplicate warnings.
Usage
.fastml_warned_defaults
Format
An object of class environment of length 0.
Align Survival Curve to Evaluation Times
Description
Aligns a survival curve (defined by time points and survival probabilities)
to a new set of evaluation times using constant interpolation (last value
carried forward). Ensures S(0) = 1 and monotonicity.
Usage
align_survival_curve(curve_times, curve_surv, eval_times)
Arguments
curve_times |
Numeric vector of time points from the survival curve. |
curve_surv |
Numeric vector of survival probabilities corresponding to curve_times. |
eval_times |
Numeric vector of new time points to evaluate at. |
Value
A numeric vector of survival probabilities at eval_times.
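The last-value-carried-forward rule above can be sketched in base R. This is a hypothetical helper (align_curve_sketch) written for illustration, not fastml's source. It assumes curve_times are non-negative, and it enforces monotonicity by clamping to [0, 1] and taking a running minimum, which may differ from how fastml repairs non-monotone input.

```r
# Hypothetical sketch; not fastml's implementation.
align_curve_sketch <- function(curve_times, curve_surv, eval_times) {
  ord <- order(curve_times)
  # Anchor the curve at S(0) = 1 before the first observed time point
  grid_t <- c(0, curve_times[ord])
  s <- c(1, curve_surv[ord])
  # Clamp to [0, 1], then force a non-increasing curve with a running minimum
  s <- cummin(pmin(pmax(s, 0), 1))
  # findInterval() returns the index of the last grid time <= each eval time,
  # which is exactly "last value carried forward"
  idx <- findInterval(eval_times, grid_t)
  out <- rep(1, length(eval_times))  # eval times before 0 keep S = 1
  out[idx > 0] <- s[idx[idx > 0]]
  out
}

align_curve_sketch(c(1, 3, 5), c(0.9, 0.7, 0.4), c(0, 0.5, 1, 2, 4, 10))
# [1] 1.0 1.0 0.9 0.9 0.7 0.4
```

findInterval() keeps the lookup vectorised and handles unsorted input once the curve has been ordered.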
Assign Risk Groups
Description
Dichotomizes a continuous risk vector into "low" and "high" risk groups based on the median.
Usage
assign_risk_group(risk_vec)
Arguments
risk_vec |
Numeric vector of predicted risk scores. |
Value
A character vector of "low", "high", or NA.
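A minimal base-R sketch of a median split. The helper name assign_risk_group_sketch is hypothetical, and assigning values exactly at the median to "low" (only values strictly above the median become "high") is an assumption; fastml's tie handling may differ.

```r
# Hypothetical sketch; not fastml's implementation.
assign_risk_group_sketch <- function(risk_vec) {
  med <- stats::median(risk_vec, na.rm = TRUE)
  # Strictly above the median -> "high"; at or below -> "low"; NA stays NA
  ifelse(risk_vec > med, "high", "low")
}

assign_risk_group_sketch(c(1, 2, 3, 4, NA))
# returns "low", "low", "high", "high", NA
```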
Get Available Methods
Description
Returns a character vector of algorithm names available for classification, regression or survival tasks.
Usage
availableMethods(type = c("classification", "regression", "survival"), ...)
Arguments
type |
A character string specifying the type of task. Must be one of "classification", "regression", or "survival". |
... |
Additional arguments (currently not used). |
Details
Depending on the specified type, the function returns a different set of algorithm names:
- For "classification", it returns algorithms such as "logistic_reg", "multinom_reg", "decision_tree", "C5_rules", "rand_forest", "xgboost", "lightgbm", "svm_linear", "svm_rbf", "nearest_neighbor", "naive_Bayes", "mlp", "discrim_linear", "discrim_quad", and "bag_tree".
- For "regression", it returns algorithms such as "linear_reg", "ridge_reg", "lasso_reg", "elastic_net", "decision_tree", "rand_forest", "xgboost", "lightgbm", "svm_linear", "svm_rbf", "nearest_neighbor", "mlp", "pls", and "bayes_glm".
- For "survival", it returns algorithms such as "rand_forest", "cox_ph", "penalized_cox", "stratified_cox", "time_varying_cox", "survreg", "royston_parmar", "parametric_surv", "piecewise_exp", and "xgboost".
Value
A character vector containing the names of the available algorithms for the specified task type.
Build Survival Matrix from survfit Object
Description
Extracts survival probabilities from a survfit object and aligns
them to a common set of evaluation times, creating a matrix.
Usage
build_survfit_matrix(fit_obj, eval_times, n_obs)
Arguments
fit_obj |
A survfit object. |
eval_times |
Numeric vector of evaluation times. |
n_obs |
Expected number of observations (rows). |
Value
A matrix (rows=subjects, cols=eval_times) of survival
probabilities, or NULL on failure.
Clamp Values to [0, 1]
Description
Truncates a numeric vector so all values lie within the [0, 1] interval.
Usage
clamp01(x)
Arguments
x |
A numeric vector. |
Value
The clamped numeric vector.
Compare fastml and parsnip defaults
Description
Compares the default engine and parameter choices between fastml and parsnip for a given algorithm.
Usage
compare_defaults(algo, task, fastml_engine, fastml_params = NULL)
Arguments
algo |
Character string specifying the algorithm name. |
task |
Character string specifying the task type. |
fastml_engine |
Character string of the fastml default engine. |
fastml_params |
List of fastml default parameters. |
Value
A list with components:
- engine_differs
Logical indicating if engines differ.
- fastml_engine
The fastml default engine.
- parsnip_engine
The parsnip default engine.
- param_differences
Named list of parameters that differ.
Compute Integrated Brier Score and Curve
Description
Calculates the Brier score at specified evaluation times and the
Integrated Brier Score (IBS) up to the time horizon tau, using inverse
probability of censoring weighting (IPCW) to handle censoring.
Usage
compute_ibrier(
eval_times,
surv_mat,
time_vec,
status_vec,
tau,
censor_eval_fn,
normalize_by = c("non_missing", "n"),
include_zero = TRUE
)
Arguments
eval_times |
Numeric vector of evaluation time points. |
surv_mat |
Matrix of predicted survival probabilities (rows=subjects, cols=eval_times). |
time_vec |
Numeric vector of test times. |
status_vec |
Numeric vector of test statuses. |
tau |
The time horizon tau. |
censor_eval_fn |
A function (from create_censor_eval()) that returns censoring survival probabilities G(t). |
normalize_by |
Character string specifying how to normalize Brier scores; one of "non_missing" (the default) or "n". |
include_zero |
Logical; if |
Value
A list with ibs (the scalar IBS value) and curve (a
numeric vector of Brier scores at eval_times).
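A minimal sketch of the standard IPCW Brier score, BS(t) = (1/n) * sum_i [ S_i(t)^2 * I(T_i <= t, status_i = 1) / G(T_i) + (1 - S_i(t))^2 * I(T_i > t) / G(t) ], where subjects censored before t contribute zero. The helper name brier_ipcw_sketch is hypothetical. Using G(T_i) instead of its left limit, dividing by n, and averaging the curve with the trapezoidal rule over the evaluation grid are all assumptions for illustration; they are not necessarily what compute_ibrier() does with normalize_by or include_zero.

```r
# Hypothetical sketch; not fastml's implementation.
# surv_mat: rows = subjects, cols = eval_times; G: vectorised censoring survival G(t)
brier_ipcw_sketch <- function(eval_times, surv_mat, time_vec, status_vec, tau, G) {
  keep <- eval_times <= tau
  eval_times <- eval_times[keep]
  surv_mat <- surv_mat[, keep, drop = FALSE]
  g_own <- G(time_vec)  # G evaluated at each subject's own time T_i
  curve <- vapply(seq_along(eval_times), function(j) {
    t_j <- eval_times[j]
    s <- surv_mat[, j]
    had_event <- time_vec <= t_j & status_vec == 1  # event observed by t_j
    at_risk <- time_vec > t_j                       # still event-free after t_j
    # Subjects censored before t_j get weight 0 in both terms
    w_event <- ifelse(had_event, 1 / g_own, 0)
    w_risk <- ifelse(at_risk, 1 / G(t_j), 0)
    mean(s^2 * w_event + (1 - s)^2 * w_risk)        # divide by n
  }, numeric(1))
  # Trapezoidal rule over the grid, scaled by its width, gives an average Brier score
  ibs <- if (length(eval_times) > 1) {
    sum(diff(eval_times) * (head(curve, -1) + tail(curve, -1)) / 2) /
      (max(eval_times) - min(eval_times))
  } else {
    curve
  }
  list(ibs = ibs, curve = curve)
}

# Uncensored toy data: with no censoring, G(t) = 1 everywhere
no_censoring <- function(t) rep(1, length(t))
brier_ipcw_sketch(c(1.5, 2.5), matrix(0.5, nrow = 3, ncol = 2),
                  c(1, 2, 3), c(1, 1, 1), tau = 3, G = no_censoring)
# ibs = 0.25, curve = c(0.25, 0.25)
```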
Compute Difference in Restricted Mean Survival Time (RMST)
Description
Calculates the difference in RMST between "low" and "high" risk groups up to
a time horizon tau. Groups are defined by a median split of
risk_vec.
Usage
compute_rmst_difference(
time_vec,
status_vec,
risk_vec,
tau,
surv_mat = NULL,
eval_times_full = NULL,
model_type = "other"
)
Arguments
time_vec |
Numeric vector of test times. |
status_vec |
Numeric vector of test statuses. |
risk_vec |
Numeric vector of predicted risk scores for test data. |
tau |
The time horizon tau. |
surv_mat |
Optional. A matrix of individual survival predictions (rows=subjects, cols=times) used for model-based RMST calculation. |
eval_times_full |
Optional. A numeric vector of time points corresponding to the columns of surv_mat. |
model_type |
Optional string (e.g., "rstpm2", "flexsurv") indicating whether a model-based RMST calculation should be attempted. |
Value
The RMST difference (RMST_low - RMST_high), or NA_real_.
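RMST up to tau is the area under the survival curve on [0, tau]. The sketch below assumes the non-model-based path uses a Kaplan-Meier curve per risk group (an assumption; the model-based path for "rstpm2" or "flexsurv" via surv_mat is not covered). The helper names km_rmst and rmst_difference_sketch are hypothetical, and values at the median are assigned to "low".

```r
# Hypothetical sketch; not fastml's implementation.
# Area under the Kaplan-Meier curve from 0 to tau
km_rmst <- function(time, status, tau) {
  event_times <- sort(unique(time[status == 1 & time <= tau]))
  surv <- 1
  area <- 0
  last_t <- 0
  for (tt in event_times) {
    area <- area + surv * (tt - last_t)     # S is flat on [last_t, tt)
    n_risk <- sum(time >= tt)
    n_event <- sum(time == tt & status == 1)
    surv <- surv * (1 - n_event / n_risk)   # Kaplan-Meier step at tt
    last_t <- tt
  }
  area + surv * (tau - last_t)              # final flat piece up to tau
}

rmst_difference_sketch <- function(time_vec, status_vec, risk_vec, tau) {
  ok <- !is.na(time_vec) & !is.na(status_vec) & !is.na(risk_vec)
  time_vec <- time_vec[ok]
  status_vec <- status_vec[ok]
  risk_vec <- risk_vec[ok]
  high <- risk_vec > stats::median(risk_vec)  # median split; ties go to "low"
  if (!any(high) || all(high)) return(NA_real_)
  km_rmst(time_vec[!high], status_vec[!high], tau) -
    km_rmst(time_vec[high], status_vec[high], tau)
}

rmst_difference_sketch(c(1, 2, 4, 5), c(1, 1, 1, 1), c(0.8, 0.9, 0.1, 0.2), tau = 3)
# [1] 1.5
```

With no censoring, the RMST of each group equals the mean of min(T, tau), which is a quick way to sanity-check the result (3 for the low-risk group, 1.5 for the high-risk group).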
Compute Survival Matrix from survreg Model
Description
Generates a matrix of survival probabilities (rows=subjects, cols=times)
from a fitted survreg model for new data.
Usage
compute_survreg_matrix(fit_obj, new_data, eval_times)
Arguments
fit_obj |
A fitted survreg model. |
new_data |
A data frame with predictor variables. |
eval_times |
Numeric vector of evaluation times. |
Value
A matrix of survival probabilities, or NULL on failure.
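For a survreg fit, survival at time t follows from the linear predictor and scale (the quantities extract_survreg_components() is documented to return): S(t | x) = 1 - F(t; lp, scale), where F is the CDF of the fitted distribution. The sketch below shows that relationship with survival::psurvreg() on the survival package's lung data. It assumes a single-stratum model (one scale parameter) and is not fastml's code.

```r
# Illustrative sketch; not fastml's implementation.
library(survival)

fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
new_data <- lung[1:3, ]
eval_times <- c(100, 300, 500)

lp <- predict(fit, newdata = new_data, type = "lp")  # linear predictor per subject
# S(t | x) = 1 - F(t; lp, scale); rows = subjects, cols = eval_times
surv_mat <- matrix(
  vapply(eval_times, function(t) {
    1 - psurvreg(t, mean = lp, scale = fit$scale, distribution = fit$dist)
  }, numeric(length(lp))),
  nrow = length(lp)
)
colnames(surv_mat) <- eval_times
round(surv_mat, 3)
```

Wrapping the vapply() result in matrix() keeps a one-row matrix when new_data has a single row.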
Compute Tau Limit (t_max)
Description
Finds the latest time point t_max such that at least a certain proportion
of subjects remain at risk.
Usage
compute_tau_limit(times, threshold)
Arguments
times |
Numeric vector of survival times. |
threshold |
Minimum proportion of subjects that must remain at risk. |
Value
The computed t_max value, or NA_real_ if no valid
times are provided.
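A minimal base-R sketch, assuming "at risk at t" means an observed time of at least t and that the candidates for t_max are the observed times. The helper name tau_limit_sketch is hypothetical.

```r
# Hypothetical sketch; not fastml's implementation.
tau_limit_sketch <- function(times, threshold) {
  times <- times[is.finite(times)]
  if (length(times) == 0) return(NA_real_)
  candidates <- sort(unique(times))
  # Proportion of subjects whose observed time is at least t
  prop_at_risk <- vapply(candidates, function(t) mean(times >= t), numeric(1))
  ok <- prop_at_risk >= threshold
  if (!any(ok)) NA_real_ else max(candidates[ok])
}

tau_limit_sketch(1:10, threshold = 0.25)
# [1] 8
```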
Compute Uno's C-index (Time-Dependent AUC)
Description
Calculates Uno's C-index for survival data: a concordance measure, truncated at the time horizon tau, in which comparable pairs are weighted by the inverse probability of censoring (IPCW).
Usage
compute_uno_c_index(
train_time,
train_status,
test_time,
test_status,
risk_vec,
tau,
censor_eval_fn
)
Arguments
train_time |
Numeric vector of training times (used for censor model). |
train_status |
Numeric vector of training statuses (used for censor model). |
test_time |
Numeric vector of test times. |
test_status |
Numeric vector of test statuses. |
risk_vec |
Numeric vector of predicted risk scores for test data. |
tau |
The time horizon tau. |
censor_eval_fn |
A function (from create_censor_eval()) that returns censoring survival probabilities G(t). |
Value
The computed Uno's C-index, or NA_real_ on failure.
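Uno's estimator only compares pairs in which subject i has an observed event before tau and before subject j's observed time, and weights each such pair by w_i = 1 / G(T_i)^2. The C-index is the weighted share of those pairs in which i has the higher predicted risk. A base-R sketch, assuming higher risk_vec means shorter survival, ties in predicted risk count as 1/2, and G is a vectorised function (for example one estimated on training data); the name uno_c_sketch is hypothetical and fastml's handling of ties or G may differ.

```r
# Hypothetical sketch; not fastml's implementation.
# G: vectorised censoring survival function, e.g. estimated on training data
uno_c_sketch <- function(time, status, risk, tau, G) {
  # IPCW weight 1 / G(T_i)^2 for subjects with an observed event before tau
  w <- ifelse(status == 1 & time < tau, 1 / G(time)^2, 0)
  num <- 0
  den <- 0
  for (i in which(w > 0)) {
    comparable <- time[i] < time  # subjects j whose observed time exceeds T_i
    den <- den + w[i] * sum(comparable)
    # Concordant if subject i (earlier event) has the higher predicted risk
    num <- num + w[i] * (sum(comparable & risk[i] > risk) +
                           0.5 * sum(comparable & risk[i] == risk))
  }
  if (den == 0) NA_real_ else num / den
}

no_censoring <- function(t) rep(1, length(t))
uno_c_sketch(c(1, 2, 3, 4), c(1, 1, 1, 1), c(4, 3, 2, 1), tau = 5, G = no_censoring)
# [1] 1
```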
Convert Various Prediction Formats to Survival Matrix
Description
Attempts to convert various survival prediction formats (e.g., list of
data frames from predict.model_fit with type "survival", matrices)
into a standardized [n_obs, n_eval_times] matrix.
Usage
convert_survival_predictions(pred_obj, eval_times, n_obs)
Arguments
pred_obj |
The prediction object. |
eval_times |
Numeric vector of evaluation times. |
n_obs |
Expected number of observations (rows). |
Value
A standardized matrix of survival probabilities, or NULL
on failure.
Generate counterfactual explanations for a fastml model
Description
Uses DALEX ceteris-paribus profiles ('predict_profile') to compute counterfactual-style what-if explanations for a given observation.
Usage
counterfactual_explain(
object,
observation,
variables = NULL,
data = c("train", "test"),
positive_class = NULL,
event_class = NULL,
label_levels = NULL,
...
)
Arguments
object |
A 'fastml' object. |
observation |
A single observation (data frame with one row) to compute counterfactuals for. |
variables |
Optional character vector of candidate variables to vary. Only numeric variables are used for counterfactual profiling. |
data |
Character string specifying which data to use for the explainer background: "train" (default) or "test". |
positive_class |
Optional string used to filter lines/points in the resulting profiles for classification tasks. |
event_class |
Optional event class indicator propagated from 'fastml_prepare_explainer_inputs()' (kept for compatibility). |
label_levels |
Optional vector of label levels propagated from 'fastml_prepare_explainer_inputs()' (kept for compatibility). |
... |
Additional arguments passed to 'DALEX::predict_profile'. |
Value
A list (returned invisibly) containing the DALEX profile, filtered lines/points when 'positive_class' is supplied, and the plotted object if rendering succeeds.
Create Censoring Distribution Evaluator
Description
Creates a function to evaluate the survival function of the censoring
distribution, G(t) = P(C > t), using a Kaplan-Meier estimator.
Usage
create_censor_eval(time_vec, status_vec)
Arguments
time_vec |
Numeric vector of survival/censoring times. |
status_vec |
Numeric vector of event statuses (1=event, 0=censored). |
Value
A function that takes a numeric vector of times and returns the
estimated censoring survival probabilities G(t) at those times.
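The reverse Kaplan-Meier idea can be sketched with the survival package: flip the event indicator so that censorings become the "events", then wrap the estimate in a right-continuous step function. The helper name censor_eval_sketch is hypothetical, and fastml may treat ties, left limits, or zero values of G differently.

```r
# Hypothetical sketch; not fastml's implementation.
library(survival)

censor_eval_sketch <- function(time_vec, status_vec) {
  # Reverse Kaplan-Meier: treat censorings (status 0) as the events
  km <- survfit(Surv(time_vec, 1 - status_vec) ~ 1)
  # Right-continuous step function with G(t) = 1 before the first observed time
  stats::stepfun(km$time, c(1, km$surv))
}

G <- censor_eval_sketch(c(1, 2, 3, 4, 5), c(1, 0, 1, 0, 1))
G(c(0.5, 2, 4.5))
# approximately 1, 0.75, 0.375
```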
Defaults Registry for Engine and Parameter Transparency
Description
Functions to track, compare, and warn about differences between fastml defaults and parsnip defaults, providing users with full transparency and control over model configuration.
Determine rounding digits for time horizons
Description
Computes a sensible number of decimal digits to round time horizons based on the minimal positive separation between unique finite times.
Usage
determine_round_digits(times)
Arguments
times |
Numeric vector of times. |
Details
Uses the smallest strictly positive difference among sorted unique finite times,
then returns ceiling(-log10(min_diff)) truncated to [0, 6].
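The rule above translates directly into base R. The name round_digits_sketch is hypothetical, and returning 0 when there is no positive separation (for example, a single unique time) is an assumption.

```r
# Hypothetical sketch of the rule described in Details.
round_digits_sketch <- function(times) {
  u <- sort(unique(times[is.finite(times)]))
  gaps <- diff(u)
  gaps <- gaps[gaps > 0]
  if (length(gaps) == 0) return(0L)  # assumption: no positive gap -> 0 digits
  digits <- ceiling(-log10(min(gaps)))
  as.integer(min(max(digits, 0), 6))  # truncate to [0, 6]
}

round_digits_sketch(c(0.1, 0.12, 0.125))
# [1] 3
```

Here the smallest gap is 0.005, and ceiling(-log10(0.005)) = ceiling(2.30...) = 3.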
Value
Integer number of digits between 0 and 6.
Examples
# Not run: determine_round_digits(c(0.1, 0.12, 0.125))
Estimate Tuning Time
Description
Provides a rough estimate of tuning time based on the configuration.
Usage
estimate_tuning_time(
n_params,
n_folds = 10,
n_rows = 1000,
complexity = "balanced",
tuning_strategy = "grid",
base_fit_time = 1
)
Arguments
n_params |
Number of parameters being tuned. |
n_folds |
Number of cross-validation folds. |
n_rows |
Number of rows in training data. |
complexity |
Tuning complexity level. |
tuning_strategy |
Tuning strategy ("grid" or "bayes"). |
base_fit_time |
Estimated time for a single model fit in seconds. |
Value
A list with estimated total time and breakdown.
Compute Accumulated Local Effects (ALE) for a fastml model
Description
Uses the 'iml' package to calculate ALE for the specified feature.
Usage
explain_ale(object, feature, data = c("train", "test"), ...)
Arguments
object |
A 'fastml' object. |
feature |
Character string specifying the feature name. |
data |
Character string specifying which data to use: "train" (default) or "test". |
... |
Additional arguments passed to 'iml::FeatureEffect'. |
Value
An 'iml' object containing ALE results.
Examples
## Not run:
data(iris)
iris <- iris[iris$Species != "setosa", ]
iris$Species <- factor(iris$Species)
model <- fastml(data = iris, label = "Species")
explain_ale(model, feature = "Sepal.Length")
## End(Not run)
Generate DALEX explanations for a fastml model
Description
Creates a DALEX explainer and computes permutation-based variable importance, partial dependence (model profiles), and Shapley values.
Usage
explain_dalex(
object,
data = c("train", "test"),
features = NULL,
grid_size = 20,
shap_sample = 5,
vi_iterations = 10,
seed = 123,
loss_function = NULL
)
Arguments
object |
A fastml object. |
data |
Character string specifying which data to use for explanations: "train" (default) or "test". |
features |
Character vector of feature names for partial dependence (model profiles). Default NULL. |
grid_size |
Number of grid points for partial dependence. Default 20. |
shap_sample |
Integer number of observations from the selected data source to compute SHAP values for. Default 5. |
vi_iterations |
Integer. Number of permutations for variable importance (B). Default 10. |
seed |
Integer. A value specifying the random seed. |
loss_function |
Function. The loss function used by DALEX for permutation-based variable importance. |
Value
Invisibly returns a list with variable importance, optional model profiles and SHAP values.
Generate LIME explanations for a fastml model
Description
Creates a 'lime' explainer using processed (encoded, scaled) data and returns feature explanations for new observations. The new observation is automatically preprocessed using the same recipe to ensure alignment with the explainer background.
Usage
explain_lime(
object,
new_observation,
data = c("train", "test"),
n_features = 5,
n_labels = 1,
...
)
Arguments
object |
A 'fastml' object. |
new_observation |
A data frame containing the new observation(s) to explain. Must contain the same columns as the original training data (before preprocessing). The function will apply the stored preprocessor to transform it. |
data |
Character string specifying which data to use for the LIME explainer background: "train" (default) or "test". |
n_features |
Number of features to show in the explanation. Default 5. |
n_labels |
Number of labels to explain (classification only). Default 1. |
... |
Additional arguments passed to 'lime::explain'. |
Value
An object produced by 'lime::explain'.
Examples
## Not run:
data(iris)
iris <- iris[iris$Species != "setosa", ]
iris$Species <- factor(iris$Species)
model <- fastml(data = iris, label = "Species")
explain_lime(model, new_observation = iris[1, ])
## End(Not run)
Analyze Feature Importance Stability Across Cross-Validation Folds
Description
Computes feature importance for each fold model and aggregates the results to assess the stability of feature importance rankings across resamples. This helps distinguish features that are consistently important from those whose importance varies across data subsets.
Usage
explain_stability(
object,
model_name = NULL,
vi_iterations = 10,
seed = 123,
plot = TRUE,
conf_level = 0.95
)
Arguments
object |
A fastml object. |
model_name |
Character string specifying which model to analyze. If NULL, uses the best model. Should match the format "algorithm (engine)", e.g., "rand_forest (ranger)". |
vi_iterations |
Integer. Number of permutations for variable importance per fold. Default is 10 for faster computation across many folds. |
seed |
Integer. Random seed for reproducibility. |
plot |
Logical. If TRUE (default), displays a stability plot showing mean importance with confidence intervals. |
conf_level |
Numeric. Confidence level for intervals. Default is 0.95. |
Details
This function requires that the fastml model was trained with
store_fold_models = TRUE, which stores the models fitted on each
cross-validation fold. Without stored fold models, only the final best
model is available, and cross-fold stability analysis is not possible.
The stability analysis computes permutation-based variable importance for each fold's model using DALEX, then aggregates across folds to show:
- Mean importance and standard deviation
- Confidence intervals for importance
- Rank stability (how consistently features rank across folds)
Features with high mean importance but also high variance may be important for some data subsets but not others, suggesting potential instability in the model's reliance on those features.
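The aggregation step can be sketched in base R once per-fold importances are collected in long format. The column names (fold, variable, importance), the helper name summarise_stability_sketch, the invented importance values, and the use of a t-based interval are all assumptions for illustration; fastml's internal summary may use a different interval or ranking rule.

```r
# Hypothetical per-fold permutation importance (long format); values are invented.
fold_vi <- data.frame(
  fold = rep(1:5, each = 3),
  variable = rep(c("x1", "x2", "x3"), times = 5),
  importance = c(0.30, 0.10, 0.05,
                 0.28, 0.12, 0.04,
                 0.33, 0.08, 0.06,
                 0.29, 0.15, 0.03,
                 0.31, 0.09, 0.07)
)

summarise_stability_sketch <- function(fold_vi, conf_level = 0.95) {
  # Rank features within each fold (1 = most important)
  fold_vi$rank <- ave(-fold_vi$importance, fold_vi$fold, FUN = rank)
  rows <- lapply(split(fold_vi, fold_vi$variable), function(d) {
    n <- nrow(d)
    m <- mean(d$importance)
    s <- sd(d$importance)
    se <- s / sqrt(n)
    half <- qt(1 - (1 - conf_level) / 2, df = n - 1) * se  # t-based interval
    data.frame(variable = d$variable[1], mean = m, sd = s, se = se,
               lower = m - half, upper = m + half,
               mean_rank = mean(d$rank), sd_rank = sd(d$rank))
  })
  out <- do.call(rbind, rows)
  out <- out[order(-out$mean), ]
  rownames(out) <- NULL
  out
}

summarise_stability_sketch(fold_vi)
```

A feature with a large mean but a wide interval or a non-zero sd_rank is the kind of unstable feature the paragraph above describes.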
Value
A list with class "fastml_stability" containing:
- importance_summary
Data frame with aggregated feature importance (mean, sd, se, lower/upper CI) across folds.
- fold_importance
List of per-fold variable importance results.
- rank_stability
Data frame showing how feature ranks vary across folds.
- n_folds
Number of folds analyzed.
- model_name
Name of the model analyzed.
Examples
# Train model with fold models stored
model <- fastml(
data = iris,
label = "Species",
algorithms = "rand_forest",
store_fold_models = TRUE
)
# Analyze stability
stability <- explain_stability(model)
print(stability)
Extract survreg Linear Predictor and Scale
Description
Computes the linear predictor (lp) and scale parameter(s) for new data
from a fitted survreg model.
Usage
extract_survreg_components(fit_obj, new_data)
Arguments
fit_obj |
A fitted survreg model. |
new_data |
A data frame with predictor variables. |
Value
A list with elements lp (numeric vector) and scale
(numeric vector), or NULL on failure.
Explain a fastml model using various techniques
Description
Provides model explainability across several backends. With method = "dalex" it:
- Creates a DALEX explainer from the trained model.
- Computes permutation-based variable importance with vi_iterations permutations and displays the table and plot.
- Computes partial dependence-like model profiles when features are supplied.
- Computes Shapley values (SHAP) for shap_sample rows of the selected data source (training data by default), displays the SHAP table, and plots a canonical SHAP summary (beeswarm) plot colored by raw feature values and ordered by mean(|SHAP value|) per feature. For classification, separate panels per class are shown.
Usage
fastexplain(
object,
method = "dalex",
data = c("train", "test"),
features = NULL,
n_features = 5,
variables = NULL,
observation = NULL,
grid_size = 20,
shap_sample = 5,
vi_iterations = 10,
seed = 123,
loss_function = NULL,
protected = NULL,
...
)
Arguments
object |
A fastml object. |
method |
Character string specifying the explanation method.
Supported values are |
data |
Character string specifying which data to use for explanations: "train" (default) or "test". |
features |
Character vector of feature names for partial dependence (model profiles). Default NULL. |
n_features |
Number of features to show in the explanation (used for lime). Default 5. |
variables |
Character vector. Variable names to compute explanations for (used for counterfactuals). |
observation |
A single observation for methods that need a new data point (e.g., method = "breakdown" or method = "counterfactual"). |
grid_size |
Number of grid points for partial dependence. Default 20. |
shap_sample |
Integer number of observations from the selected data source to compute SHAP values for. Default 5. |
vi_iterations |
Integer. Number of permutations for variable importance (B). Default 10. |
seed |
Integer. A value specifying the random seed. |
loss_function |
Function. The loss function used by DALEX for permutation-based variable importance. |
protected |
Character or factor vector of protected attribute(s) required for method = "fairness". |
... |
Additional arguments passed to the underlying helper functions for the chosen method. |
Details
- Data source selection: By default, explanations are computed on training data (data = "train"), which reflects in-sample model behavior and may be influenced by overfitting. Set data = "test" to compute explanations on held-out test data for a more realistic assessment of how the model uses features on unseen data.
- Method dispatch: method can route to LIME, ICE, ALE, surrogate tree, interaction strengths, DALEX/modelStudio dashboards, fairness diagnostics, iBreakDown contributions, or counterfactual search.
- Variable importance controls: Use vi_iterations to tune permutation stability and loss_function to override the default DALEX loss (cross-entropy for classification, RMSE for regression).
- Fairness and breakdown support: Provide protected for method = "fairness" and an observation for method = "breakdown" or method = "counterfactual". Observations are aligned to the explainer data before scoring.
Value
For DALEX-based methods, prints variable importance, model profiles, and SHAP summaries. Other methods return their respective explainer objects (e.g., LIME explanations, ALE plot, surrogate tree, interaction strengths, modelStudio dashboard, fairmodels object, breakdown object, or counterfactual results), usually invisibly after plotting or printing.
Note
By default, explanations use training data. For unbiased feature importance estimates
that better reflect model generalization, use data = "test" to compute explanations
on held-out test data.
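The SHAP summary plot described above orders features by mean absolute SHAP value. A base-R sketch of that ordering on a small SHAP matrix (rows are observations, columns are features); the matrix values and the helper name shap_feature_order are invented for illustration.

```r
# Hypothetical SHAP values: rows = observations, columns = features (invented numbers)
shap <- matrix(
  c( 0.2, -0.1,  0.05,
    -0.4,  0.1,  0.00,
     0.3, -0.2, -0.05),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("age", "bmi", "sex"))
)

shap_feature_order <- function(shap) {
  mean_abs <- colMeans(abs(shap))           # mean(|SHAP value|) per feature
  names(sort(mean_abs, decreasing = TRUE))  # most influential feature first
}

shap_feature_order(shap)
# [1] "age" "bmi" "sex"
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks high.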
Lightweight exploratory helper
Description
'fastexplore()' is an optional, lightweight exploratory data analysis (EDA) helper. It returns summary tables and plot objects; it only writes to disk or renders a report when you explicitly request it via 'save_results' or 'render_report'.
Usage
fastexplore(
data,
label = NULL,
visualize = c("histogram", "boxplot", "barplot", "heatmap", "scatterplot"),
save_results = FALSE,
render_report = FALSE,
output_dir = NULL,
sample_size = NULL,
interactive = FALSE,
corr_threshold = 0.9,
auto_convert_numeric = TRUE,
visualize_missing = TRUE,
imputation_suggestions = FALSE,
report_duplicate_details = TRUE,
detect_near_duplicates = FALSE,
auto_convert_dates = FALSE,
feature_engineering = FALSE,
outlier_method = c("iqr", "zscore", "dbscan", "lof"),
run_distribution_checks = TRUE,
normality_tests = c("shapiro"),
pairwise_matrix = TRUE,
max_scatter_cols = 5,
grouped_plots = TRUE,
use_upset_missing = TRUE
)
Arguments
data |
A 'data.frame' to explore. |
label |
Optional column name of the target/label. If supplied and categorical, grouped plots and class balance summaries are produced. |
visualize |
Character vector indicating which plot families to build. Defaults to 'c("histogram", "boxplot", "barplot", "heatmap", "scatterplot")'. |
save_results |
Logical; if 'TRUE', plots/results are saved under 'output_dir' (defaults to the working directory). Default is 'FALSE'. |
render_report |
Logical; if 'TRUE', a short HTML report is rendered via 'rmarkdown' (if available). Default is 'FALSE'. |
output_dir |
Directory to save results/report when 'save_results' or 'render_report' is 'TRUE'. |
sample_size |
Optional integer; if supplied, visualizations are produced on a random sample of this size. |
interactive |
Logical; if 'TRUE' and 'plotly' is available, an interactive correlation heatmap is produced. Falls back to static ggplot output otherwise. |
corr_threshold |
Absolute correlation threshold for flagging high correlations. |
auto_convert_numeric |
Logical; convert factor/character columns that look numeric into numeric. |
visualize_missing |
Logical; if 'TRUE', include simple missingness visualizations. |
imputation_suggestions |
Logical; if 'TRUE', prints lightweight suggestions based on missingness patterns. |
report_duplicate_details |
Logical; if 'TRUE', returns a small sample of duplicated rows when present. |
detect_near_duplicates |
Placeholder for future fuzzy duplicate checks. |
auto_convert_dates |
Logical; convert YYYY-MM-DD strings to 'Date'. |
feature_engineering |
Logical; if 'TRUE', derive day/month/year from date columns to aid inspection of temporal structure. |
outlier_method |
One of '"iqr"', '"zscore"', '"dbscan"', '"lof"'. |
run_distribution_checks |
Logical; if 'TRUE', run normality tests on numeric columns. |
normality_tests |
Character vector of normality tests to run; currently supports '"shapiro"' and '"ks"'. |
pairwise_matrix |
Logical; if 'TRUE' and 'GGally' is available, returns a ggpairs scatterplot matrix for a subset of numeric columns. |
max_scatter_cols |
Maximum number of numeric columns to include in the pairwise matrix. |
grouped_plots |
Logical; if 'TRUE' and 'label' is a factor, group histograms/boxplots/density plots by label. |
use_upset_missing |
Logical; retained for compatibility. When 'TRUE' and 'UpSetR' is installed, an UpSet plot of missingness is returned; otherwise a simpler missingness heatmap is used. |
Details
This helper is intentionally decoupled from the core modeling workflow. Most of its heavy dependencies are treated as optional and loaded via 'requireNamespace()' when requested features are used.
Value
A list of summaries (tables/tibbles) and plot objects (ggplot/plotly), plus any saved file paths when 'save_results'/'render_report' are enabled.
Fast Machine Learning Function
Description
Trains and evaluates multiple classification, regression, or survival models, automatically detecting the task from the target variable.
Usage
fastml(
data = NULL,
train_data = NULL,
test_data = NULL,
label,
algorithms = "all",
task = "auto",
test_size = 0.2,
resampling_method = if (identical(task, "survival")) "none" else "cv",
folds = ifelse(grepl("cv", resampling_method), 10, 25),
repeats = NULL,
group_cols = NULL,
block_col = NULL,
block_size = NULL,
initial_window = NULL,
assess_window = NULL,
skip = 0,
outer_folds = NULL,
event_class = "first",
exclude = NULL,
recipe = NULL,
tune_params = NULL,
engine_params = list(),
metric = NULL,
class_threshold = "auto",
algorithm_engines = NULL,
use_parsnip_defaults = FALSE,
warn_engine_defaults = TRUE,
n_cores = 1,
stratify = TRUE,
impute_method = "error",
encode_categoricals = TRUE,
scaling_methods = c("center", "scale"),
balance_method = "none",
resamples = NULL,
summaryFunction = NULL,
use_default_tuning = FALSE,
tuning_strategy = "grid",
tuning_iterations = 10,
tuning_complexity = "balanced",
grid_levels = NULL,
early_stopping = FALSE,
adaptive = FALSE,
learning_curve = FALSE,
seed = 123,
verbose = FALSE,
eval_times = NULL,
survival_metric_convention = "fastml",
bootstrap_ci = TRUE,
bootstrap_samples = 500,
bootstrap_seed = NULL,
at_risk_threshold = 0.1,
audit_mode = FALSE,
multiclass_auc = "macro",
store_fold_models = FALSE
)
Arguments
data |
A data frame containing the complete dataset. If both 'train_data' and 'test_data' are 'NULL', 'fastml()' will split this into training and testing sets according to 'test_size' and 'stratify'. When 'group_cols' is supplied, the holdout keeps groups intact; when 'block_col' is supplied, the holdout uses the last rows in time order. Defaults to 'NULL'. |
train_data |
A data frame pre-split for model training. If provided, 'test_data' must also be supplied, and no internal splitting will occur. Defaults to 'NULL'. |
test_data |
A data frame pre-split for model evaluation. If provided, 'train_data' must also be supplied, and no internal splitting will occur. Defaults to 'NULL'. |
label |
A string specifying the name of the target variable. For survival analysis, supply a character vector with the names of the time and status columns. |
algorithms |
A vector of algorithm names to use. Default is "all". |
task |
Character string specifying model type selection. Use "auto" to let the function detect whether the target is for classification, regression, or survival based on the data. Survival is detected when 'label' is a character vector of length 2 that matches time and status columns in the data. You may also explicitly set to "classification", "regression", or "survival". |
test_size |
A numeric value between 0 and 1 indicating the proportion of the data to use for testing. For grouped holdout, this is applied to groups; for time-ordered holdout, it selects the final proportion of rows. Default is 0.2. |
resampling_method |
A string specifying the resampling method for model evaluation. Default is "none" when task is explicitly set to "survival" and "cv" otherwise. |
folds |
An integer specifying the number of folds for cross-validation. Default is 10 when resampling_method contains "cv" and 25 otherwise. |
repeats |
Number of times to repeat cross-validation (only applicable for methods like "repeatedcv"). |
group_cols |
Character vector naming one or more grouping columns used when
|
block_col |
Single column name that defines the ordering variable for
|
block_size |
Positive integer specifying the block size for |
initial_window |
Positive integer giving the number of observations in the initial training
window for |
assess_window |
Positive integer giving the number of observations in each assessment window for
|
skip |
Non-negative integer specifying how many potential rolling windows to skip between
successive resamples when |
outer_folds |
Positive integer giving the number of outer folds to use when
|
event_class |
A single string. Either "first" or "second" to specify which
level of the binary outcome factor to treat as the positive class (the "event").
For binary classification, "first" treats the first factor level as the positive
class, "second" treats the second level as positive. Use
|
exclude |
A character vector specifying the names of the columns to be excluded from the training process. |
recipe |
A user-defined |
tune_params |
A named list of tuning ranges for each algorithm and engine
pair. Example: |
engine_params |
A named list of engine-level arguments to pass directly
to the underlying model fitting functions. Use this for fixed settings that
should apply whenever an engine is fitted (for example,
|
metric |
The performance metric to optimize during training. For
classification, options include |
class_threshold |
For binary classification, controls how class probabilities
are converted into hard class predictions during holdout evaluation. Numeric
values in (0, 1) set a fixed threshold. The default |
algorithm_engines |
A named list specifying the engine to use for each algorithm. |
use_parsnip_defaults |
Logical. If |
warn_engine_defaults |
Logical. If |
n_cores |
An integer specifying the number of CPU cores to use for parallel processing. Default is 1. |
stratify |
Logical indicating whether to use stratified sampling when splitting the data. Only applied to random holdout splitting. Default is TRUE. |
impute_method |
Method for handling missing values. Options include:
All imputation occurs inside the recipe so the same trained preprocessing
can be applied at prediction time. Default is "error". |
encode_categoricals |
Logical indicating whether to encode categorical variables. Default is TRUE. |
scaling_methods |
Vector of scaling methods to apply. Default is c("center", "scale"). |
balance_method |
Method to handle class imbalance. One of |
resamples |
Optional rsample object providing custom resampling splits.
If supplied, |
summaryFunction |
A custom summary function for model evaluation. Default is NULL. |
use_default_tuning |
Logical. Tuning only runs when resamples are supplied and
|
tuning_strategy |
A string specifying the tuning strategy. Must be one of
|
tuning_iterations |
Number of iterations for Bayesian tuning. Ignored when
|
tuning_complexity |
Character string specifying a tuning complexity preset that controls grid density and parameter range width. One of:
See |
grid_levels |
Integer specifying the number of levels per parameter for grid search. Higher values create denser grids but increase computation time sharply: grid size = levels^n_params, so it grows exponentially with the number of tuned parameters. Typical values:
If |
early_stopping |
Logical indicating whether to use early stopping in Bayesian tuning methods (if supported). Default is |
adaptive |
Logical indicating whether to use adaptive/racing methods for tuning. Default is |
learning_curve |
Logical. If TRUE, generate learning curves (performance vs. training size). |
seed |
An integer value specifying the random seed for reproducibility. fastml also configures parallel backends for deterministic RNG streams when possible; some external engines (e.g., h2o, spark, keras) may still be nondeterministic and will emit a warning. |
verbose |
Logical; if TRUE, prints progress messages during the training and evaluation process. |
eval_times |
Optional numeric vector of evaluation horizons for survival
models. When |
survival_metric_convention |
Character string specifying which survival metric conventions to follow. '"fastml"' (default) uses fastml's internal defaults for evaluation horizons and t_max. '"tidymodels"' uses 'eval_times' as the explicit evaluation grid and applies yardstick-style Brier/IBS normalization; when 'eval_times' is 'NULL', time-dependent Brier metrics are omitted. |
bootstrap_ci |
Logical indicating whether bootstrap confidence intervals should be computed for performance metrics. Applies to all task types. |
bootstrap_samples |
Integer giving the number of bootstrap resamples to
use when |
bootstrap_seed |
Optional seed passed to the bootstrap procedure used to estimate confidence intervals. When omitted, defaults to 'seed' for reproducible intervals; set to 'NULL' to allow random bootstrap draws. |
at_risk_threshold |
Numeric value between 0 and 1 used for survival
metrics to determine the last follow-up time ( |
audit_mode |
Logical; if |
multiclass_auc |
For multiclass ROC AUC, the averaging method to use: '"macro"' (default, tidymodels) or '"macro_weighted"'. Macro weights each class equally, while macro_weighted weights by class prevalence and can change model rankings on imbalanced data. |
store_fold_models |
Logical. If |
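As a rough illustration of what bootstrap_ci, bootstrap_samples, and bootstrap_seed control, the sketch below computes a percentile bootstrap interval for accuracy on a holdout set. It assumes a percentile-style interval and uses a hypothetical helper name (bootstrap_accuracy_ci) that is not part of fastml; the package's own procedure may differ in the interval type and the metrics it covers.

```r
# Hypothetical helper (not fastml's API): percentile bootstrap CI for accuracy.
bootstrap_accuracy_ci <- function(truth, estimate, samples = 500, seed = 1234,
                                  level = 0.95) {
  if (!is.null(seed)) set.seed(seed)  # seed = NULL allows random bootstrap draws
  n <- length(truth)
  draws <- vapply(seq_len(samples), function(i) {
    idx <- sample.int(n, n, replace = TRUE)  # resample rows with replacement
    mean(truth[idx] == estimate[idx])        # accuracy on this bootstrap sample
  }, numeric(1))
  alpha <- (1 - level) / 2
  list(
    estimate = mean(truth == estimate),      # point estimate on the full holdout set
    ci = unname(quantile(draws, c(alpha, 1 - alpha))),
    draws = draws
  )
}

truth    <- factor(c("a", "b", "a", "a", "b", "b", "a", "b", "a", "a"))
estimate <- factor(c("a", "b", "b", "a", "b", "a", "a", "b", "a", "a"))
res <- bootstrap_accuracy_ci(truth, estimate, samples = 200, seed = 42)
res$estimate
res$ci
```

Passing a fixed seed makes the interval reproducible, which mirrors the documented behaviour of bootstrap_seed defaulting to seed.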
Details
Fast Machine Learning Function
Trains and evaluates multiple classification, regression, or survival models. The function automatically detects the task based on the target variable type and can perform hyperparameter tuning using several tuning strategies.
Model selection is based exclusively on resampling metrics (cross-validation
or nested CV). The holdout split is reserved for final performance
estimation and is never used to choose the best model, mirroring
the semantics of tune::last_fit().
For multiclass ROC AUC, fastml defaults to macro averaging (tidymodels). Macro treats each class equally, while macro_weighted weights by class prevalence and can change model rankings on imbalanced data. Keep the same setting when comparing runs.
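To see why the averaging choice can change rankings, the arithmetic sketch below combines per-class one-vs-rest AUC values in both ways. The class names, AUC values, and counts are made up for illustration; only the averaging rules come from the text above.

```r
# Illustrative per-class one-vs-rest AUCs and class counts (made-up numbers)
class_auc <- c(A = 0.90, B = 0.70, C = 0.50)
class_n   <- c(A = 80,   B = 10,   C = 10)

macro_auc          <- mean(class_auc)                          # each class weighted equally
macro_weighted_auc <- sum(class_auc * class_n) / sum(class_n)  # weighted by prevalence

macro_auc           # 0.7
macro_weighted_auc  # 0.84
```

Because the majority class dominates the weighted average, a model that is strong only on the common class looks better under macro_weighted than under macro.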
## Tuning: Speed vs Robustness Trade-offs
Hyperparameter tuning involves a fundamental trade-off between computational
cost and the likelihood of finding optimal hyperparameters. fastml provides
presets via tuning_complexity to make this trade-off explicit:
| Level | Grid Size* | Time | Quality | Use Case |
|---|---|---|---|---|
| quick | ~32 | ~1x | Low | Prototyping, debugging |
| balanced | ~243 | ~10x | Medium | Most production use |
| thorough | ~3,125 | ~100x | High | Final models, papers |
| exhaustive | ~16,807 | ~1000x | Very High | Research, competitions |
*Grid size shown for 5 tunable parameters (levels^5)
**Recommendations:**
- Start with tuning_complexity = "quick" during development.
- Use "balanced" (default) for most production pipelines.
- Switch to "thorough" for final model selection.
- Consider tuning_strategy = "bayes" instead of exhaustive grid search.
- Enable adaptive = TRUE for early stopping of poor configurations.
Use print_tuning_presets to see all presets and
estimate_tuning_time to estimate runtime before starting.
Value
An object of class fastml containing the best model, performance metrics, and other information.
Factor Level Warning
For binary classification, the interpretation of metrics like sensitivity, specificity,
and ROC AUC depends on which factor level is treated as the "positive" class (the event
of interest). The event_class parameter controls this:
- "first" (default): The first factor level is treated as positive.
- "second": The second factor level is treated as positive.
Important: Recipe preprocessing steps like step_other() or
step_unknown() can modify factor levels, potentially changing which level
is "first" or "second". Always verify factor levels after preprocessing.
To ensure consistent behavior, explicitly set factor levels before calling fastml:
# Ensure "positive" is the second level (event_class = "second")
data$outcome <- factor(data$outcome, levels = c("negative", "positive"))
# Or ensure "positive" is the first level (event_class = "first")
data$outcome <- factor(data$outcome, levels = c("positive", "negative"))
Examples
# Example 1: Using the iris dataset for binary classification (excluding 'setosa')
data(iris)
iris <- iris[iris$Species != "setosa", ] # Binary classification
iris$Species <- factor(iris$Species)
# Define a custom tuning grid for the ranger engine
tune <- list(
rand_forest = list(
ranger = list(mtry = c(1, 3))
)
)
# Train models with custom tuning
model <- fastml(
data = iris,
label = "Species",
algorithms = "rand_forest",
tune_params = tune,
use_default_tuning = TRUE
)
# View model summary
summary(model)
Evaluate Models Function
Description
Evaluates the trained models on the test data and computes performance metrics.
Usage
fastml_compute_holdout_results(
models,
train_data,
test_data,
label,
start_col = NULL,
time_col = NULL,
status_col = NULL,
task,
metric = NULL,
event_class,
class_threshold = "auto",
eval_times = NULL,
bootstrap_ci = TRUE,
bootstrap_samples = 500,
bootstrap_seed = 1234,
at_risk_threshold = 0.1,
survival_metric_convention = "fastml",
precomputed_predictions = NULL,
summaryFunction = NULL,
multiclass_auc = "macro"
)
Arguments
models |
A list of trained model objects. |
train_data |
Preprocessed training data frame. |
test_data |
Preprocessed test data frame. |
label |
Name of the target variable. For survival analysis this should be a character vector of length two giving the names of the time and status columns. |
start_col |
Optional string. The name of the column specifying the
start time in counting process (e.g., '(start, stop, event)') survival
data. Only used when |
time_col |
String. The name of the column specifying the event or
censoring time (the "stop" time in counting process data). Only used
when |
status_col |
String. The name of the column specifying the event
status (e.g., 0 for censored, 1 for event). Only used when
|
task |
Type of task: "classification", "regression", or "survival". |
metric |
The performance metric to optimize (e.g., "accuracy", "rmse"). |
event_class |
A single string. Either "first" or "second" to specify which level of truth to consider as the "event". |
class_threshold |
For binary classification, controls how class probabilities are converted into hard class predictions. Numeric values in (0, 1) set a fixed threshold. The default '"auto"' tunes a threshold on the training data to maximize F1; use '"model"' to keep the model's default threshold. |
eval_times |
Optional numeric vector of evaluation horizons for survival
metrics. Passed through to |
bootstrap_ci |
Logical indicating whether bootstrap confidence intervals should be computed for the evaluation metrics. |
bootstrap_samples |
Number of bootstrap resamples used when
|
bootstrap_seed |
Optional integer seed for the bootstrap procedure used in metric estimation. |
at_risk_threshold |
Minimum proportion of subjects that must remain at
risk to define |
survival_metric_convention |
Character string specifying which survival metric conventions to follow. '"fastml"' (default) uses fastml's internal defaults for evaluation horizons and t_max. '"tidymodels"' uses 'eval_times' as the explicit evaluation grid and applies yardstick-style Brier/IBS normalization; when 'eval_times' is 'NULL', time-dependent Brier metrics are omitted. |
precomputed_predictions |
Optional data frame or nested list of previously generated predictions (per algorithm/engine) to reuse instead of recomputing. This is mainly used when combining results across engines. |
summaryFunction |
Optional custom classification metric function passed
through to |
multiclass_auc |
For multiclass ROC AUC, the averaging method to use: '"macro"' (default, tidymodels) or '"macro_weighted"'. Macro weights each class equally, while macro_weighted weights by class prevalence and can change model rankings on imbalanced data. |
Value
A list with two elements:
- performance
A named list of performance metric tibbles for each model.
- predictions
A named list of data frames with columns including truth, predictions, and probabilities per model.
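The class_threshold = "auto" behaviour, picking the probability cutoff that maximizes F1 on the training data, can be sketched as a simple grid search. This is an illustrative base-R sketch: the candidate grid, the tie-breaking rule (first maximum), and the name tune_f1_threshold are assumptions, not fastml's internals.

```r
# Hypothetical sketch: choose the probability cutoff that maximizes F1.
# prob: predicted P(positive class); truth: logical, TRUE = positive class.
tune_f1_threshold <- function(prob, truth, grid = seq(0.05, 0.95, by = 0.05)) {
  f1_at <- function(t) {
    pred <- prob >= t                # predicted positive at this cutoff
    tp <- sum(pred & truth)
    fp <- sum(pred & !truth)
    fn <- sum(!pred & truth)
    if (tp == 0) return(0)           # treat F1 as 0 when there are no true positives
    2 * tp / (2 * tp + fp + fn)
  }
  f1 <- vapply(grid, f1_at, numeric(1))
  best <- which.max(f1)              # first cutoff that reaches the maximum F1
  list(threshold = grid[best], f1 = f1[best])
}

prob  <- c(0.92, 0.81, 0.37, 0.28, 0.19, 0.08)
truth <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
tune_f1_threshold(prob, truth)
```

Here any cutoff between 0.28 and 0.37 separates the classes perfectly, so the sketch returns the first such grid point with F1 = 1.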
Guarded Resampling Utilities
Description
Internal helpers that enforce the Guarded Resampling Principle by fitting preprocessing pipelines independently within each resampling split. These functions are not exported.
Usage
fastml_guard_validate_indices(indices, label)
Arguments
indices |
Numeric vector of row indices for a resample split. |
label |
Character string used to identify the index source in errors. |
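The Guarded Resampling Principle means that preprocessing statistics are estimated from the analysis rows of each split only and then applied unchanged to that split's assessment rows. The base-R sketch below shows the idea with a standardization step and k-fold splits. fastml applies this to its full preprocessing recipe; the function name guarded_scale_folds and the index checks inside it are hypothetical illustrations, not the package's code.

```r
# Hypothetical sketch of guarded preprocessing: centering/scaling statistics are
# re-estimated on each fold's analysis rows and never see its assessment rows.
guarded_scale_folds <- function(x, k = 5, seed = 1) {
  set.seed(seed)
  folds <- sample(rep_len(seq_len(k), nrow(x)))  # random fold assignment
  lapply(seq_len(k), function(f) {
    assess_idx   <- which(folds == f)
    analysis_idx <- which(folds != f)
    # Illustrative index checks (assumed; not fastml's actual validation rules)
    stopifnot(length(intersect(assess_idx, analysis_idx)) == 0,
              all(c(assess_idx, analysis_idx) %in% seq_len(nrow(x))))
    train <- x[analysis_idx, , drop = FALSE]
    mu  <- colMeans(train)            # analysis-only means
    sdv <- apply(train, 2, sd)        # analysis-only standard deviations
    list(
      analysis   = scale(train, center = mu, scale = sdv),
      assessment = scale(x[assess_idx, , drop = FALSE], center = mu, scale = sdv)
    )
  })
}

x <- as.matrix(iris[, 1:4])
splits <- guarded_scale_folds(x, k = 5)
round(colMeans(splits[[1]]$analysis), 10)  # approximately 0 by construction
```

The assessment rows of each fold are transformed with statistics they did not contribute to, which is exactly what prevents preprocessing leakage.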
Internal helpers for survival-specific preprocessing
Description
These utilities standardize survival status indicators so that downstream metrics always receive the conventional coding (0 = censored, 1 = event). The functions are intentionally unexported and are used across multiple internal modules.
Normalize survival status coding to 0/1 representation
Usage
fastml_normalize_survival_status(status_vec, reference_length = NULL)
Arguments
status_vec |
A vector containing survival status information. May be numeric, logical, factor, or character. |
reference_length |
Optional integer specifying the desired length of the returned vector. When 'status_vec' is 'NULL', this value controls the length of the output (defaulting to 0 when not supplied). |
Details
This helper attempts to coerce a status vector into a numeric format where 0 represents censoring and 1 represents the event indicator. It accepts a variety of common encodings such as 1/2, logical values, factors, or character labels. When the supplied values deviate from the canonical coding, the function records that a recode was performed so callers can communicate this to the user (once).
Value
A list with two elements: 'status', the recoded numeric vector, and 'recoded', a logical flag indicating whether a non-standard encoding was detected.
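To make the recoding rules concrete, here is a minimal sketch of how such a normalizer could map common encodings onto 0/1 and report whether a recode happened. The specific rules (1/2 coding, logicals, two-level factors or characters with the second level as the event) are assumptions for illustration; normalize_status_sketch is hypothetical and may differ from fastml_normalize_survival_status().

```r
# Hypothetical sketch: coerce common status encodings to 0 = censored, 1 = event.
normalize_status_sketch <- function(status_vec, reference_length = NULL) {
  if (is.null(status_vec)) {
    n <- if (is.null(reference_length)) 0L else as.integer(reference_length)
    return(list(status = numeric(n), recoded = FALSE))
  }
  original <- status_vec
  if (is.logical(status_vec)) {
    status <- as.numeric(status_vec)                 # FALSE/TRUE -> 0/1
  } else if (is.factor(status_vec) || is.character(status_vec)) {
    f <- factor(status_vec)
    stopifnot(nlevels(f) <= 2)
    status <- as.numeric(f) - 1                      # assumes 2nd level = event
  } else {
    status <- as.numeric(status_vec)
    # survival-style 1/2 coding (1 = censored, 2 = event)
    if (all(status %in% c(1, 2, NA)) && any(status == 2, na.rm = TRUE)) {
      status <- status - 1
    }
  }
  recoded <- !(is.numeric(original) && all(original %in% c(0, 1, NA)))
  list(status = status, recoded = recoded)
}

normalize_status_sketch(c(1, 2, 2, 1))$status  # 0 1 1 0
```

An all-ones numeric vector is left as 0/1 coding because the 1/2 rule only fires when at least one value equals 2.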
Internal helper to prepare explainer inputs from a fastml object
Description
Internal helper to prepare explainer inputs from a fastml object
Usage
fastml_prepare_explainer_inputs(object, data = c("train", "test"))
Arguments
object |
A fastml object. |
data |
Character string specifying which data to use: "train" (default) or "test". |
Flatten and Rename Models
Description
Flatten and Rename Models
Usage
flatten_and_rename_models(models)
Arguments
models |
A named list of model objects, optionally nested by engine. |
Format Default Override Warning Message
Description
Creates a human-readable warning message about default overrides.
Usage
format_default_override_warning(algo, comparison, show_params = TRUE)
Arguments
algo |
Algorithm name. |
comparison |
Result from compare_defaults(). |
show_params |
Logical; whether to include parameter differences. |
Value
Character string with the warning message.
Get Best Model Indices by Metric and Group
Description
Get Best Model Indices by Metric and Group
Usage
get_best_model_idx(df, metric, group_cols = c("Model", "Engine"))
Arguments
df |
Data frame containing model performance metrics. |
metric |
Character string naming the column that holds the metric values. |
group_cols |
Character vector of columns used to group models. |
Get Best Model Names
Description
Extracts and returns the best engine names from a named list of model workflows.
Usage
get_best_model_names(models)
Arguments
models |
A named list where each element corresponds to an algorithm and contains a list of model workflows.
Each workflow should be compatible with |
Details
For each algorithm, the function extracts the engine names from the model workflows using tune::extract_fit_parsnip.
It then chooses "randomForest" if it is available; otherwise, it selects the first non-NA engine.
If no engine names can be extracted for an algorithm, NA_character_ is returned.
Value
A named character vector. The names of the vector correspond to the algorithm names, and the values represent the chosen best engine name for that algorithm.
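The selection rule in Details can be sketched on plain character vectors of engine names. In fastml these names come from tune::extract_fit_parsnip(); here they are typed in directly, and the helper name choose_best_engine is hypothetical.

```r
# Hypothetical sketch of the documented rule:
# prefer "randomForest", else the first non-NA engine, else NA.
choose_best_engine <- function(engines) {
  engines <- engines[!is.na(engines)]
  if (length(engines) == 0) return(NA_character_)
  if ("randomForest" %in% engines) return("randomForest")
  engines[[1]]
}

engine_lists <- list(
  rand_forest  = c("ranger", "randomForest"),
  logistic_reg = c(NA, "glm", "glmnet"),
  svm_rbf      = NA_character_
)
vapply(engine_lists, choose_best_engine, character(1))
```

vapply keeps the algorithm names, so the result has the same shape as the documented return value: a named character vector with NA where no engine could be extracted.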
Get Best Workflows
Description
Extracts the best workflows from a nested list of model workflows based on the provided best model names.
Usage
get_best_workflows(models, best_model_name)
Arguments
models |
A nested list of model workflows. |
best_model_name |
A named character vector of chosen best engines. |
Get All Default Differences Summary
Description
Returns a summary of all differences between fastml and parsnip defaults for the specified algorithms.
Usage
get_default_differences(algorithms, task = "classification")
Arguments
algorithms |
Character vector of algorithm names. |
task |
Task type ("classification", "regression", or "survival"). |
Value
A data frame summarizing the differences.
Get Default Engine
Description
Returns the default engine corresponding to the specified algorithm.
Usage
get_default_engine(algo, task = NULL)
Arguments
algo |
A character string specifying the name of the algorithm. The value should match one of the supported algorithm names. |
task |
Optional task type (e.g., |
Details
The function uses a switch statement to select the default engine based on the given algorithm. For survival random forests, the function defaults to "aorsf". If the provided algorithm does not have a defined default engine, the function terminates with an error.
Value
A character string containing the default engine name associated with the provided algorithm.
Get Default Parameters for an Algorithm
Description
Returns a list of default tuning parameters for the specified algorithm based on the task type, number of predictors, and engine.
Usage
get_default_params(algo, task, num_predictors = NULL, engine = NULL)
Arguments
algo |
A character string specifying the algorithm name. Supported values include:
|
task |
A character string specifying the task type, typically |
num_predictors |
An optional numeric value indicating the number of predictors. This value is used to compute default values for parameters such as |
engine |
An optional character string specifying the engine to use. If not provided, a default engine is chosen where applicable. |
Details
The function employs a switch statement to select and return a list of default parameters tailored for the given algorithm, task, and engine. The defaults vary by algorithm and, in some cases, by engine. For example:
- For "rand_forest", if engine is not provided, it defaults to "ranger". Parameters such as mtry, trees, and min_n are computed based on the task and the number of predictors.
- For "C5_rules", the defaults include trees, min_n, and sample_size.
- For "xgboost" and "lightgbm", default values are provided for parameters like tree depth, learning rate, and sample size.
- For "logistic_reg" and "multinom_reg", the function returns defaults for regularization parameters (penalty and mixture) that vary with the specified engine.
- For "decision_tree", the parameters (such as tree_depth, min_n, and cost_complexity) are set based on the engine (e.g., "rpart", "C5.0", "partykit", "spark").
- Other algorithms, including "svm_linear", "svm_rbf", "nearest_neighbor", "naive_Bayes", "mlp", "deep_learning", "elastic_net", "bayes_glm", "pls", "linear_reg", "ridge_reg", and "lasso_reg", have their respective default parameter lists.
Value
A list of default parameter settings for the specified algorithm. If the algorithm is not recognized, the function returns NULL.
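The Details say mtry is derived from the task and the number of predictors. One widely used convention, documented as the default of the randomForest package, is floor(sqrt(p)) for classification and max(floor(p / 3), 1) for regression. The sketch below assumes that rule purely for illustration; fastml's exact formula may differ, and default_mtry_sketch is a hypothetical name.

```r
# Sketch of a predictor-count-based mtry default, following randomForest's
# documented rule (an assumption; fastml's exact formula may differ).
default_mtry_sketch <- function(task, num_predictors) {
  stopifnot(num_predictors >= 1)
  if (task == "classification") {
    floor(sqrt(num_predictors))
  } else {
    max(floor(num_predictors / 3), 1)
  }
}

default_mtry_sketch("classification", 4)  # 2
default_mtry_sketch("regression", 10)     # 3
```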
Get Default Parameters with Transparency Warnings
Description
Wrapper around get_default_params that optionally warns when fastml's
default parameters differ from parsnip/engine defaults.
Usage
get_default_params_with_warnings(
algo,
task,
num_predictors = NULL,
engine = NULL,
warn_param_defaults = TRUE,
verbose = FALSE
)
Arguments
algo |
Algorithm name. |
task |
Task type. |
num_predictors |
Number of predictors (optional). |
engine |
Engine name (optional). |
warn_param_defaults |
Logical; if TRUE, warn about parameter differences. |
verbose |
Logical; if TRUE, print parameter selections. |
Value
A list of default parameters.
Get Default Tuning Parameters
Description
Returns a list of default tuning parameter ranges for a specified algorithm based on the provided training data, outcome label, and engine.
Usage
get_default_tune_params(algo, train_data, label, engine)
Arguments
algo |
A character string specifying the algorithm name. Supported values include: |
train_data |
A data frame containing the training data. |
label |
A character string specifying the name of the outcome variable in |
engine |
A character string specifying the engine to be used for the algorithm. Different engines may have different tuning parameter ranges. |
Details
The function first determines the number of predictors by removing the outcome variable (specified by label) from train_data. It then uses a switch statement to select a list of default tuning parameter ranges tailored for the specified algorithm and engine. The tuning ranges have been adjusted for efficiency and may include parameters such as mtry, trees, min_n, and others depending on the algorithm.
Value
A list of tuning parameter ranges for the specified algorithm. If no tuning parameters are defined for the given algorithm, the function returns NULL.
Get Engine Names from Model Workflows
Description
Extracts and returns a list of unique engine names from a list of model workflows.
Usage
get_engine_names(models)
Arguments
models |
A list where each element is a list of model workflows. Each workflow is expected to contain a fitted model that can be processed with |
Details
The function applies tune::extract_fit_parsnip to each model workflow to extract the fitted model object. It then retrieves the engine name from the model specification (spec$engine). If the extraction fails, NA_character_ is returned for that workflow. Finally, the function removes any duplicate engine names using unique.
Value
A list of character vectors. Each vector contains the unique engine names extracted from the corresponding element of models.
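The extraction logic in Details (extract the fit, read spec$engine, fall back to NA on failure, then deduplicate) can be sketched without fitted workflows by making the extractor injectable. In real use the extractor would be function(wf) tune::extract_fit_parsnip(wf)$spec$engine; the mock objects and the name get_engine_names_sketch are hypothetical.

```r
# Hypothetical sketch of the documented extraction logic.
# Real use: extract = function(wf) tune::extract_fit_parsnip(wf)$spec$engine
get_engine_names_sketch <- function(models,
                                    extract = function(wf) wf$spec$engine) {
  lapply(models, function(workflows) {
    engines <- vapply(workflows, function(wf) {
      out <- tryCatch(extract(wf), error = function(e) NA_character_)
      if (length(out) != 1) NA_character_ else as.character(out)
    }, character(1))
    unique(engines)  # drop duplicate engine names
  })
}

# Mock objects that mimic only the $spec$engine path of a parsnip fit
models <- list(
  rand_forest = list(list(spec = list(engine = "ranger")),
                     list(spec = list(engine = "ranger")),
                     list(spec = list(engine = "randomForest"))),
  broken      = list("not a model")
)
get_engine_names_sketch(models)
```

The "broken" entry errors inside the extractor and is mapped to NA_character_, matching the fallback described above.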
Expanded Default Tuning Parameters
Description
Returns expanded tuning parameter ranges that provide better coverage
than the minimal defaults. These are used when tuning_complexity
is set to "thorough" or "exhaustive".
Usage
get_expanded_tune_params(algo, train_data, label, engine)
Arguments
algo |
Algorithm name. |
train_data |
Training data frame. |
label |
Outcome variable name. |
engine |
Engine name. |
Value
A list of expanded tuning parameter ranges.
Get Model Engine Names
Description
Extracts and returns a named vector mapping algorithm names to engine names from a nested list of model workflows.
Usage
get_model_engine_names(models)
Arguments
models |
A nested list of model workflows. Each inner list should contain model objects from which a fitted model can be extracted using |
Details
The function iterates over a nested list of model workflows and, for each workflow, attempts to extract the fitted model object using tune::extract_fit_parsnip. If successful, it retrieves the algorithm name from the first element of the class attribute of the model specification and the engine name from the specification. The results are combined into a named vector.
Value
A named character vector where the names correspond to algorithm names (e.g., "rand_forest", "logistic_reg") and the values correspond to the associated engine names (e.g., "ranger", "glm").
Get Parsnip Default Engine for an Algorithm
Description
Returns the default engine that parsnip would use for a given algorithm, allowing comparison with fastml's optimized defaults.
Usage
get_parsnip_default_engine(algo, task = NULL)
Arguments
algo |
Character string specifying the algorithm name. |
task |
Character string specifying the task type ("classification", "regression", or "survival"). |
Details
This function documents parsnip's default engine choices as of tidymodels 1.x. These may change with future parsnip versions.
Value
Character string of the parsnip default engine, or NULL if unknown.
Get Parsnip Default Parameters for an Algorithm
Description
Returns the default parameter values that parsnip would use for a given algorithm and engine combination.
Usage
get_parsnip_default_params(algo, engine = NULL, task = NULL)
Arguments
algo |
Character string specifying the algorithm name. |
engine |
Character string specifying the engine. |
task |
Character string specifying the task type. |
Details
These defaults are based on parsnip's internal defaults and the underlying engine defaults. They may differ from fastml's defaults, which are tuned for better out-of-the-box performance.
Value
A named list of default parameter values, or NULL if unknown.
Extract Time and Status from Survival Matrix
Description
Helper function to extract "time" and "status" columns from a matrix
(like one returned by survival::Surv()), falling back to defaults.
Usage
get_surv_info(surv_matrix_vals, default_time, default_status)
Arguments
surv_matrix_vals |
A matrix, typically from |
default_time |
Default time vector if not found. |
default_status |
Default status vector if not found. |
Value
A list with elements time and status.
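A minimal sketch of the documented fallback behaviour follows. For right-censored data, survival::Surv(time, status) yields a matrix with columns named "time" and "status", so the example uses a plain base-R matrix of the same shape. The name get_surv_info_sketch is hypothetical, and the real helper may handle other Surv types (for example counting-process data with "start"/"stop" columns) differently.

```r
# Hypothetical sketch: pull "time"/"status" columns from a Surv-like matrix,
# falling back to the supplied defaults when a column is missing.
get_surv_info_sketch <- function(surv_matrix_vals, default_time, default_status) {
  m <- unclass(surv_matrix_vals)     # plain matrix, avoids Surv-specific `[`
  cols <- colnames(m)
  time <- if (!is.null(cols) && "time" %in% cols) {
    as.numeric(m[, "time"])
  } else {
    default_time
  }
  status <- if (!is.null(cols) && "status" %in% cols) {
    as.numeric(m[, "status"])
  } else {
    default_status
  }
  list(time = time, status = status)
}

# Same column layout as survival::Surv(c(5, 8, 12), c(1, 0, 1))
m <- cbind(time = c(5, 8, 12), status = c(1, 0, 1))
get_surv_info_sketch(m, default_time = NULL, default_status = NULL)
```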
Tuning Complexity Presets
Description
Returns the configuration for a given tuning complexity level, including grid levels, parameter ranges, and expected computational characteristics.
Usage
get_tuning_complexity(
complexity = c("balanced", "quick", "thorough", "exhaustive")
)
Arguments
complexity |
Character string specifying the tuning complexity level. One of:
|
Details
## Speed-Robustness Trade-offs
Hyperparameter tuning involves a fundamental trade-off between computational cost and the likelihood of finding optimal hyperparameters:
| Level | Grid Size | Time | Robustness | Use Case |
|---|---|---|---|---|
| quick | 4-27 | ~1x | Low | Prototyping, debugging |
| balanced | 27-256 | ~10x | Medium | Most production use |
| thorough | 256-3125 | ~100x | High | Final models, papers |
| exhaustive | 1000-10000+ | ~1000x | Very High | Research, competitions |
### Recommendations:
1. **Start with "quick"** during development to iterate fast.
2. **Use "balanced"** for most production pipelines.
3. **Switch to "thorough"** for final model selection.
4. **Consider Bayesian tuning** ('tuning_strategy = "bayes"') for high-dimensional parameter spaces instead of exhaustive grid search.
5. **Use adaptive/racing** ('adaptive = TRUE') to stop poor configurations early.
### Computational Scaling:
Grid search scales as O(L^P * F * N), where:
- L = number of levels per parameter
- P = number of parameters being tuned
- F = number of cross-validation folds
- N = dataset size

For a model with 5 tunable parameters and 10-fold CV:
- quick (L=2): 2^5 * 10 = 320 model fits
- balanced (L=3): 3^5 * 10 = 2,430 model fits
- thorough (L=5): 5^5 * 10 = 31,250 model fits
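The model-fit counts above follow directly from L^P * F. The short snippet below reproduces them; it counts fits only and ignores the per-fit cost, which grows with the dataset size N.

```r
# Number of model fits for a full grid: levels^params * folds
grid_levels <- c(quick = 2, balanced = 3, thorough = 5)
n_params <- 5
n_folds  <- 10

grid_size <- grid_levels^n_params  # 32, 243, 3125 candidate configurations
fits <- grid_size * n_folds        # one fit per configuration per fold
fits                               # quick 320, balanced 2430, thorough 31250
```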
Value
A list with components:
- grid_levels
Integer number of levels per parameter for grid search.
- bayes_iterations
Integer number of iterations for Bayesian tuning.
- description
Human-readable description of the complexity level.
- speed_estimate
Relative speed estimate (1 = baseline).
- robustness_estimate
Relative robustness estimate (1-5 scale).
Examples
# Get configuration for balanced tuning
config <- get_tuning_complexity("balanced")
print(config$grid_levels) # 3
# See all available presets
print_tuning_presets()
Get Tuning Parameters for Complexity Level
Description
Returns algorithm-specific tuning parameter ranges adjusted for the specified complexity level.
Usage
get_tuning_params_for_complexity(
algo,
train_data,
label,
engine,
complexity = "balanced"
)
Arguments
algo |
Character string specifying the algorithm name. |
train_data |
Data frame containing the training data. |
label |
Character string specifying the outcome variable name. |
engine |
Character string specifying the engine. |
complexity |
Character string specifying tuning complexity level. |
Details
Parameter ranges are scaled based on the complexity level:
- quick: Narrower ranges (70%)
- balanced: Standard ranges (100%)
- thorough: Wider ranges (130%)
- exhaustive: Very wide ranges (150%)
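One plausible reading of these percentages (70%, 100%, 130%, 150%) is that the width of each numeric range is multiplied by the factor while its midpoint stays fixed. The sketch below assumes exactly that; the midpoint rule, the optional lower bound, and the name scale_range are hypothetical and not taken from fastml.

```r
# Hypothetical sketch: rescale a numeric tuning range around its midpoint.
scale_range <- function(range, factor, lower_bound = -Inf) {
  mid  <- mean(range)
  half <- diff(range) / 2 * factor
  c(max(mid - half, lower_bound), mid + half)  # clamp, e.g. keep trees >= 1
}

factors <- c(quick = 0.7, balanced = 1.0, thorough = 1.3, exhaustive = 1.5)
t(sapply(factors, function(f) scale_range(c(100, 500), f, lower_bound = 1)))
```

With a base range of 100 to 500 this yields 160 to 440 for quick, the unchanged range for balanced, and progressively wider ranges for the thorough and exhaustive presets.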
Value
A list of tuning parameter ranges.
Compute feature interaction strengths for a fastml model
Description
Uses the 'iml' package to quantify the strength of feature interactions.
Usage
interaction_strength(object, data = c("train", "test"), ...)
Arguments
object |
A 'fastml' object. |
data |
Character string specifying which data to use: |
... |
Additional arguments passed to 'iml::Interaction'. |
Value
An 'iml::Interaction' object.
Examples
## Not run:
data(iris)
iris <- iris[iris$Species != "setosa", ]
iris$Species <- factor(iris$Species)
model <- fastml(data = iris, label = "Species")
interaction_strength(model)
## End(Not run)
Load Model Function
Description
Loads a trained model object from a file.
Usage
load_model(filepath)
Arguments
filepath |
A string specifying the file path to load the model from. |
Value
An object of class fastml.
Map Brier Curve Values to Specific Horizons
Description
Extracts Brier score values from a pre-computed curve at specific time horizons by finding the closest matching evaluation time.
Usage
map_brier_values(curve, eval_times, horizons)
Arguments
curve |
Numeric vector of Brier scores from |
eval_times |
Numeric vector of times corresponding to |
horizons |
Numeric vector of target time horizons to extract. |
Value
A numeric vector of Brier scores corresponding to horizons.
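The closest-time lookup described above can be sketched in a few lines of base R. This assumes eval_times is sorted in ascending order, so that which.min resolves ties toward the earlier time; map_brier_sketch is a hypothetical name, and the real helper may treat ties or out-of-range horizons differently.

```r
# Hypothetical sketch: for each horizon, return the Brier value at the
# closest evaluation time (ties go to the first, i.e. earlier, time).
map_brier_sketch <- function(curve, eval_times, horizons) {
  stopifnot(length(curve) == length(eval_times))
  vapply(horizons, function(h) {
    curve[which.min(abs(eval_times - h))]
  }, numeric(1))
}

eval_times <- c(30, 60, 90, 180, 365)
curve      <- c(0.05, 0.09, 0.12, 0.18, 0.22)
map_brier_sketch(curve, eval_times, horizons = c(50, 200, 400))
# 0.09 0.18 0.22
```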
Plot Methods for fastml Objects
Description
plot.fastml produces visual diagnostics for a trained fastml object.
Usage
## S3 method for class 'fastml'
plot(
x,
algorithm = "best",
type = c("all", "bar", "roc", "calibration", "residual", "learning_curve"),
...
)
Arguments
x |
A |
algorithm |
Character vector specifying which algorithm(s) to include when
generating certain plots (e.g., ROC curves). Defaults to |
type |
Character vector indicating which plot(s) to produce. Options are:
|
... |
Additional arguments (currently unused). |
Details
When type = "all", plot.fastml produces a bar plot of metrics,
ROC curves (classification), a calibration plot, and residual diagnostics (regression).
If you specify a subset of types, only those are drawn.
Examples
## Create a binary classification dataset from iris
data(iris)
iris <- iris[iris$Species != "setosa",]
iris$Species <- factor(iris$Species)
## Fit fastml model on binary classification task
model <- fastml(data = iris, label = "Species", algorithms = c("rand_forest", "svm_rbf"))
## 1. Plot all available diagnostics
plot(model, type = "all")
## 2. Bar plot of performance metrics
plot(model, type = "bar")
## 3. ROC curves (only for classification models)
plot(model, type = "roc")
## 4. Calibration plot (requires 'probably' package)
plot(model, type = "calibration")
## 5. ROC curves for specific algorithm(s) only
plot(model, type = "roc", algorithm = "rand_forest")
## 6. Residual diagnostics (only available for regression tasks)
model <- fastml(data = mtcars, label = "mpg", algorithms = c("linear_reg", "xgboost"))
plot(model, type = "residual")
Plot method for fastml_stability objects
Description
Plot method for fastml_stability objects
Usage
## S3 method for class 'fastml_stability'
plot(x, top_n = 15, ...)
Arguments
x |
A fastml_stability object from explain_stability(). |
top_n |
Integer. Number of top features to display. Default is 15. |
... |
Additional arguments (ignored). |
Value
A ggplot object.
Plot ICE curves for a fastml model
Description
Generates Individual Conditional Expectation (ICE) plots for selected features using the 'pdp' package (ggplot2 engine), and returns both the underlying data and the plot object.
Usage
plot_ice(object, features, data = c("train", "test"), target_class = NULL, ...)
Arguments
object |
A 'fastml' object. |
features |
Character vector of feature names to plot. |
data |
Character string specifying which data to use: |
target_class |
For classification, which class probability to plot. If NULL (default), uses the positive class determined by the model settings. For multiclass problems, this shows the probability of the specified class vs all others. |
... |
Additional arguments passed to 'pdp::partial'. |
Value
A list with two elements: 'data' (the ICE data frame) and 'plot' (the ggplot object).
Examples
## Not run:
data(iris)
iris <- iris[iris$Species != "setosa", ]
iris$Species <- factor(iris$Species)
model <- fastml(data = iris, label = "Species")
plot_ice(model, features = "Sepal.Length")
## End(Not run)
Predict method for fastml objects
Description
Generates predictions from a trained 'fastml' object on new data. Supports both single-model and multi-model workflows, and handles classification and regression tasks with optional post-processing and verbosity.
Usage
## S3 method for class 'fastml'
predict(
object,
newdata,
type = "auto",
model_name = NULL,
verbose = FALSE,
postprocess_fn = NULL,
eval_time = NULL,
...
)
Arguments
object |
A fitted 'fastml' object created by the 'fastml()' function. |
newdata |
A data frame or tibble containing new predictor data for which to generate predictions. |
type |
Type of prediction to return. One of '"auto"' (default), '"class"', '"prob"', '"numeric"', '"survival"', or '"risk"'.
- '"auto"': chooses '"class"' for classification, '"numeric"' for regression, and '"survival"' for survival.
- '"prob"': returns class probabilities (only for classification).
- '"class"': returns predicted class labels.
- '"numeric"': returns predicted numeric values (for regression).
- '"survival"': returns survival probabilities at the supplied 'eval_time' horizons (for survival tasks).
- '"risk"': returns risk scores on the linear predictor scale (for survival tasks). |
model_name |
(Optional) Name of a specific model to use when 'object$best_model' contains multiple models. |
verbose |
Logical; if 'TRUE', prints progress messages showing which models are used during prediction. |
postprocess_fn |
(Optional) A function to apply to the final predictions (e.g., inverse transforms, thresholding). |
eval_time |
Optional numeric vector of time points (on the original time scale) at which to return survival probabilities when 'type = "survival"'. Required for survival tasks when requesting survival curves. |
... |
Additional arguments (currently unused). |
Value
A vector of predictions, or a named list of predictions (if multiple models are used). If 'postprocess_fn' is supplied, its output will be returned instead.
Internal predict_model method for parsnip fits
Description
Shim for parsnip model objects so that lime's predict_model generic ignores unused arguments passed via '...'.
Usage
predict_model.model_fit(x, newdata, type, ...)
Predict Risk Scores from a Survival Model
Description
Provides a consistent interface for computing linear predictors (risk scores) across various survival modeling engines, including native fastml models (e.g., Cox proportional hazards, XGBoost Cox) and parsnip/workflow objects.
Usage
predict_risk(fit, newdata, ...)
## S3 method for class 'fastml_native_survival'
predict_risk(fit, newdata, ...)
## S3 method for class 'workflow'
predict_risk(fit, newdata, ...)
## Default S3 method:
predict_risk(fit, newdata, ...)
Arguments
fit |
A fitted survival model object. |
newdata |
A data frame containing predictor variables for which to compute risk scores. |
... |
Additional arguments passed to specific methods. |
Value
A numeric vector of risk scores, where higher values indicate greater predicted risk.
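In a Cox-type model, the risk score is the linear predictor eta = x'beta, and a larger eta means a larger hazard. The base-R sketch below computes such scores from a made-up coefficient vector. The coefficient values and the centering step are illustrative assumptions and do not come from any fastml engine.

```r
# Illustrative (made-up) Cox coefficients: per-year effect of age and
# the effect of a binary treatment indicator.
beta <- c(age = 0.03, treated = -0.5)

newdata <- data.frame(
  age = c(50, 70, 70),
  treated = c(1, 0, 1)
)

# Linear predictor (risk score): eta_i = sum_j x_ij * beta_j
X <- as.matrix(newdata[, names(beta)])
risk <- drop(X %*% beta)

# Centering (done by some engines) shifts every score by the same constant,
# so the ranking of subjects is unchanged.
risk_centered <- risk - mean(risk)

# Hazard ratio of each subject relative to the first subject.
hr_vs_first <- exp(risk - risk[1])
print(risk)
print(hr_vs_first)
```

Since only differences in eta matter for hazard ratios, risk scores from different engines are best compared by ranking (for example, via a concordance index) rather than by raw value.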
Predict survival probabilities from a survival model
Description
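Computes survival probabilities at the requested time points for each row of 'newdata', dispatching on the class of the fitted model (native fastml survival fits, workflows, or other objects via the default method).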
Usage
predict_survival(fit, newdata, times, ...)
## S3 method for class 'fastml_native_survival'
predict_survival(fit, newdata, times, ...)
## S3 method for class 'workflow'
predict_survival(fit, newdata, times, ...)
## Default S3 method:
predict_survival(fit, newdata, times, ...)
Arguments
fit |
A fitted survival model. |
newdata |
A data frame of predictors for which to compute survival curves. |
times |
Numeric vector of evaluation times. |
... |
Additional arguments passed to methods. |
Value
A numeric matrix with one row per observation and one column per time.
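Under a proportional-hazards assumption, S(t | x) = S0(t)^exp(eta). This formula produces the documented output shape: one row per observation and one column per requested time. The sketch below assumes a Weibull baseline survival with made-up parameters. Real engines estimate the baseline from data, and non-PH engines compute the matrix differently.

```r
# Assumed baseline survival S0(t): Weibull with illustrative shape and scale.
baseline_surv <- function(t, shape = 1.5, scale = 10) {
  exp(-(t / scale)^shape)
}

# Risk scores (linear predictors) for three subjects and three horizons.
eta <- c(-0.5, 0, 0.8)
times <- c(1, 5, 10)

# S(t | x_i) = S0(t)^exp(eta_i); outer() builds the n x length(times) matrix.
surv_mat <- outer(exp(eta), baseline_surv(times), function(hr, s0) s0^hr)
dimnames(surv_mat) <- list(NULL, paste0("t=", times))
print(round(surv_mat, 3))
```

Each row decreases over time, and within a column, subjects with higher risk scores have lower survival.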
Print method for fastml_stability objects
Description
Prints a summary of a fastml_stability object returned by explain_stability(), showing the 'top_n' top-ranked features.
Usage
## S3 method for class 'fastml_stability'
print(x, top_n = 10, ...)
Arguments
x |
A fastml_stability object from explain_stability(). |
top_n |
Integer. Number of top features to display. Default is 10. |
... |
Additional arguments (ignored). |
Value
The input object invisibly.
Print Default Differences Table
Description
Prints a formatted table showing differences between fastml and parsnip defaults for the specified task type.
Usage
print_default_differences(task = "classification", algorithms = NULL)
Arguments
task |
Task type ("classification", "regression", or "survival"). |
algorithms |
Optional character vector of algorithms to check. If NULL, checks all available algorithms for the task. |
Value
Invisibly returns the differences data frame.
Print Tuning Presets Summary
Description
Prints a formatted summary of all available tuning complexity presets with their characteristics and recommended use cases.
Usage
print_tuning_presets()
Value
Invisibly returns a data frame with preset information.
Process and Evaluate a Model Workflow
Description
This function processes a fitted model or a tuning result, finalizes the model if tuning was used, makes predictions on the test set, and computes performance metrics appropriate to the task type (classification, regression, or survival). It supports binary and multiclass classification and handles probabilistic outputs when the modeling engine supports them.
Usage
process_model(
model_obj,
model_id,
task,
test_data,
label,
event_class,
class_threshold = "auto",
start_col = NULL,
time_col = NULL,
status_col = NULL,
engine,
train_data,
metric,
eval_times_user = NULL,
bootstrap_ci = TRUE,
bootstrap_samples = 500,
bootstrap_seed = 1234,
at_risk_threshold = 0.1,
survival_metric_convention = "fastml",
metrics = NULL,
summaryFunction = NULL,
precomputed_predictions = NULL,
multiclass_auc = "macro"
)
Arguments
model_obj |
A fitted model or a tuning result ('tune_results' object). |
model_id |
A character identifier for the model (used in warnings). |
task |
Type of task, either '"classification"', '"regression"', or '"survival"'. |
test_data |
A data frame containing the test data. |
label |
The name of the outcome variable (as a character string). |
event_class |
For binary classification, specifies which class is considered the positive class: '"first"' or '"second"'. |
class_threshold |
For binary classification, controls how class probabilities are converted into hard class predictions. Numeric values in (0, 1) set a fixed threshold. The default '"auto"' tunes a threshold on the training data to maximize F1; use '"model"' to keep the model's default threshold. |
start_col |
Optional string. The name of the column specifying the start time in counting process (e.g., '(start, stop, event)') survival data. Only used for survival tasks. |
time_col |
String. The name of the column specifying the event or censoring time (the "stop" time in counting process data). Only used for survival tasks. |
status_col |
String. The name of the column specifying the event status (e.g., 0 for censored, 1 for event). Only used for survival tasks. |
engine |
A character string indicating the model engine (e.g., '"xgboost"', '"randomForest"'). Used to determine if class probabilities are supported. If 'NULL', probabilities are skipped. |
train_data |
A data frame containing the training data, required to refit finalized workflows. |
metric |
The name of the metric (e.g., '"roc_auc"', '"accuracy"', '"rmse"') used for selecting the best tuning result. |
eval_times_user |
Optional numeric vector of time horizons at which to evaluate survival Brier scores. When 'NULL', sensible defaults based on the observed follow-up distribution are used. |
bootstrap_ci |
Logical; if 'TRUE', bootstrap confidence intervals are estimated for performance metrics. |
bootstrap_samples |
Integer giving the number of bootstrap resamples used when computing confidence intervals. |
bootstrap_seed |
Optional integer seed applied before bootstrap resampling to make interval estimates reproducible. |
at_risk_threshold |
Numeric value between 0 and 1 defining the minimum proportion of subjects required to remain at risk when determining the maximum follow-up time used in survival metrics. |
survival_metric_convention |
Character string specifying which survival metric conventions to follow. '"fastml"' (default) uses fastml's internal defaults for evaluation horizons and t_max. '"tidymodels"' uses 'eval_times_user' as the explicit evaluation grid and applies yardstick-style Brier/IBS normalization; when 'eval_times_user' is 'NULL', time-dependent Brier metrics are omitted. |
metrics |
Optional yardstick metric set (e.g., 'yardstick::metric_set(yardstick::rmse)') used for computing regression performance. |
summaryFunction |
Optional custom classification metric function passed to 'yardstick::new_class_metric()' and included in holdout evaluation. |
precomputed_predictions |
Optional data frame or nested list of previously generated predictions (per algorithm/engine) to reuse instead of re-predicting; primarily used when combining results across engines. |
multiclass_auc |
For multiclass ROC AUC, the averaging method to use: '"macro"' (default, tidymodels) or '"macro_weighted"'. Macro weights each class equally, while macro_weighted weights by class prevalence and can change model rankings on imbalanced data. |
Details
- If the input 'model_obj' is a 'tune_results' object, the function finalizes the model using the best hyperparameters according to the specified 'metric', and refits the model on the full training data.
- For classification tasks, performance metrics include accuracy, kappa, sensitivity, specificity, precision, F1-score, and ROC AUC (if probabilities are available).
- For multiclass ROC AUC, the estimator is controlled by 'multiclass_auc'.
- For regression tasks, RMSE, R-squared, and MAE are returned.
- If the number of predictions does not match the number of test rows (for example, because of missing values), an informative error is raised recommending imputation during preprocessing.
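Both multiclass estimators start from one-vs-all AUCs, one per class. They differ only in how those AUCs are averaged. The base-R sketch below computes a rank-based (Mann-Whitney) one-vs-all AUC on simulated imbalanced data and then takes both averages. It follows the averaging idea behind yardstick's macro and macro_weighted estimators and is not fastml's implementation.

```r
set.seed(42)

# Simulated 3-class problem with class imbalance.
n_per_class <- c(a = 60, b = 30, c = 10)
truth <- factor(rep(names(n_per_class), n_per_class), levels = names(n_per_class))
n <- length(truth)

# Random class-probability matrix (rows sum to 1), nudged toward the true class.
raw <- matrix(runif(n * 3), ncol = 3, dimnames = list(NULL, levels(truth)))
idx <- cbind(seq_len(n), as.integer(truth))
raw[idx] <- raw[idx] + 0.5
probs <- raw / rowSums(raw)

# Rank-based (Mann-Whitney) AUC for one class versus all others.
auc_one_vs_all <- function(score, is_pos) {
  r <- rank(score)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

per_class_auc <- sapply(levels(truth), function(k) auc_one_vs_all(probs[, k], truth == k))
prevalence <- as.numeric(table(truth)) / n

macro <- mean(per_class_auc)                       # every class counts equally
macro_weighted <- sum(per_class_auc * prevalence)  # weighted by class prevalence
print(round(c(per_class_auc, macro = macro, macro_weighted = macro_weighted), 3))
```

With 60/30/10 class sizes, the weighted average is dominated by class 'a', while the unweighted macro average gives the rare class 'c' the same influence. This is why the two estimators can rank models differently on imbalanced data.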
Value
A list with two elements:
- performance
A tibble with computed performance metrics.
- predictions
A tibble with the predicted values, the corresponding truth values, and class probabilities (if applicable).
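The '"auto"' setting of 'class_threshold' is described as tuning a threshold on the training data to maximize F1. A minimal base-R sketch of such a search follows. The candidate grid (0.05 to 0.95 in steps of 0.01), the simulated scores, and the tie-breaking rule (keep the first maximum) are assumptions, and fastml's actual search may differ.

```r
set.seed(1)

# Simulated training-set labels (1 = positive) and positive-class probabilities.
truth <- rbinom(200, size = 1, prob = 0.3)
prob_pos <- plogis(rnorm(200, mean = ifelse(truth == 1, 1, -1)))

# F1 score of hard predictions obtained at a given threshold.
f1_at <- function(threshold, prob, truth) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & truth == 1)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

# Grid search over candidate thresholds; which.max() keeps the first maximum.
grid <- seq(0.05, 0.95, by = 0.01)
f1_values <- vapply(grid, f1_at, numeric(1), prob = prob_pos, truth = truth)
best_threshold <- grid[which.max(f1_values)]

cat("Best threshold:", best_threshold, "F1:", round(max(f1_values), 3), "\n")
cat("F1 at 0.5:", round(f1_at(0.5, prob_pos, truth), 3), "\n")
```

When the positive class is rare, the F1-optimal cutoff often lies below 0.5. Tuning on training data rather than the test set keeps the holdout estimate free of threshold-selection bias.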
Recommend Tuning Configuration
Description
Provides recommendations for tuning configuration based on dataset characteristics and time constraints.
Usage
recommend_tuning_config(
n_rows,
n_predictors,
n_algorithms = 1,
max_time_minutes = 30,
tuning_strategy = "grid"
)
Arguments
n_rows |
Number of rows in training data. |
n_predictors |
Number of predictor variables. |
n_algorithms |
Number of algorithms to tune. |
max_time_minutes |
Maximum acceptable tuning time in minutes. |
tuning_strategy |
Preferred tuning strategy. |
Value
A list with recommended configuration.
Reset Default Override Warnings
Description
Resets the tracking of which default override warnings have been shown. Useful for testing or when starting a new analysis session.
Usage
reset_default_warnings()
Resolve the positive class for binary classification
Description
Determines the positive class, respecting the event_class setting passed to fastml(). This ensures that all explainer functions use the same positive class.
Usage
resolve_positive_class(prep, y_levels)
Arguments
prep |
A list from fastml_prepare_explainer_inputs containing: positive_class, event_class, label_levels |
y_levels |
Character vector of actual levels from the target variable |
Value
Character string of the resolved positive class name
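The mapping from 'event_class' to a level name can be sketched in base R. 'resolve_positive_class_sketch()' is a hypothetical simplification and not the real helper. It assumes that an explicitly supplied 'positive_class' takes precedence when it is a valid level, and that '"first"' and '"second"' refer to the first and second outcome levels.

```r
# Hypothetical simplification of positive-class resolution (binary outcomes only).
# Assumption: a valid explicit 'positive_class' wins; otherwise event_class
# "first"/"second" selects the first/second level.
resolve_positive_class_sketch <- function(y_levels, event_class = "first",
                                          positive_class = NULL) {
  if (length(y_levels) != 2) {
    stop("Expected exactly two outcome levels for binary classification")
  }
  if (!is.null(positive_class) && positive_class %in% y_levels) {
    return(positive_class)
  }
  event_class <- match.arg(event_class, c("first", "second"))
  if (event_class == "first") y_levels[1] else y_levels[2]
}

lev <- levels(factor(c("no", "yes", "no")))   # "no" "yes" (alphabetical order)
resolve_positive_class_sketch(lev, "first")   # "no"
resolve_positive_class_sketch(lev, "second")  # "yes"
```

Because factor levels default to alphabetical order, '"first"' selects "no" for a no/yes outcome. Resolving the class in one place keeps explainers consistent with the class used for evaluation.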
Clean Column Names or Character Vectors by Removing Special Characters
Description
This function can operate on either a data frame or a character vector:
- Data frame: detects column names that contain any character other than a letter, number, or underscore. In those names, it removes colons and replaces slashes and spaces with underscores.
- Character vector: applies the same cleaning rules to every element of the vector.
Usage
sanitize(x)
Arguments
x |
A data frame or character vector to be cleaned. |
Value
- If 'x' is a data frame: returns a data frame with cleaned column names.
- If 'x' is a character vector: returns a character vector with cleaned elements.
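The documented cleaning rules can be approximated in base R. 'sanitize_sketch()' is a hypothetical re-implementation for illustration only. It applies exactly the three documented replacements (drop colons, and turn slashes and spaces into underscores) and leaves other special characters untouched, which may differ from what 'sanitize()' does with them.

```r
# Hypothetical approximation of the documented rules (not fastml's code).
clean_names_sketch <- function(v) {
  v <- gsub(":", "", v, fixed = TRUE)   # remove colons
  v <- gsub("/", "_", v, fixed = TRUE)  # slashes -> underscores
  gsub(" ", "_", v, fixed = TRUE)       # spaces -> underscores
}

sanitize_sketch <- function(x) {
  if (is.data.frame(x)) {
    # Only names containing something other than letters, digits or "_" are touched.
    bad <- grepl("[^A-Za-z0-9_]", names(x))
    names(x)[bad] <- clean_names_sketch(names(x)[bad])
    x
  } else if (is.character(x)) {
    clean_names_sketch(x)
  } else {
    stop("'x' must be a data frame or a character vector")
  }
}

df <- data.frame(`ratio a/b` = 1, `time:h` = 2, ok_name = 3, check.names = FALSE)
names(sanitize_sketch(df))              # "ratio_a_b" "timeh" "ok_name"
sanitize_sketch(c("class: A", "x/y"))   # "class_A" "x_y"
```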
Save Model Function
Description
Saves the trained model object to a file.
Usage
save.fastml(model, filepath)
Arguments
model |
An object of class 'fastml'. |
filepath |
A string specifying the file path to save the model. |
Value
No return value, called for its side effect of saving the model object to a file.
Summary Function for fastml (Using yardstick for ROC Curves)
Description
Summarizes the results of machine learning models trained using the 'fastml' package. Depending on the task type (classification, regression, or survival), it reports performance metrics, the best hyperparameter settings, and, for classification, confusion matrices, so that model results can be interpreted quickly.
Usage
## S3 method for class 'fastml'
summary(
object,
algorithm = "best",
type = c("all", "metrics", "params", "conf_mat"),
sort_metric = NULL,
show_ci = FALSE,
brier_times = NULL,
...
)
Arguments
object |
An object of class 'fastml'. |
algorithm |
A vector of algorithm names for which to display the summary. Default is '"best"'. |
type |
Character vector indicating which outputs to produce. Options are '"all"', '"metrics"', '"params"', and '"conf_mat"'. |
sort_metric |
The metric to sort by. Default uses optimized metric. |
show_ci |
Logical indicating whether to display 95% confidence intervals for performance metrics in survival models. Defaults to 'FALSE'. |
brier_times |
Optional numeric or character vector that selects which
time-specific Brier scores to display for survival models. When |
... |
Additional arguments. |
Details
For classification tasks, the summary includes metrics such as Accuracy, F1 Score, Kappa, Precision, ROC AUC, Sensitivity, and Specificity. A confusion matrix is also provided for the best model(s). For regression tasks, the summary reports RMSE, R-squared, and MAE.
Users can control the type of output with the 'type' argument: 'metrics' displays model performance metrics. 'params' shows the best hyperparameter settings. 'conf_mat' prints confusion matrices (only for classification). 'all' includes all of the above.
If multiple algorithms are trained, the summary highlights the best model based on the optimized metric.
For survival tasks, Harrell's C-index, Uno's C-index, the integrated Brier
score, and (when available) the RMST difference are shown by default. Specific
Brier(t) horizons can be requested through the brier_times argument.
Value
Prints a summary of the fastml models.
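Bootstrap intervals, such as those enabled by 'bootstrap_ci' in process_model() and displayed through 'show_ci', can be illustrated with a percentile bootstrap of a holdout metric. The sketch below resamples test rows with replacement and takes the 2.5% and 97.5% quantiles of accuracy (used here for simplicity). The percentile method, the simulated predictions, and the helper name 'bootstrap_ci()' are assumptions, and fastml may compute its intervals differently. The defaults 500 and 1234 mirror 'bootstrap_samples' and 'bootstrap_seed'.

```r
# Percentile bootstrap CI for a holdout metric (illustrative sketch).
# Note: set.seed() here also changes the global RNG state.
bootstrap_ci <- function(truth, pred, metric_fn, n_boot = 500, seed = 1234,
                         level = 0.95) {
  set.seed(seed)                       # makes the interval reproducible
  n <- length(truth)
  stats <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)
    metric_fn(truth[idx], pred[idx])
  })
  alpha <- (1 - level) / 2
  c(estimate = metric_fn(truth, pred),
    lower = unname(quantile(stats, alpha)),
    upper = unname(quantile(stats, 1 - alpha)))
}

accuracy <- function(truth, pred) mean(truth == pred)

set.seed(7)
truth <- sample(c("a", "b"), 150, replace = TRUE)
# Simulated predictions that agree with the truth about 80% of the time.
pred <- ifelse(runif(150) < 0.8, truth, ifelse(truth == "a", "b", "a"))

ci1 <- bootstrap_ci(truth, pred, accuracy)
ci2 <- bootstrap_ci(truth, pred, accuracy)
print(round(ci1, 3))
```

Fixing the seed makes repeated calls return identical intervals, which is the purpose of an argument like 'bootstrap_seed'.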
Fit a surrogate decision tree for a fastml model
Description
Builds an interpretable tree approximating the behaviour of the underlying model using the 'iml' package.
Usage
surrogate_tree(object, maxdepth = 3, data = c("train", "test"), ...)
Arguments
object |
A 'fastml' object. |
maxdepth |
Maximum depth of the surrogate tree. Default 3. |
data |
Character string specifying which data to use: '"train"' or '"test"'. |
... |
Additional arguments passed to 'iml::TreeSurrogate'. |
Value
An 'iml::TreeSurrogate' object.
Examples
## Not run:
data(iris)
iris <- iris[iris$Species != "setosa", ]
iris$Species <- factor(iris$Species)
model <- fastml(data = iris, label = "Species")
surrogate_tree(model)
## End(Not run)
Train Specified Machine Learning Algorithms on the Training Data
Description
Trains specified machine learning algorithms on the preprocessed training data.
Usage
train_models(
train_data,
label,
task,
algorithms,
resampling_method,
folds,
repeats,
group_cols = NULL,
block_col = NULL,
block_size = NULL,
initial_window = NULL,
assess_window = NULL,
skip = 0,
outer_folds = NULL,
resamples = NULL,
tune_params,
engine_params = list(),
metric,
summaryFunction = NULL,
seed = 123,
recipe,
use_default_tuning = FALSE,
tuning_strategy = "grid",
tuning_iterations = 10,
tuning_complexity = "balanced",
grid_levels = 3L,
early_stopping = FALSE,
adaptive = FALSE,
algorithm_engines = NULL,
use_parsnip_defaults = FALSE,
warn_engine_defaults = TRUE,
n_cores = 1,
verbose = FALSE,
event_class = "first",
class_threshold = "auto",
start_col = NULL,
time_col = NULL,
status_col = NULL,
eval_times = NULL,
at_risk_threshold = 0.1,
survival_metric_convention = "fastml",
audit_env = NULL,
multiclass_auc = "macro",
store_fold_models = FALSE
)
Arguments
train_data |
Preprocessed training data frame. |
label |
Name of the target variable. |
task |
Type of task: "classification", "regression", or "survival". |
algorithms |
Vector of algorithm names to train. |
resampling_method |
Resampling method for cross-validation. Supported
options include standard |
folds |
Number of folds for cross-validation. |
repeats |
Number of times to repeat cross-validation (only applicable for methods like "repeatedcv"). |
group_cols |
Optional character vector of grouping columns used with 'resampling_method = "grouped_cv"'. For classification problems the outcome column is used to request grouped stratification where supported; if class imbalance prevents stratification, grouped folds are still created and a warning is emitted to document the limitation. |
block_col |
Optional name of the ordering column used with blocked or rolling resampling. |
block_size |
Optional integer specifying the block size for 'resampling_method = "blocked_cv"'. |
initial_window |
Optional integer specifying the initial window size for rolling resampling. |
assess_window |
Optional integer specifying the assessment window size for rolling resampling. |
skip |
Optional integer number of resamples to skip between rolling resamples. |
outer_folds |
Optional integer specifying the number of outer folds for 'resampling_method = "nested_cv"'. |
resamples |
Optional rsample object. If provided, custom resampling splits will be used instead of those created internally. |
tune_params |
A named list of tuning ranges. For each algorithm, supply a
list of engine-specific parameter values, e.g.
|
engine_params |
A named list of fixed engine-level arguments passed
directly to the model fitting call for each algorithm/engine combination.
Use this to control options like |
metric |
The performance metric to optimize. For classification, options
include |
summaryFunction |
A custom summary function for model evaluation. Default is 'NULL'. |
seed |
An integer value specifying the random seed for reproducibility. |
recipe |
A recipe object for preprocessing. |
use_default_tuning |
Logical; if |
tuning_strategy |
A string specifying the tuning strategy. Must be one of
|
tuning_iterations |
Number of iterations for Bayesian tuning. Ignored
when |
tuning_complexity |
Character string specifying tuning complexity preset. One of "quick", "balanced", "thorough", or "exhaustive". Controls both grid density and parameter range width. |
grid_levels |
Integer specifying number of levels per parameter for grid search. Higher values create denser grids but increase computation exponentially (grid size = levels^n_params). |
early_stopping |
Logical; whether to use early stopping in Bayesian tuning. |
adaptive |
Logical indicating whether to use adaptive/racing methods. |
algorithm_engines |
A named list specifying the engine to use for each algorithm. |
use_parsnip_defaults |
Logical. If |
warn_engine_defaults |
Logical. If |
n_cores |
Integer number of cores requested for parallel processing. Used to decide whether tuning/resampling should run in parallel and to configure engine thread settings when supported. |
verbose |
Logical. If |
event_class |
Character string identifying the positive class when computing classification metrics ("first" or "second"). |
class_threshold |
For binary classification, controls how class probabilities are converted into hard class predictions during evaluation. Numeric values in (0, 1) set a fixed threshold. The default '"auto"' tunes a threshold on the training data to maximize F1; use '"model"' to keep the model's default threshold. |
start_col |
Optional name of the survival start time column passed through to downstream evaluation helpers. |
time_col |
Optional name of the survival stop time column. |
status_col |
Optional name of the survival status/event column. |
eval_times |
Optional numeric vector of time horizons for survival metrics. |
at_risk_threshold |
Numeric cutoff used to determine the evaluation window for survival metrics within guarded resampling. |
survival_metric_convention |
Character string specifying which survival metric conventions to follow. '"fastml"' (default) uses fastml's internal defaults for evaluation horizons and t_max. '"tidymodels"' uses 'eval_times' as the explicit evaluation grid and applies yardstick-style Brier/IBS normalization; when 'eval_times' is 'NULL', time-dependent Brier metrics are omitted. |
audit_env |
Internal environment that tracks security audit findings when
custom preprocessing hooks are executed. Typically supplied by
|
multiclass_auc |
For multiclass ROC AUC, the averaging method to use: '"macro"' (default, tidymodels) or '"macro_weighted"'. Macro weights each class equally, while macro_weighted weights by class prevalence and can change model rankings on imbalanced data. |
store_fold_models |
Logical. If |
Value
A list of trained model objects.
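'at_risk_threshold' sets the minimum proportion of subjects that must remain at risk at the maximum follow-up time used by survival metrics. One plausible reading is sketched below: t_max is the largest observed time at which at least that fraction of subjects is still at risk. The exact rule fastml applies, for example how ties or the training/assessment split are handled, is an assumption here, and the helper names are hypothetical.

```r
# Number at risk at time t: subjects whose observed time is >= t.
n_at_risk <- function(t, time) as.numeric(sum(time >= t))

# Largest observed time at which the at-risk proportion is still >= threshold.
max_followup_time <- function(time, at_risk_threshold = 0.1) {
  stopifnot(at_risk_threshold > 0, at_risk_threshold <= 1)
  cand <- sort(unique(time))
  prop <- vapply(cand, n_at_risk, numeric(1), time = time) / length(time)
  max(cand[prop >= at_risk_threshold])
}

# Ten subjects with observed (event or censoring) times.
time <- c(2, 3, 3, 5, 8, 9, 12, 15, 20, 30)
max_followup_time(time, 0.1)  # 30: the last subject is 10% of the sample
max_followup_time(time, 0.3)  # 15: three subjects (15, 20, 30) remain at risk
```

Truncating evaluation at such a t_max avoids horizons where only a handful of subjects remain, where Brier-type estimates become unstable.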
Tuning Configuration and Complexity Presets
Description
Functions and presets for configuring hyperparameter tuning grids with explicit speed-robustness trade-offs.
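The cost side of this trade-off follows from the 'grid_levels' description in train_models(): a full regular grid has levels^n_params candidates, and each candidate is fitted once per resample. The sketch below tabulates the resulting number of fits for a few settings. Counting exactly one fit per candidate per resample, and ignoring the final refit, racing, and early stopping, is a simplifying assumption. The helper names are hypothetical.

```r
# Number of candidate combinations in a full regular grid.
grid_size <- function(levels, n_params) levels^n_params

# Approximate number of model fits: candidates x resamples x algorithms.
n_fits <- function(levels, n_params, folds, repeats = 1, n_algorithms = 1) {
  grid_size(levels, n_params) * folds * repeats * n_algorithms
}

settings <- expand.grid(levels = c(3, 5), n_params = c(2, 3, 4))
settings$grid_size <- grid_size(settings$levels, settings$n_params)
settings$fits_5fold <- n_fits(settings$levels, settings$n_params, folds = 5)
print(settings)
```

Moving from 3 to 5 levels with four tuned parameters raises the grid from 81 to 625 candidates. This exponential growth is why coarser presets are preferable when time is limited.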
Validate Defaults Registry Against Parsnip
Description
Compares the hardcoded parsnip default engines in fastml's registry against
the actual defaults reported by parsnip::show_engines(). Returns a
list of any mismatches found, which may indicate that parsnip has updated
its defaults since fastml's registry was last updated.
Usage
validate_defaults_registry()
Details
This function queries parsnip for model specifications and compares them against the hardcoded parsnip_defaults list in get_parsnip_default_engine(). Mismatches may occur when:
- parsnip updates its default engine for a model type;
- new engines are added to parsnip and one of them becomes the new default;
- fastml's registry has not been updated after a parsnip release.
This validation is intended for package maintenance and testing purposes.
Value
A list of mismatches. Each element is a list with components:
- algorithm
The algorithm name.
- fastml_default
The default engine recorded in fastml's registry.
- parsnip_default
The actual default engine from parsnip.
Returns an empty list if no mismatches are found.
Examples
## Not run:
mismatches <- validate_defaults_registry()
if (length(mismatches) > 0) {
message("Found ", length(mismatches), " mismatch(es) with parsnip defaults")
}
## End(Not run)
Warn About Default Overrides
Description
Issues a warning if fastml's defaults differ from parsnip's defaults and the warning has not yet been shown in this session.
Usage
warn_default_override(
algo,
task,
fastml_engine,
fastml_params = NULL,
verbose = FALSE,
warn_once = TRUE
)
Arguments
algo |
Algorithm name. |
task |
Task type. |
fastml_engine |
fastml's default engine for this algorithm. |
fastml_params |
fastml's default parameters (optional). |
verbose |
If TRUE, always show the notice, emitted as a message rather than a warning. |
warn_once |
If TRUE (default), only warn once per algorithm per session. |
Value
Invisibly returns the comparison result.
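Once-per-session warnings, and the reset performed by reset_default_warnings(), are commonly implemented with a package-level environment that records which keys have already triggered a warning. The sketch below shows that pattern with hypothetical names ('.warn_registry', 'warn_once_sketch()', 'reset_warnings_sketch()'). It is not fastml's internal code.

```r
# Package-level environment acting as a per-session registry (hypothetical name).
.warn_registry <- new.env(parent = emptyenv())

warn_once_sketch <- function(algo, msg, warn_once = TRUE, verbose = FALSE) {
  if (verbose) {
    message(msg)                      # verbose: always shown, as a message
    return(invisible(TRUE))
  }
  if (warn_once && exists(algo, envir = .warn_registry, inherits = FALSE)) {
    return(invisible(FALSE))          # already warned for this algorithm
  }
  assign(algo, TRUE, envir = .warn_registry)
  warning(msg, call. = FALSE)
  invisible(TRUE)
}

# Analogue of a reset function: forget every warning shown so far.
reset_warnings_sketch <- function() {
  rm(list = ls(.warn_registry, all.names = TRUE), envir = .warn_registry)
  invisible(NULL)
}

shown <- character(0)
withCallingHandlers(
  for (i in 1:3) warn_once_sketch("rand_forest", "fastml uses a different default engine"),
  warning = function(w) {
    shown <<- c(shown, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
length(shown)  # 1: repeated calls for the same algorithm are silenced
```

Using an environment rather than a global variable lets the registry be mutated from inside package functions without triggering a locked-binding error.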