Tune an elastic net model's hyperparameters over multiple resamples
Source: R/modeling_helpers.R
tof_tune_glmnet.Rd
Usage
tof_tune_glmnet(
  split_data,
  prepped_recipe,
  hyperparameter_grid,
  model_type,
  outcome_cols,
  optimization_metric = "tidytof_default",
  num_cores = 1
)
Arguments
- split_data
An `rsplit` or `rset` object from the rsample package. The easiest way to generate this is to use `tof_split_data`. Alternatively, an unsplit `tbl_df` can be provided, though this is not recommended.
- prepped_recipe
Either a single recipe object (if `split_data` is an `rsplit` object or a `tbl_df`) or a list of recipes (if `split_data` is an `rset` object) such that each entry in the list corresponds to a resample in `split_data`.
- hyperparameter_grid
A hyperparameter grid indicating which values of the elastic net penalty (lambda) and the elastic net mixture (alpha) hyperparameters should be used during model tuning. Generate this grid using `tof_create_grid`.
- model_type
A string indicating which kind of elastic net model to build. If a continuous response is being predicted, use "linear" for linear regression; if a categorical response with only 2 classes is being predicted, use "two-class" for logistic regression; if a categorical response with more than 2 levels is being predicted, use "multiclass" for multinomial regression; and if a time-to-event outcome is being predicted, use "survival" for Cox regression.
- outcome_cols
Unquoted column name(s) indicating which column(s) in the data contained in `split_data` should be used as the outcome in the elastic net model. For survival models, two columns should be selected; for all others, only one column should be selected.
- optimization_metric
A string indicating which optimization metric should be used for hyperparameter selection during model tuning. Valid values depend on `model_type`.
- num_cores
Integer indicating how many cores should be used for parallel processing when fitting multiple models. Defaults to 1. The overhead of distributing models across multiple cores can be high, so a significant speedup is unlikely unless many large models are being fit.
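For orientation, a hyperparameter grid of the kind described under `hyperparameter_grid` is every combination of candidate penalty (lambda) and mixture (alpha) values. The base R sketch below builds such a grid by hand. The column names `penalty` and `mixture` follow the names used in the Value section, and the candidate values are assumptions chosen purely for illustration; this is not a reproduction of what `tof_create_grid` does internally.

```r
# Hypothetical candidate values; the ranges are illustrative only.
penalty_values <- 10^seq(-4, 0, length.out = 5)  # lambda: regularization strength
mixture_values <- seq(0, 1, by = 0.25)           # alpha: 0 = ridge, 1 = lasso

# One row per penalty/mixture combination.
grid_sketch <- expand.grid(
  penalty = penalty_values,
  mixture = mixture_values
)

nrow(grid_sketch)  # 5 penalty values x 5 mixture values = 25 combinations
```

Every row of the grid is fit once per resample, so the number of fitted models grows as the number of grid rows times the number of resamples. This is worth keeping in mind when deciding whether `num_cores` greater than 1 is likely to pay off.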
Value
A tibble summarizing the model's performance in each
resampling iteration across all hyperparameter combinations. It contains
3 columns: "splits" (a list-col containing each resampling iteration's
`rsplit` object), "id" (the name of the resampling iteration), and
"performance_metrics" (a list-col containing the performance metrics for each
resampling iteration). Each entry of "performance_metrics" is a tibble with
the columns "mixture" and "penalty" and several additional columns containing the
performance metrics of the model for each mixture/penalty combination.
See `tof_fit_split` for additional details.
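One common way to use a result of this shape is to average each metric over the resamples and pick the best mixture/penalty combination. The base R sketch below illustrates that idea on mock data standing in for the "performance_metrics" list-col. The metric column name `mse`, the helper `make_metrics`, and all numeric values are hypothetical; the actual metric columns depend on `model_type` and are not specified here.

```r
# Mock per-resample metrics; "mse" is a hypothetical metric column name
# and the numbers are made up for illustration.
make_metrics <- function(offset) {
  data.frame(
    mixture = c(0, 0, 1, 1),
    penalty = c(0.01, 0.1, 0.01, 0.1),
    mse = c(1.2, 1.0, 0.9, 1.4) + offset
  )
}

# Stand-in for the "performance_metrics" list-col: one table per resample.
performance_metrics <- list(
  make_metrics(0),
  make_metrics(0.1),
  make_metrics(-0.1)
)

# Stack the per-resample tables, then average the metric within each
# mixture/penalty combination.
all_metrics <- do.call(rbind, performance_metrics)
mean_metrics <- aggregate(mse ~ mixture + penalty, data = all_metrics, FUN = mean)

# Lower is better for an error metric, so take the minimum.
best <- mean_metrics[which.min(mean_metrics$mse), ]
best
```

For a metric where higher is better (an accuracy-like score, for example), `which.max` would replace `which.min`.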