
Tune an elastic net model's hyperparameters over multiple resamples

Usage

tof_tune_glmnet(
  split_data,
  prepped_recipe,
  hyperparameter_grid,
  model_type,
  outcome_cols,
  optimization_metric = "tidytof_default",
  num_cores = 1
)

Arguments

split_data

An `rsplit` or `rset` object from the rsample package. The easiest way to generate this is to use tof_split_data. Alternatively, an unsplit tbl_df can be provided, though this is not recommended.

prepped_recipe

Either a single recipe object (if `split_data` is an `rsplit` object or a `tbl_df`) or a list of recipes (if `split_data` is an `rset` object) such that each entry in the list corresponds to a resample in `split_data`.

hyperparameter_grid

A hyperparameter grid indicating which values of the elastic net penalty (lambda) and the elastic net mixture (alpha) hyperparameters should be used during model tuning. Generate this grid using tof_create_grid.
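
As a rough illustration of what such a grid holds, the sketch below builds one in base R with one row per penalty/mixture combination. The column names `penalty` and `mixture` and the specific values are assumptions made for this example; the object returned by tof_create_grid may differ in class and layout.

```r
# Candidate values for the two elastic net hyperparameters (illustrative only).
# penalty (lambda): log-spaced values; mixture (alpha): 0 = ridge, 1 = lasso.
penalty_values <- 10^seq(-4, 0, length.out = 5)
mixture_values <- seq(0, 1, by = 0.25)

# One row per penalty/mixture combination (5 x 5 = 25 rows).
hyperparameter_grid <- expand.grid(
  penalty = penalty_values,
  mixture = mixture_values
)

nrow(hyperparameter_grid)
```

Every row of the grid is evaluated on every resample, so the number of model fits grows as the product of the two value counts and the number of resamples.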

model_type

A string indicating which kind of elastic net model to build. If a continuous response is being predicted, use "linear" for linear regression; if a categorical response with only 2 classes is being predicted, use "two-class" for logistic regression; if a categorical response with more than 2 levels is being predicted, use "multiclass" for multinomial regression; and if a time-to-event outcome is being predicted, use "survival" for Cox regression.
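
For orientation, the four model types line up with the standard glmnet family names. The helper below is a hypothetical lookup written for this example, not part of tidytof, and whether tidytof performs this mapping internally in this way is an assumption.

```r
# Hypothetical helper: map a model_type string to the matching glmnet family.
glmnet_family_for <- function(model_type) {
  switch(
    model_type,
    "linear" = "gaussian",        # continuous outcome, linear regression
    "two-class" = "binomial",     # two-level outcome, logistic regression
    "multiclass" = "multinomial", # outcome with more than two levels
    "survival" = "cox",           # time-to-event outcome, Cox regression
    stop("Unknown model_type: ", model_type)
  )
}

glmnet_family_for("two-class")
```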

outcome_cols

Unquoted column name(s) indicating which column(s) in the data contained in `split_data` should be used as the outcome in the elastic net model. For survival models, two columns should be selected; for all others, only one column should be selected.

optimization_metric

A string indicating which optimization metric should be used for hyperparameter selection during model tuning. Valid values depend on the model_type.

num_cores

Integer indicating how many cores should be used for parallel processing when fitting multiple models. Defaults to 1. The overhead of distributing models across multiple cores can be high, so a significant speedup is unlikely unless many large models are being fit.

Value

A tibble summarizing the model's performance in each resampling iteration across all hyperparameter combinations. It contains 3 columns: "splits" (a list-col containing each resampling iteration's `rsplit` object), "id" (the name of the resampling iteration), and "performance_metrics" (a list-col containing the performance metrics for each resampling iteration). Each entry of "performance_metrics" is a tibble with the columns "mixture" and "penalty" plus several additional columns containing the model's performance metrics for each mixture/penalty combination. See tof_fit_split for additional details.
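
One way to use this nested result is to stack the per-resample metrics and average each mixture/penalty combination across resamples. The sketch below does this in base R on a mocked-up result: the "splits" column is omitted, plain data frames stand in for tibbles, and the metric column name `mse`, the resample ids, and all numbers are invented for illustration.

```r
# Mock per-resample metrics for the same 4 mixture/penalty combinations.
# The metric column name "mse" is a placeholder, not a documented name.
make_metrics <- function(mse) {
  data.frame(
    mixture = c(0, 0, 1, 1),
    penalty = c(0.01, 0.1, 0.01, 0.1),
    mse = mse
  )
}

# Mock tuning result with an "id" column and a "performance_metrics" list-col.
tuning_result <- data.frame(id = c("Fold1", "Fold2"))
tuning_result$performance_metrics <- list(
  make_metrics(c(1.2, 1.0, 0.9, 1.4)),
  make_metrics(c(1.1, 1.3, 0.7, 1.5))
)

# Stack the per-resample tables, then average each combination over resamples.
all_metrics <- do.call(rbind, tuning_result$performance_metrics)
mean_metrics <- aggregate(mse ~ mixture + penalty, data = all_metrics, FUN = mean)

# Keep the combination with the lowest mean error.
best <- mean_metrics[which.min(mean_metrics$mse), ]
best
```

For metrics where larger is better (for example, an accuracy-like measure), `which.max` would replace `which.min`.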