
This function trains a glmnet model on the training set of an `rsplit` object, then computes that model's performance metrics on the validation (holdout) set for every combination of the mixture and penalty hyperparameters in a hyperparameter grid.

Usage

tof_fit_split(
  split_data,
  prepped_recipe,
  hyperparameter_grid,
  model_type,
  outcome_colnames
)

Arguments

split_data

An `rsplit` object from the rsample package. Alternatively, an unsplit tbl_df can be provided, though this is not recommended.

prepped_recipe

A trained recipe.

hyperparameter_grid

A tibble containing the hyperparameter values to tune. Can be created using `tof_create_grid`.

model_type

A string representing the type of glmnet model being fit.

outcome_colnames

Quoted column names indicating which columns in the data being fit represent the outcome variables (with all others assumed to be predictors).
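To make the shape of `hyperparameter_grid` concrete, here is a minimal base-R sketch of a grid with one row per penalty/mixture combination. The column names `penalty` and `mixture` and the candidate values are assumptions chosen for illustration; in practice the grid would normally come from `tof_create_grid`, and a base data frame like this one would still need converting to a tibble.

```r
# Candidate values (arbitrary, for illustration only).
penalty_values <- 10^seq(-3, 0, length.out = 4)  # 0.001, 0.01, 0.1, 1
mixture_values <- c(0, 0.5, 1)                   # ridge, elastic net, lasso

# One row per combination of penalty and mixture.
# Column names are assumed, not taken from the package source.
hyperparameter_grid <- expand.grid(
  penalty = penalty_values,
  mixture = mixture_values
)

# 4 penalties x 3 mixtures = 12 candidate models.
n_candidates <- nrow(hyperparameter_grid)
print(n_candidates)
```

Under this layout, the returned tibble described below would also have 12 rows, one per candidate.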

Value

A tibble with the same number of rows as the input hyperparameter grid. Each row represents a combination of mixture and penalty, and each column contains a performance metric for the fitted glmnet model on `split_data`'s holdout set. The specific performance metrics depend on the type of model being fit:

"linear"

mean-squared error (`mse`) and mean absolute error (`mae`)

"two-class"

binomial deviance (`binomial_deviance`); misclassification error rate (`misclassification_error`); the area under the receiver operating characteristic (ROC) curve (`roc_auc`); and `mse` and `mae` as above

"multiclass"

multinomial deviance (`multinomial_deviance`); misclassification error rate (`misclassification_error`); the area under the receiver operating characteristic (ROC) curve (`roc_auc`), computed using the Hand-Till method in `roc_auc`; and `mse` and `mae` as above

"survival"

the negative log2-transformed partial likelihood (`neg_log_partial_likelihood`) and Harrell's concordance index (often simply called "C"; `concordance_index`)
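As a rough illustration of what some of these metrics measure, the base-R sketch below computes `mse`, `mae`, `misclassification_error`, and a binomial deviance by hand on made-up holdout values. The data are hypothetical, the 0.5 classification threshold is an assumption, and the deviance is assumed to be -2 times the mean Bernoulli log-likelihood; the package's own scaling and implementation may differ.

```r
# Hypothetical holdout-set values, made up for illustration only.
truth_numeric <- c(1.0, 2.0, 3.0, 4.0)
pred_numeric  <- c(1.5, 2.0, 2.0, 5.0)

# "linear" metrics: average squared and absolute prediction errors.
mse <- mean((truth_numeric - pred_numeric)^2)
mae <- mean(abs(truth_numeric - pred_numeric))

# Hypothetical two-class outcome (1 = positive class) and predicted
# probabilities of the positive class.
truth_class <- c(1, 0, 1, 0)
pred_prob   <- c(0.9, 0.2, 0.4, 0.1)

# Classify with an assumed threshold of 0.5, then count mistakes.
pred_class <- as.integer(pred_prob > 0.5)
misclassification_error <- mean(pred_class != truth_class)

# Binomial deviance, assumed here to be -2 * mean log-likelihood.
log_lik <- truth_class * log(pred_prob) +
  (1 - truth_class) * log(1 - pred_prob)
binomial_deviance <- -2 * mean(log_lik)

# One row of metrics, analogous to one row of the returned tibble.
metrics <- data.frame(
  mse = mse,
  mae = mae,
  misclassification_error = misclassification_error,
  binomial_deviance = binomial_deviance
)
print(metrics)
```

Lower values are better for all four of these metrics, whereas higher values are better for `roc_auc` and `concordance_index`.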

References

Harrell Jr, F. E., Lee, K. L., and Mark, D. B. (1996). Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine, 15, 361–387.