Safely predict from a model object

safe_predict(object, new_data, type = NULL, ..., level = 0.95,
  std_error = FALSE)

Arguments

object	An object or model you would like to get predictions from.
new_data	Required. Data in the same format as required by the `predict()` method (or other relevant method) for `object`. We do our best to support missing data specified via `NA`s in `new_data`, although this depends somewhat on the underlying `predict()` method. `new_data`:
- does not need to contain the model outcome,
- can contain additional columns not used for prediction,
- can have one or more rows when specified via a table-like object such as a `tibble::tibble()`, `data.frame()` or `matrix()`.
type	A character vector indicating what kind of predictions you would like. Options are:
- `"response"`: continuous/numeric predictions
- `"class"`: hard class predictions
- `"prob"`: class or survival probabilities
- `"link"`: predictions on the linear scale (GLMs only)
- `"conf_int"`: confidence intervals for means of continuous predictions
- `"pred_int"`: prediction intervals for continuous outcomes
In most cases, only a subset of these options is available.
...	Unused. `safe_predict()` uses the `ellipsis` package to check that all arguments in `...` are evaluated. The idea is to prevent silent errors when arguments are misspelled. This feature is experimental and feedback is welcome.
level	A number strictly between `0` and `1` to use as the confidence level when calculating confidence and prediction intervals. Setting `level = 0.90` corresponds to a 90 percent confidence interval. Ignored unless `type = "conf_int"` or `type = "pred_int"`. Defaults to `0.95`.
std_error	Logical indicating whether or not to calculate standard errors for the fit at each point. Not available for all models, and can be computationally expensive. The standard error is always the standard error for the mean, never the standard error for predictions. Standard errors are returned in a column called `.std_error`. Defaults to `FALSE`.
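
The `new_data` requirements can be illustrated with base R's `stats::predict.lm()` directly (not `safe_predict()` itself), as a minimal sketch assuming a linear model on the built-in `mtcars` data. Here `new_data` omits the outcome `mpg`, carries an unused `note` column, and has an `NA` in one predictor; the `.pred` column name is borrowed from the Value section.

```r
# Fit a simple linear model; mpg is the outcome
fit <- lm(mpg ~ wt + hp, data = mtcars)

# new_data: no `mpg` column, an extra unused `note` column, and an NA in `wt`
new_data <- data.frame(
  wt   = c(2.5, NA, 3.2),
  hp   = c(110, 150, 200),
  note = c("a", "b", "c")
)

# predict.lm() uses na.action = na.pass by default, so the row with a
# missing predictor yields NA while the other rows are still predicted
preds <- data.frame(.pred = unname(predict(fit, newdata = new_data)))
preds
```

Because `predict.lm()` defaults to `na.action = na.pass`, the row with the missing predictor comes back as `NA` rather than being dropped, which keeps one output row per input row. Other `predict()` methods may behave differently.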

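For a GLM, `"link"` and `"response"` predictions differ only by the inverse link function. A minimal sketch with base R's `glm()` and its own `predict()` method (not `safe_predict()`), assuming a logistic regression of `am` on `wt` from `mtcars`:

```r
# Logistic regression: am (0 = automatic, 1 = manual) on weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)
new_data <- data.frame(wt = c(2.0, 3.0, 4.0))

# "link": predictions on the linear (log-odds) scale
link <- unname(predict(fit, newdata = new_data, type = "link"))

# "response": predictions on the probability scale
resp <- unname(predict(fit, newdata = new_data, type = "response"))

# For the logit link, the inverse link is the logistic function plogis()
all.equal(plogis(link), resp)
```
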
Value

A `tibble::tibble()` with one row for each row of `new_data`. Predictions for observations with missing data will be `NA`. The columns of the returned tibble depend on `type`:

"response":
- univariate outcome: .pred (numeric)
- multivariate outcomes: .pred_{outcome name} (numeric) for each outcome
"class": .pred_class (factor)
"prob": .pred_{level} columns (numeric, between 0 and 1)
"link": .pred (numeric)
"conf_int": .pred, .pred_lower, .pred_upper (all numeric)
"pred_int": .pred, .pred_lower, .pred_upper (all numeric)
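
As an illustration of the `"class"` and `"prob"` column layouts for a binary outcome, the sketch below converts base R logistic regression probabilities into `.pred_class` and `.pred_{level}` columns. The 0.5 cutoff and the conversion code are assumptions for illustration, not safepredict's implementation, and a plain `data.frame` stands in for a tibble.

```r
# Binary factor outcome with levels "automatic" and "manual"
mt <- transform(mtcars, am = factor(am, levels = c(0, 1),
                                    labels = c("automatic", "manual")))
fit <- glm(am ~ wt, data = mt, family = binomial)
new_data <- data.frame(wt = c(2.0, 3.0, 4.0))

# glm() models the probability of the second factor level ("manual")
p_manual <- unname(predict(fit, newdata = new_data, type = "response"))

# "prob": one .pred_{level} column per outcome level, each row summing to 1
prob <- data.frame(.pred_automatic = 1 - p_manual, .pred_manual = p_manual)

# "class": hard predictions as a factor with the outcome's levels (0.5 cutoff)
hard <- data.frame(.pred_class = factor(
  ifelse(p_manual > 0.5, "manual", "automatic"),
  levels = levels(mt$am)
))

prob
hard
```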

If you request standard errors with `std_error = TRUE`, the tibble also contains a `.std_error` column.
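
To make the mean-versus-prediction distinction concrete, here is a minimal sketch using base R's `predict.lm(se.fit = TRUE)` for an ordinary linear model (an assumption; other models compute standard errors differently). `se.fit` is the standard error of the fitted mean, which is what `std_error` describes; the standard error for a single new observation also adds the residual variance, `sqrt(se.fit^2 + sigma^2)`, and is always larger.

```r
fit <- lm(mpg ~ wt, data = mtcars)
new_data <- data.frame(wt = c(2.5, 3.5))

p <- predict(fit, newdata = new_data, se.fit = TRUE)

# Standard error of the mean: what a `.std_error` column holds
out <- data.frame(.pred = unname(p$fit), .std_error = unname(p$se.fit))

# Standard error for predicting a single new observation (not reported)
se_pred <- sqrt(p$se.fit^2 + p$residual.scale^2)

out
all(se_pred > out$.std_error)
```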

For interval predictions, the tibble has two additional attributes, `level` and `interval`. The `level` attribute equals the `level` argument and is between 0 and 1. The `interval` attribute is either `"confidence"` or `"prediction"`. Some models may also set a `method` attribute describing how the intervals were calculated.
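
A sketch of how interval output carrying these attributes could be assembled from base R's `predict.lm()`. The helper `as_interval_tibble()` is hypothetical, and a plain `data.frame` stands in for a tibble so the example needs only base R:

```r
# Hypothetical helper: rename predict.lm() interval columns and tag attributes
as_interval_tibble <- function(fit, new_data, interval, level = 0.95) {
  m <- predict(fit, newdata = new_data, interval = interval, level = level)
  out <- data.frame(.pred       = unname(m[, "fit"]),
                    .pred_lower = unname(m[, "lwr"]),
                    .pred_upper = unname(m[, "upr"]))
  attr(out, "level") <- level
  attr(out, "interval") <- interval
  out
}

fit <- lm(mpg ~ wt, data = mtcars)
res <- as_interval_tibble(fit, data.frame(wt = c(2.5, 3.5)),
                          interval = "prediction", level = 0.90)
attributes(res)[c("level", "interval")]
```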

Confidence vs prediction intervals

For details on the difference between confidence and prediction intervals, see the online documentation. This is also available as a vignette that you can access with:

vignette("intervals", package = "safepredict")
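
As a quick illustration outside safepredict, base R's `predict.lm()` can produce both kinds of interval for the same linear model (a minimal sketch on `mtcars`). The two intervals share the same center, but the prediction interval is wider because it covers the noise in a single new observation on top of the uncertainty in the estimated mean.

```r
fit <- lm(mpg ~ wt, data = mtcars)
new_data <- data.frame(wt = c(2.5, 3.5))

conf <- predict(fit, newdata = new_data, interval = "confidence", level = 0.95)
pred <- predict(fit, newdata = new_data, interval = "prediction", level = 0.95)

# Interval widths: uncertainty about the mean vs. about a new observation
width_conf <- unname(conf[, "upr"] - conf[, "lwr"])
width_pred <- unname(pred[, "upr"] - pred[, "lwr"])

rbind(confidence = width_conf, prediction = width_pred)
```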

Factors and novel factor levels

We recommend using `recipes::recipe()` to handle categorical (factor) predictors consistently. For details see:

vignette("novel-factor-levels", package = "safepredict")

safepredict currently has no robust way to check for novel factor levels. Doing so would require the model object to store information about the predictors at fitting time, and whether it does is up to the package that created the model. Preprocessing your data with recipes should take care of the issue.
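
Some model objects do store the needed information. For instance, a base R `lm()` fit records the training levels of each factor predictor in `fit$xlevels`, which makes a simple check possible. The helper `find_novel_levels()` below is hypothetical, only works for models that carry an `xlevels` element, and is not part of safepredict:

```r
# Hypothetical helper: report levels in new_data not seen at fitting time
find_novel_levels <- function(fit, new_data) {
  seen <- fit$xlevels  # named list: predictor -> training levels
  novel <- lapply(names(seen), function(v) {
    setdiff(unique(as.character(new_data[[v]])), c(seen[[v]], NA))
  })
  names(novel) <- names(seen)
  Filter(length, novel)  # keep only predictors with novel levels
}

train <- data.frame(y = c(1, 2, 3, 4, 5, 6),
                    g = factor(c("a", "a", "b", "b", "c", "c")))
fit <- lm(y ~ g, data = train)

new_data <- data.frame(g = c("a", "d"))
find_novel_levels(fit, new_data)

# predict.lm() itself stops with a "new level(s)" error for level "d"
msg <- tryCatch(predict(fit, newdata = new_data),
                error = function(e) conditionMessage(e))
msg
```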

Recommended implementations