Safely predict from a model object

safe_predict(object, new_data, type = NULL, ..., level = 0.95,
  std_error = FALSE)

Arguments

object

A model object to generate predictions from.

new_data

Required. Data in the same format required by the predict() method (or relevant equivalent) for object. We do our best to support missing data specified via NAs in new_data, although this is somewhat dependent on the underlying predict() method.

new_data:

  • does not need to contain the model outcome

  • can contain additional columns not used for prediction

  • can have one or more rows when supplied as a table-like object such as a tibble::tibble(), data.frame(), or matrix().
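As a sketch of these rules (assuming here, purely for illustration, that safe_predict() supports lm objects), new_data can omit the outcome and carry extra columns:

```r
library(tibble)

# hypothetical example: fit a linear model on mtcars
fit <- lm(mpg ~ wt + hp, data = mtcars)

# new_data omits the outcome (mpg) and carries an extra column (id)
new_data <- tibble(
  wt = c(2.5, 3.1),
  hp = c(110, 150),
  id = c("a", "b")  # unused for prediction, silently ignored
)

preds <- safe_predict(fit, new_data)
```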

type

A character vector indicating what kind of predictions you would like.

Options are:

  • "response": continuous/numeric predictions

  • "class": hard class predictions

  • "prob": class or survival probabilities

  • "link": predictions on the scale of the linear predictor (GLMs only)

  • "conf_int": confidence intervals for means of continuous predictions

  • "pred_int": prediction intervals for continuous outcomes

In most cases, only a subset of these options is available.
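For example (a sketch that assumes glm objects are supported; which types are valid always depends on the model class):

```r
# hypothetical example: logistic regression on mtcars
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

p_resp <- safe_predict(fit, mtcars, type = "response")  # numeric predictions
p_link <- safe_predict(fit, mtcars, type = "link")      # linear predictor scale
```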

...

Unused. safe_predict() uses the ellipsis package to check that all arguments in ... are evaluated. The idea is to prevent silent errors when arguments are misspelled. This feature is experimental and feedback is welcome.

level

A number strictly between 0 and 1 to use as the confidence level when calculating confidence and prediction intervals. Setting level = 0.90 corresponds to a 90 percent confidence interval. Ignored except when type = "conf_int" or type = "pred_int". Defaults to 0.95.

std_error

Logical indicating whether or not to calculate standard errors for the fit at each point. Not available for all models, and can be computationally expensive. The standard error is always the standard error for the mean, and never the standard error for predictions. Standard errors are returned in a column called .std_error. Defaults to FALSE.
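Combining the two arguments above (again a sketch that assumes lm support):

```r
fit <- lm(mpg ~ wt, data = mtcars)

# 90 percent confidence intervals plus standard errors for the mean
preds_ci <- safe_predict(
  fit, mtcars,
  type = "conf_int",
  level = 0.90,
  std_error = TRUE
)
```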

Value

A tibble::tibble() with one row for each row of new_data. Predictions for observations with missing data will be NA. The returned tibble has different columns depending on type:

  • "response":

    • univariate outcome: .pred (numeric)

    • multivariate outcomes: .pred_{outcome name} (numeric) for each outcome

  • "class": .pred_class (factor)

  • "prob": .pred_{level} columns (numerics between 0 and 1)

  • "link": .pred (numeric)

  • "conf_int": .pred, .pred_lower, .pred_upper (all numeric)

  • "pred_int": .pred, .pred_lower, .pred_upper (all numeric)

If you request standard errors with std_error = TRUE, the tibble contains an additional column, .std_error.

For interval predictions, the tibble has additional attributes level and interval. The level is the same as the level argument and is between 0 and 1. interval is either "confidence" or "prediction". Some models may also set a method attribute to detail the method used to calculate the intervals.
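These attributes can be inspected with attr() (a sketch assuming lm support):

```r
fit <- lm(mpg ~ wt, data = mtcars)

preds <- safe_predict(fit, mtcars, type = "pred_int", level = 0.95)

attr(preds, "level")     # the requested level, between 0 and 1
attr(preds, "interval")  # "confidence" or "prediction"
```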

Confidence vs prediction intervals

For details on the difference between confidence and prediction intervals, see the online documentation. This is also available as a vignette that you can access with:

vignette("intervals", package = "safepredict")

Factors and novel factor levels

We recommend using recipes::recipe()s to consistently handle categorical (factor) predictors. For details see:

vignette("novel-factor-levels", package = "safepredict")

Currently we do not have a robust way to check for novel factor levels from within safepredict. In practice this would require storing information about predictors in the model object at fitting time, and safepredict is largely at the mercy of package writers in that regard. Using recipes to preprocess your data should take care of the issue.
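One way to build that factor handling in at fitting time is a recipe with recipes::step_novel(), which pools unseen levels into a catch-all level. This is only a sketch; see the vignette above for the recommended workflow:

```r
library(recipes)

# Species is a factor predictor; step_novel() guards against levels
# that appear at prediction time but not at fitting time
rec <- recipe(Sepal.Length ~ ., data = iris) |>
  step_novel(all_nominal_predictors()) |>
  step_dummy(all_nominal_predictors())
```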

The goal of safepredict is to make prediction as painless and consistent as possible across a wide variety of model objects. In some cases, the existing infrastructure is insufficient to provide a consistent and feature-rich prediction interface. As a result, we support a number of model objects that we do not actually recommend using. In these cases, we try to link to better and more feature-rich implementations.