Chapter 5 Data Specification
formulas, model.frame, terms objects, etc.
data / design matrix specification
- recipes
habit: get the data frame right first, then use y ~ . in the formula. Would it be nice to still see the features in the call?
- ask users to use data frames and tibbles, not matrices.
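One possible way to keep the features visible when the formula is y ~ . is to expand the dot against the data. The sketch below uses only base R (terms() and formula() from stats) on mtcars; the variable names tt, feature_names and expanded are illustrative, and this is not a recommendation the chapter itself makes.

```r
# Expand the dot against the data frame so the feature names are explicit.
tt <- terms(mpg ~ ., data = mtcars)

# Character vector of the predictors that the dot stands for.
feature_names <- attr(tt, "term.labels")

# The same formula with the dot replaced by the explicit terms.
expanded <- formula(tt)

print(feature_names)
print(expanded)
```

A modeling function could store the expanded formula alongside the original call so that printed output shows the actual features.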
5.1 Formulas
5.1.1 Testing formulas
https://github.com/alexpghayes/formulize/blob/master/tests/testthat/test_formula.R
Minimum set of formula tests (based on the mtcars example dataset):
- using as.factor() inline: mpg ~ as.factor(hp)
- using as.character() inline: mpg ~ as.character(hp)
- intercept only: mpg ~ 1
- no intercept: mpg ~ disp + hp + drat - 1
- implicit intercept: mpg ~ disp + hp + drat
- explicit intercept: mpg ~ disp + hp + drat + 1
- polynomials with 1 term: mpg ~ disp + hp + poly(drat, 1)
- polynomials with multiple terms: mpg ~ disp + hp + poly(drat, 3)
- natural splines with 1 term: mpg ~ disp + hp + ns(drat, 1)
- natural splines with multiple terms: mpg ~ disp + hp + ns(drat, 3)
- explicit interactions: mpg ~ drat + hp + drat:hp
- dot: mpg ~ .
- star: mpg ~ hp * drat
- as.is: mpg ~ hp + I(drat^2)
- multiple responses via cbind()
- multiple responses as matrix: y ~ x where y is a matrix
- multiple predictors as matrix: y ~ x where x is a matrix
- multiple predictors and responses together: y ~ x, both x and y matrices
- transformed response: log(mpg) ~ hp + drat
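The mtcars-based formulas in this list can be exercised in a loop. The following is a minimal sketch, assuming lm() as a stand-in for the modeling function under test; the helper check_formula() and the list names are hypothetical, and the matrices x and y are built from arbitrarily chosen mtcars columns purely for illustration.

```r
library(splines)  # provides ns(); ships with R

# Formulas from the minimum set that only need mtcars.
formulas <- list(
  factor_inline        = mpg ~ as.factor(hp),
  character_inline     = mpg ~ as.character(hp),
  intercept_only       = mpg ~ 1,
  no_intercept         = mpg ~ disp + hp + drat - 1,
  implicit_intercept   = mpg ~ disp + hp + drat,
  explicit_intercept   = mpg ~ disp + hp + drat + 1,
  poly_one_term        = mpg ~ disp + hp + poly(drat, 1),
  poly_multi_term      = mpg ~ disp + hp + poly(drat, 3),
  ns_one_term          = mpg ~ disp + hp + ns(drat, 1),
  ns_multi_term        = mpg ~ disp + hp + ns(drat, 3),
  explicit_interaction = mpg ~ drat + hp + drat:hp,
  dot                  = mpg ~ .,
  star                 = mpg ~ hp * drat,
  as_is                = mpg ~ hp + I(drat^2),
  transformed_response = log(mpg) ~ hp + drat
)

# Hypothetical helper: fit the model, then predict on the training data
# and check that one non-missing prediction comes back per row.
check_formula <- function(f, data = mtcars) {
  fit <- lm(f, data = data)
  preds <- predict(fit, newdata = data)
  stopifnot(length(preds) == nrow(data), !anyNA(preds))
  invisible(fit)
}

fits <- lapply(formulas, check_formula)

# Matrix case: both sides of y ~ x are matrices (columns chosen arbitrarily).
y <- as.matrix(mtcars[, c("mpg", "qsec")])
x <- as.matrix(mtcars[, c("hp", "drat")])
fit_matrix <- lm(y ~ x)
```

In a package these checks would more naturally live in testthat expectations, one per formula, so that a failure names the formula that broke.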
optional / to:
- survival::Surv and survival::strata objects