Chapter 5 Data Specification
formulas, model.frame, term objects, etc
data / design matrix specification -
recipes
habit: get the df right, then y ~ . in the formula. would be nice to still see the features in the call?
- ask users to use data.frames and tibbles, not matrices.
5.1 Formulas
5.1.1 Testing formulas
https://github.com/alexpghayes/formulize/blob/master/tests/testthat/test_formula.R
minimum set of formula tests (based on mtcars example dataset):
- using
as.factor()
inlinempg ~ as.factor(hp)
- using
as.character()
inlinempg ~ as.character(hp)
- intercept only
mpg ~ 1
- no intercept
mpg ~ disp + hp + drat - 1
- implicit intercept
mpg ~ disp + hp + drat
- explicit intercept
mpg ~ disp + hp + drat + 1
- polynomials with 1 term
mpg ~ disp + hp + poly(drat, 1)
- polynomials with multiple terms
mpg ~ disp + hp + poly(drat, 3)
- natural splines with 1 term
mpg ~ disp + hp + ns(drat, 1)
- natural splines with multiple terms
mpg ~ disp + hp + ns(drat, 3)
- explicit interactios
mpg ~ drat + hp + drat:hp
- dot
mpg ~ .
- star
mpg ~ hp * drat
- as.is
mpg ~ hp + I(drat^2)
- multiple response cbind
- multiple responses as matrix
y ~ x
wherey
is a matrix - multiple predictors as matrix
y ~ x
wherex
is a matrix - multiple predictos and responses together
y ~ x
, bothx
,y
matrices - transformed response
log(mpg) ~ hp + drat
optional / to: - survival::Surv
and survival::strata
objects