Part 9: Extension to unsupervised learning
This interface could also be extended to unsupervised learning, following the spirit of Scikit-Learn, by replacing the predict method with transform in the unsupervised case.
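As a minimal sketch of what that might look like with S3 (the class names here are hypothetical, and base R already provides a transform() generic for data frames, so a real package would have to reuse or mask it):

fit       <- function(object, ...) UseMethod("fit")
transform <- function(object, ...) UseMethod("transform")

# a model specification, to be fit later
new_kmeans <- function(k = 3) {
  structure(list(k = k), class = "unfit_kmeans")
}

fit.unfit_kmeans <- function(object, X, ...) {
  structure(list(km = stats::kmeans(X, centers = object$k)),
            class = "kmeans_fit")
}

# transform() plays the role that predict() plays for supervised models:
# here it assigns each row of X to the nearest fitted center
transform.kmeans_fit <- function(object, X, ...) {
  centers <- object$km$centers
  apply(X, 1, function(x) which.min(colSums((t(centers) - x)^2)))
}

km_fit   <- fit(new_kmeans(k = 3), as.matrix(iris[, 1:4]))
clusters <- transform(km_fit, as.matrix(iris[, 1:4]))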
Model families also make sense in the unsupervised domain. For example, both k-means and convex clustering involve some form of hyperparameter selection:
- For convex clustering, you want to select a penalization parameter, probably by optimizing one of several proposed clustering statistics.
- For k-means, you want to take the mode of cluster assignments, so transform(k_means) is probably a bad idea, but transform(k_means_family) would be interesting; a rough sketch follows.
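Reusing the generics sketched above, one hypothetical shape for this: fit the family by running k-means under several random starts, then have transform() report the modal assignment per row. This sketch sidesteps the label-switching problem by matching each run's centers to the first run's; a real implementation would need something more careful.

new_kmeans_family <- function(k = 3, n_starts = 25) {
  structure(list(k = k, n_starts = n_starts), class = "unfit_kmeans_family")
}

fit.unfit_kmeans_family <- function(object, X, ...) {
  fits <- lapply(seq_len(object$n_starts),
                 function(i) stats::kmeans(X, centers = object$k))
  structure(list(fits = fits), class = "kmeans_family_fit")
}

transform.kmeans_family_fit <- function(object, X, ...) {
  ref <- object$fits[[1]]$centers   # reference labelling
  nearest <- function(centers, x) which.min(colSums((t(centers) - x)^2))
  aligned <- vapply(object$fits, function(run) {
    # assign each row of X to this run's nearest center ...
    labels  <- apply(X, 1, nearest, centers = run$centers)
    # ... then map this run's labels onto the reference run's labels
    relabel <- apply(run$centers, 1, nearest, centers = ref)
    relabel[labels]
  }, integer(nrow(X)))
  # modal assignment across the runs, row by row
  apply(aligned, 1, function(l) as.integer(names(which.max(table(l)))))
}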
9.1 Invertible Mappings
I want a standard S3 method for invertible mappings in the spirit of scale/unscale. In some sense recipes does this, but only for the forward transformation. scale/unscale store information about the transformation in the returned object, which I'm not sold on. I'd prefer the following:
pca_model <- pca(X) # pca(X) wraps fit(new_pca(), X)
Z <- transform(pca_model, X) # transform performs forward mapping
X_recovered <- untransform(pca_model, Z) # untransform performs inverse mapping
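One way the pieces above might be implemented, leaning on stats::prcomp() (again, the class names are hypothetical and the fit()/transform() generics are the ones sketched earlier):

untransform <- function(object, ...) UseMethod("untransform")

new_pca <- function() structure(list(), class = "unfit_pca")

fit.unfit_pca <- function(object, X, ...) {
  structure(list(prcomp = stats::prcomp(X, center = TRUE, scale. = FALSE)),
            class = "pca_fit")
}

pca <- function(X, ...) fit(new_pca(), X, ...)

# forward map: center the data, then project onto the principal axes
transform.pca_fit <- function(object, X, ...) {
  scale(X, center = object$prcomp$center, scale = FALSE) %*%
    object$prcomp$rotation
}

# inverse map: rotate back and undo the centering
# (recovery is exact only when all components are retained)
untransform.pca_fit <- function(object, Z, ...) {
  sweep(Z %*% t(object$prcomp$rotation), 2, object$prcomp$center, "+")
}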
You could then provide the standard wrappers like fit_transform, fit_untransform, and so on.
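For example, fit_transform() could just compose the two verbs, with the other wrappers following the same pattern (sketch):

fit_transform <- function(object, X, ...) {
  fitted <- fit(object, X, ...)
  transform(fitted, X, ...)
}

Z <- fit_transform(new_pca(), X)   # equivalent to transform(pca(X), X)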
9.2 Unsupervised Transformations That Are Not Maps
It’s worth thinking about unsupervised techniques like t-SNE: you can transform data, but you can't ever fit a (non-parametric) t-SNE model, because t-SNE doesn't define a reusable function mapping data to a new space.
Offer a transform method for these but no fit?
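One possible shape for that, using the Rtsne package to do the actual embedding (the class name is hypothetical): the specification gets a transform() method and deliberately no fit() method.

new_tsne <- function(dims = 2, perplexity = 30) {
  structure(list(dims = dims, perplexity = perplexity), class = "tsne_spec")
}

# there is intentionally no fit.tsne_spec(): t-SNE embeds the data it is
# given rather than producing a reusable map that could be applied to new data
transform.tsne_spec <- function(object, X, ...) {
  Rtsne::Rtsne(as.matrix(X), dims = object$dims,
               perplexity = object$perplexity)$Y
}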