Partial Least Squares with SAS PROC PLS

Partial Least Squares

Partial Least Squares (PLS) regression is a technique that generalizes and combines features of principal component analysis and multiple regression. It is particularly useful for predicting a set of dependent variables (responses) from a large set of independent variables (predictors). PROC PLS is the procedure that implements this technique in SAS.

The PLS procedure fits models by using any one of a number of linear predictive methods, including partial least squares (PLS). Ordinary least squares regression, as implemented in SAS/STAT procedures such as PROC GLM and PROC REG, has the single goal of minimizing sample response prediction error: it seeks linear functions of the predictors that explain as much variation in each response as possible. The techniques implemented in the PLS procedure have the additional goal of accounting for variation in the predictors. The underlying assumption is that, when the predictors are highly correlated, directions in the predictor space that are well sampled should provide better prediction for new observations. All of the techniques implemented in the PLS procedure work by extracting successive linear combinations of the predictors, called factors (also called components, latent vectors, or latent variables), which optimally address one or both of these two goals: explaining response variation and explaining predictor variation. In particular, the method of partial least squares balances the two objectives, seeking factors that explain both response and predictor variation.
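
As a minimal sketch of the ideas above, the following program simulates a small data set with highly correlated predictors and fits a two-factor PLS model. The data set name work.sample, the variable names x1-x10, y1, and y2, the seed, and the choice of two factors are all hypothetical, chosen only for illustration. METHOD= and NFAC= are options of the PROC PLS statement.

```sas
/* Hypothetical simulated data: 10 highly correlated predictors   */
/* driven by one shared latent variable t, and 2 responses.       */
data work.sample;
   call streaminit(1234);              /* arbitrary seed */
   array x{10} x1-x10;
   do i = 1 to 50;
      t = rand('normal');              /* shared latent variable */
      do j = 1 to 10;
         x{j} = t + 0.1*rand('normal');
      end;
      y1 =  2*t + 0.5*rand('normal');
      y2 = -1*t + 0.5*rand('normal');
      output;
   end;
   drop i j t;
run;

/* Extract 2 factors that balance explaining predictor variation  */
/* and response variation (METHOD=PLS).                           */
proc pls data=work.sample method=pls nfac=2;
   model y1 y2 = x1-x10;
run;
```

Replacing METHOD=PLS with METHOD=PCR (principal components regression) requests factors that address only predictor variation, while METHOD=RRR (reduced rank regression) requests factors that address only response variation. Comparing the variation-accounted-for tables across these runs is one way to see the two goals in practice.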