Teófilo R. F., Martins J. P., Ferreira M. M. C., "Sorting
variables by using informative vectors as a strategy for feature selection
in multivariate regression", J. Chemometr., 23(1), 32-48
(Jan 2009).
[Article]
Abstract.
A new procedure is presented that enhances the predictive ability of multivariate
calibration models while using a small number of interpretable variables.
The core of this methodology is to sort the variables according to an informative
vector, followed by a systematic investigation of PLS regression models
with the aim of finding the most relevant set of variables by comparing
the cross-validation parameters of the models obtained. In this work, seven
main informative vectors, namely the regression vector, correlation vector, residual
vector, variable influence on projection (VIP), net analyte signal (NAS),
covariance procedures vector (CovProc) and signal-to-noise ratios vector (StN),
together with their combinations, were automated and tested with the main purpose
of feature selection. Six data sets from different sources were employed
to validate this methodology. They originated from near-infrared (NIR)
spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence
spectroscopy, quantitative structure-activity relationships (QSAR) and
computer simulation. The results indicate that all vectors and their combinations
were able to enhance prediction capability with respect to the full data
sets. However, the regression and NAS informative vectors from partial least
squares (PLS) regression, both built using more latent variables than were used
to build the model, were the best informative vectors for variable selection
in most of the tested data sets. In all the applications, the
selected variables were quite effective and useful for interpretation.
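
The sort-then-search idea described in the abstract can be illustrated with a
minimal sketch. It is not the authors' implementation. It assumes the regression
vector as the informative vector, a plain NIPALS PLS1 fit, interleaved k-fold
cross-validation, and variables on comparable scales. All function and parameter
names (pls1_coefficients, rmsecv, select_by_sorted_variables, h_vector, h_model,
start, step) and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression vector of a PLS1 (NIPALS) model; X and y must be mean-centered."""
    n_vars = X.shape[1]
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # nothing left to explain
            break
        w = w / norm                  # weight vector
        t = X @ w                     # scores
        tt = t @ t
        p = X.T @ t / tt              # X loadings
        q_a = (y @ t) / tt            # y loading
        X = X - np.outer(t, p)        # deflate X
        y = y - q_a * t               # deflate y
        W.append(w); P.append(p); q.append(q_a)
    if not W:
        return np.zeros(n_vars)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def rmsecv(X, y, n_components, n_folds=5):
    """Root mean square error of cross-validation with interleaved folds."""
    n = len(y)
    folds = np.arange(n) % n_folds
    press = 0.0
    for f in range(n_folds):
        test = folds == f
        x_mean, y_mean = X[~test].mean(axis=0), y[~test].mean()
        b = pls1_coefficients(X[~test] - x_mean, y[~test] - y_mean, n_components)
        pred = (X[test] - x_mean) @ b + y_mean
        press += np.sum((y[test] - pred) ** 2)
    return np.sqrt(press / n)

def select_by_sorted_variables(X, y, h_vector, h_model, start=2, step=1):
    """Sort variables by |regression coefficient|, then pick the leading subset
    with the lowest RMSECV. h_vector is the number of latent variables used to
    build the informative vector, h_model the number used in each candidate model."""
    b = pls1_coefficients(X - X.mean(axis=0), y - y.mean(), h_vector)
    order = np.argsort(-np.abs(b))    # most informative variables first
    best_err, best_cols = np.inf, None
    for k in range(start, X.shape[1] + 1, step):
        cols = order[:k]
        err = rmsecv(X[:, cols], y, min(h_model, k))
        if err < best_err:
            best_err, best_cols = err, cols
    return best_cols, best_err

# Synthetic example (assumed data): only variables 0, 5 and 12 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 1.5 * X[:, 12] + 0.1 * rng.normal(size=200)

selected, err = select_by_sorted_variables(X, y, h_vector=4, h_model=2)
print(sorted(selected.tolist()), err)
```

Here h_vector is set larger than h_model, following the abstract's observation
that informative vectors built with more latent variables than the model tended
to work best. Other informative vectors (VIP, NAS, correlation and so on) could
be substituted for the coefficient magnitudes used in the sorting step.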
Keywords.
Variable Selection; Informative Vectors; OPS; Partial Least Squares;
Chemometrics.
Keywords Plus.