
Teófilo R. F., Ataide J. P., Ferreira M. M. C., "Study of the computational performance of PLS algorithms using experimental design".  Lappeenranta, Finland, 11-15/07/2007: 10th Scandinavian Symposium on Chemometrics (SSC10), Book of Abstracts, (2007) 97. Poster PO35.


10th Scandinavian Symposium on Chemometrics, PO 35

Study of the computational performance of PLS algorithms using
experimental design

Reinaldo Francisco Teófilo, João Paulo Ataide Martins, Márcia Miguel Castro Ferreira
Laboratório de Quimiometria Teórica e Aplicada - Instituto de Química - Universidade
Estadual de Campinas
_____________________________________________________________________________________

1. Introduction
Among the multivariate calibration methods, Partial Least Squares (PLS) regression has often been
chosen because of its advantages over other methods. In several applications of multivariate
calibration, such as 3D-QSAR, data mining, and near-infrared (NIR) spectroscopy, data matrices can be
very large, and calculation time cannot be neglected. A fast PLS algorithm is therefore important in
these situations, since it saves time during model building.

The purpose of this work is to compare, using experimental design, five PLS algorithms available in the
literature with respect to their computational time when performing leave-one-out cross-validation. They
are: classical NIPALS (NIPALS) [1], modified NIPALS (NIPALSy) [2], Kernel (Kernel) [3], SIMPLS
(SIMPLS) [4], and the bidiagonalization algorithm (PLSBi) [5]. Matrices of different dimensions were
tested to show which algorithm performs best in each situation.

2. Experimental
Two 2³ full factorial designs, each with a center point in triplicate, were carried out, one for each
type of data matrix X: small (SX) and large (LX). The response investigated in the experimental design
was the running time of the algorithm during cross-validation. Table 1 shows the variables investigated
and the levels studied for each data set.
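The run list of such a design can be sketched in coded units as follows. This is only an illustration: the function name is hypothetical, and the actual factor levels are those of Table 1, which are not reproduced here. The Results section suggests that the three factors are the number of rows, the number of columns, and nLV, but that mapping is an assumption in this sketch.

```python
from itertools import product


def full_factorial_with_center(n_factors=3, n_center=3):
    """Coded runs of a two-level full factorial plus replicated center points.

    -1 and +1 are the low and high levels of each factor; 0 is the center.
    The real levels come from Table 1 of the abstract (not reproduced here).
    """
    corners = [list(run) for run in product((-1, 1), repeat=n_factors)]
    center = [[0] * n_factors for _ in range(n_center)]
    return corners + center


# Assumed factor order (inferred, not stated in Table 1): rows, columns, nLV.
design = full_factorial_with_center()
for run in design:
    print(run)
```

With three factors and a triplicate center point, each design has 8 + 3 = 11 runs, and the replicated center runs give an estimate of the timing variability.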

The independent variables X and the dependent variable y were generated as random numbers, with matrix
dimensions as described in Table 1. The same leave-one-out cross-validation procedure was applied to
all algorithms, with the number of latent variables defined in Table 1.
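The timing procedure described above might look roughly like the sketch below. It is not the authors' code: `pls1_nipals` is a simplified NIPALS-style PLS1 written for this illustration, `time_loo_cv` is a hypothetical helper, and the matrix size is arbitrary rather than a level from Table 1. Mean-centering inside each cross-validation fold is also an assumption.

```python
import time

import numpy as np


def pls1_nipals(X, y, n_lv):
    """Fit PLS1 by NIPALS-style deflation and return the regression vector.

    X and y are assumed to be mean-centered already.
    """
    X = X.copy()
    y = y.copy()
    n_cols = X.shape[1]
    W = np.zeros((n_cols, n_lv))  # weights
    P = np.zeros((n_cols, n_lv))  # X loadings
    q = np.zeros(n_lv)            # y loadings
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                 # scores
        tt = t @ t
        p = X.T @ t / tt
        q[a] = y @ t / tt
        X -= np.outer(t, p)       # deflate X
        y -= q[a] * t             # deflate y
        W[:, a], P[:, a] = w, p
    return W @ np.linalg.solve(P.T @ W, q)


def time_loo_cv(X, y, n_lv):
    """Return (elapsed seconds, predictions) for leave-one-out cross-validation."""
    n = X.shape[0]
    preds = np.empty(n)
    start = time.perf_counter()
    for i in range(n):
        keep = np.arange(n) != i              # leave sample i out
        X_tr, y_tr = X[keep], y[keep]
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        b = pls1_nipals(X_tr - x_mean, y_tr - y_mean, n_lv)
        preds[i] = (X[i] - x_mean) @ b + y_mean
    return time.perf_counter() - start, preds


rng = np.random.default_rng(0)
X = rng.random((20, 50))  # illustrative size only, not a level from Table 1
y = rng.random(20)
elapsed, preds = time_loo_cv(X, y, n_lv=3)
print(f"LOO cross-validation took {elapsed:.4f} s")
```

Passing the same X, y and number of latent variables to each algorithm, as the text describes, keeps the comparison fair: only the fitting routine inside the loop changes.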

3. Results and discussion
As expected, the numbers of rows and columns were the major factors influencing the running time.
Except for Kernel, the number of columns was more important than the number of rows for SX, and the
opposite was observed for LX. The nLV variable proved to be of little significance for the running time
compared with the dimensions of X.

Only a small difference in performance was observed between PLSBi and SIMPLS, but both were more
efficient than the other algorithms, mainly for large data sets. Among NIPALS, NIPALSy and Kernel, the
performance in descending order was: NIPALSy, NIPALS and Kernel. It is important to emphasize that
all algorithms gave the same prediction results for all data sets tested.
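The equivalence of predictions can be illustrated for two of the algorithms with a single response. The sketch below compares simplified NIPALS-style and SIMPLS-style PLS1 routines on random data. Both functions are written for this illustration and are not the implementations timed in the study; the data size and number of latent variables are arbitrary.

```python
import numpy as np


def nipals_pls1(X, y, n_lv):
    """Regression vector from NIPALS-style PLS1 (X and y deflated each step)."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        q.append(y @ t / (t @ t))
        X -= np.outer(t, p)
        y -= q[-1] * t
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def simpls_pls1(X, y, n_lv):
    """Regression vector from SIMPLS for one response (only X'y is deflated)."""
    s = X.T @ y                      # cross-product vector X'y
    R, V, q = [], [], []
    for _ in range(n_lv):
        t = X @ s
        norm_t = np.linalg.norm(t)
        t, r = t / norm_t, s / norm_t
        p = X.T @ t
        v = p.copy()
        for v_old in V:              # orthogonalize against earlier loadings
            v -= v_old * (v_old @ p)
        v /= np.linalg.norm(v)
        s = s - v * (v @ s)          # deflate the cross-product
        R.append(r)
        V.append(v)
        q.append(y @ t)
    return np.array(R).T @ np.array(q)


rng = np.random.default_rng(0)
X = rng.random((30, 8))
y = rng.random(30)
Xc, yc = X - X.mean(axis=0), y - y.mean()   # both routines assume centered data

b_nipals = nipals_pls1(Xc, yc, n_lv=3)
b_simpls = simpls_pls1(Xc, yc, n_lv=3)
print("max abs difference:", np.max(np.abs(b_nipals - b_simpls)))
```

For a single y, the two regression vectors should agree to within floating-point error, so any difference between the algorithms lies in running time rather than in the fitted model.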
_____________________________________________________________________________________

References
[1] Haaland, D. M. & Thomas, E. V., Anal. Chem. 60, 1988, 1193-1202.
[2] Dayal, B. S. & MacGregor, J. F., J. Chemometr. 11, 1997, 73-85.
[3] Lindgren, F., Geladi, P. & Wold, S., J. Chemometr. 7, 1993, 45-59.
[4] de Jong, S., Chemometrics Intell. Lab. Syst. 18, 1993, 251-263.
[5] Wu, W. & Manne, R., Chemometrics Intell. Lab. Syst. 51, 2000, 145-161.
 
