pls -- crossval vs plsr(..., CV=TRUE) - R-help

Wed, May 11, 2005 7:42 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20050511/1cdb9230/attachment.pl

Bjørn-Helge Mevik

Thu, May 12, 2005 6:34 AM #

martin peters writes:

[...]

There are two reasons:

1) The call plsr(y ~ X, 6, data = NIR, CV=TRUE, method="kernelpls") is
   incorrect.  The `CV' argument of the superseded `pls.pcr' package
   has been replaced by the `validation' argument, so the correct call
   would be
   NIR.plsCV <- plsr(y ~ X, 6, data = NIR, validation="CV", method="kernelpls")
   (If you had done R2(testing.plsNOCV), you would have gotten exactly
   the same as with the R2(NIR.plsCV) above.)

2) plsr(... , validation = "CV") and crossval(...) both by default use
   CV with 10-fold _randomly selected_ segments, which means that each
   time you run the cross-validation, you will get slightly different
   results.  (Try running R2(crossval(testing.plsNOCV)) a couple of times.)

   If you want the same segments in two separate calls, either add the
   argument segment.type = "consecutive" or "interleaved", or specify
   the segments explicitly with the `segments' argument (see
   ?crossval or ?mvrCv for how).

   The segments actually used in a cross-validation are stored in the
   $validation$segments component of the object,
   i.e. testing.plsCV$validation$segments.
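
As a rough illustration of point 2, the sketch below fits the same model twice with segment.type = "consecutive" and checks that both fits used the same segments. It then passes the stored segments to crossval() through its `segments' argument. The simulated data and the object names (d, fit.a, fit.b, fit.nocv, fit.c) are assumptions made for the example and do not come from the original post, which used the NIR data.

```r
library(pls)

# Simulated data: an assumption for illustration, not the NIR data from the thread
set.seed(1)
n <- 40
p <- 15
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X[, 1:3] %*% c(1, -2, 0.5)) + rnorm(n, sd = 0.3)
d <- data.frame(y = y, X = I(X))   # I() keeps X as a single matrix column

# Consecutive segments are deterministic, so two separate calls agree
fit.a <- plsr(y ~ X, ncomp = 6, data = d, validation = "CV",
              segment.type = "consecutive")
fit.b <- plsr(y ~ X, ncomp = 6, data = d, validation = "CV",
              segment.type = "consecutive")
same.segments <- identical(fit.a$validation$segments,
                           fit.b$validation$segments)
same.r2 <- isTRUE(all.equal(c(R2(fit.a)$val), c(R2(fit.b)$val)))

# Reuse the stored segments explicitly in a separate crossval() call
fit.nocv <- plsr(y ~ X, ncomp = 6, data = d)
fit.c <- crossval(fit.nocv, segments = fit.a$validation$segments)
R2(fit.c)
```

With the default random segments, calling set.seed() with the same value immediately before each cross-validation should also reproduce the segments, since they are drawn with R's random number generator.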

(By the way, `method = "kernelpls"' is not needed, as it is the
default fit method for plsr (and mvr).)

Bjørn-Helge Mevik

Bjørn-Helge Mevik

Fri, May 13, 2005 1:20 AM #

Bjørn-Helge Mevik writes:

Actually, there is a third reason as well. :-)  We've just discovered an
embarrassingly stupid bug in the R2 calculation in mvrCv; it calculates
R (the correlation) instead of R2 (squared correlation).  A patched
version will be released shortly.  Until then, c(R2(NIR.plsCV)$val^2)
should give you the correct values.
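
To make the distinction concrete, here is a minimal base-R sketch of the difference between R (the correlation) and R2 (the squared correlation). The vectors y and yhat are simulated stand-ins for observed and cross-validated predicted values; this is not the code used inside mvrCv, only an illustration of why squaring the reported values gives the intended quantity.

```r
# Simulated observed values and predictions (assumed, for illustration only)
set.seed(42)
y    <- rnorm(50)
yhat <- 0.8 * y + rnorm(50, sd = 0.5)

r  <- cor(y, yhat)   # correlation, R
r2 <- r^2            # squared correlation, R2

# Since |R| <= 1, R2 is never larger than |R|
c(R = r, R2 = r2)
```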

Bjørn-Helge Mevik