
Message-ID: <m0act63tyj.fsf@bar.nemo-project.org>
Date: 2004-11-25T14:05:08Z
From: Bjørn-Helge Mevik
Subject: LDA with previous PCA for dimensionality reduction
In-Reply-To: <Pine.LNX.4.51.0411241535040.19391@artemis.imbe.med.uni-erlangen.de> (Torsten Hothorn's message of "Wed, 24 Nov 2004 15:43:13 +0100 (CET)")

Torsten Hothorn writes:

> as long as one does not use the information in the response (the class
> variable, in this case) I don't think that one ends up with an
> optimistically biased estimate of the error

I would be a little careful, though.  The left-out sample in the LDA
cross-validation will still have influenced the PCA used to build the
LDA on the rest of the samples.  The sample will tend to lie closer to
the centre of the "complete" PCA than to the centre of a PCA on the
remaining samples.  Also, if the sample has high leverage on the PCA,
the directions of the two PCAs can be quite different.  Thus, the LDA
is built on data that "fits" the left-out sample better than it would
if the sample were a completely new one.
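
The geometric part of this argument can be checked directly.  Below is
a minimal Python/NumPy sketch, not taken from the thread; the function
names (pca_fit, residual_sq, compare_leave_one_out) are hypothetical,
and PCA is done with a plain SVD of the centred data.  For every sample
it compares the "complete" PCA (fitted on all rows) with the PCA fitted
without that sample.  Note that it only illustrates the geometry; it
says nothing about how large the effect on the LDA error rate is.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and the top-k principal directions (as rows) of X."""
    mu = X.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def residual_sq(x, mu, V):
    """Squared distance from x to the affine PCA subspace mu + span(V)."""
    d = x - mu
    proj = V @ d                      # coordinates in the orthonormal basis V
    return float(d @ d - proj @ proj)

def compare_leave_one_out(X, k):
    """Per sample: distance to centre and residual, complete vs. left-out PCA."""
    mu_all, V_all = pca_fit(X, k)     # PCA on all samples
    rows = []
    for i in range(X.shape[0]):
        mu_rest, V_rest = pca_fit(np.delete(X, i, axis=0), k)  # PCA without i
        x = X[i]
        rows.append((
            np.linalg.norm(x - mu_all), np.linalg.norm(x - mu_rest),
            residual_sq(x, mu_all, V_all), residual_sq(x, mu_rest, V_rest),
        ))
    return np.array(rows)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))          # hypothetical data: 20 samples, 6 variables
res = compare_leave_one_out(X, k=2)
print("closer to centre in complete PCA:", np.all(res[:, 0] < res[:, 1]))
print("smaller residual in complete PCA:", np.all(res[:, 2] <= res[:, 3] + 1e-9))
```

Both checks hold for any data, not just this random draw: the first
because x - mean_all = (n-1)/n * (x - mean_rest), and the second because
the complete PCA minimises the total squared residual over all n
samples, including the one that is later left out.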

I have no proofs or numerical studies showing that this gives
over-optimistic error rates, but I would not recommend placing the PCA
"outside" the cross-validation.  (The same applies to any
resampling-based validation.)
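
To make the two placements concrete, here is a minimal Python/NumPy
sketch of leave-one-out cross-validation of LDA on PCA scores, with the
PCA either fitted once on the complete data ("outside") or re-fitted on
the training rows of every fold ("inside").  It is an illustration
under stated assumptions, not R's MASS::lda or the code discussed in
the thread: the LDA is written out by hand with a pooled within-class
covariance and class-frequency priors, and all function names and the
number of components k are hypothetical choices.

```python
import numpy as np

def pca_fit(X, k):
    """Mean and top-k principal directions (as columns, p x k) of X."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T

def lda_fit(Z, y):
    """LDA with a pooled within-class covariance and empirical priors."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]   # within-class residuals
    cov = resid.T @ resid / (len(y) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), np.log(priors)

def lda_predict(model, Z):
    classes, means, icov, logp = model
    # Linear discriminant: z' S^-1 m_c - m_c' S^-1 m_c / 2 + log prior_c
    scores = Z @ icov @ means.T - 0.5 * np.sum(means @ icov * means, axis=1) + logp
    return classes[np.argmax(scores, axis=1)]

def loo_error(X, y, k, pca_inside):
    """Leave-one-out error rate of PCA (k components) followed by LDA."""
    n = len(y)
    if not pca_inside:
        mu, V = pca_fit(X, k)            # PCA on all samples, outside the CV
    errors = 0
    for i in range(n):
        train = np.arange(n) != i
        if pca_inside:
            mu, V = pca_fit(X[train], k) # PCA re-fitted within each fold
        model = lda_fit((X[train] - mu) @ V, y[train])
        pred = lda_predict(model, (X[i:i + 1] - mu) @ V)
        errors += int(pred[0] != y[i])
    return errors / n

rng = np.random.default_rng(1)
n, p = 30, 50                        # few samples, many variables
X = rng.normal(size=(n, p))          # pure noise: no real class signal
y = np.repeat([0, 1], n // 2)
for inside in (False, True):
    label = "PCA inside CV: " if inside else "PCA outside CV:"
    print(label, loo_error(X, y, k=5, pca_inside=inside))
```

Because the labels here are independent of X, the true error rate of
any classifier on new samples is 50%, so comparing both estimates with
that value (over many random data sets, not one) is one way to probe
the effect described above.  Only the "inside" variant keeps each
left-out sample out of every step of model building.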

-- 
Bjørn-Helge Mevik