highly biased PCA data?

Dan:


1) There is no guarantee that PCA will show separate groups, of course, as
that is not its purpose, although it is frequently a side effect.

2) If you were to use a classification method of some sort (discriminant
analysis, neural nets, SVMs, model-based classification, ...), my
understanding is that yes, severely unbalanced group membership would
indeed affect results. A guess is that Bayesian or other methods that
can explicitly model the prior membership probabilities would do
better. To make it clear why, suppose that there was a 99.9% prevalence of
"dog" and .05% each of the others. Then your dataset would have almost no
information on how covariates could distinguish the classes, and the best
classifier would be to call everything a "dog" no matter what values the
covariates had.
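
To put rough numbers on the "dog" example, here is a minimal sketch in
Python (standard library only). Everything beyond the 99.9% / .05% split is
an assumption made for illustration: the two rare class names ("cat" and
"bird"), the single covariate, and the unit-variance Gaussian class
densities with made-up means. It applies Bayes' rule with explicit priors
and compares the result against equal priors.

```python
import math

def gaussian_pdf(x, mean, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior(x, priors, means):
    """Posterior class probabilities at covariate value x via Bayes' rule."""
    joint = {k: priors[k] * gaussian_pdf(x, means[k]) for k in priors}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

def classify(x, priors, means):
    """Pick the class with the largest posterior probability."""
    post = posterior(x, priors, means)
    return max(post, key=post.get)

# Priors from the example: 99.9% "dog", .05% for each rare class.
# The rare class names and all class means are hypothetical.
skewed_priors = {"dog": 0.999, "cat": 0.0005, "bird": 0.0005}
equal_priors = {"dog": 1 / 3, "cat": 1 / 3, "bird": 1 / 3}
means = {"dog": 0.0, "cat": 2.0, "bird": -2.0}

# Calling everything "dog" is already right 99.9% of the time.
always_dog_accuracy = skewed_priors["dog"]

for x in (-2.0, 0.0, 2.0):
    print(x, classify(x, skewed_priors, means), classify(x, equal_priors, means))
```

Under these assumed densities, an observation sitting exactly at the
hypothetical "cat" mean is still assigned to "dog" when the skewed priors
are used, whereas equal priors assign it to "cat". The priors dominate
unless the covariates separate the classes very strongly.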

I presume experts will have more and better to say about this.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box