highly biased PCA data?
Dan:

1) There is no guarantee that PCA will show separate groups, of course, as that is not its purpose, although it is frequently a side effect.

2) If you were to use a classification method of some sort (discriminant analysis, neural nets, SVMs, model-based classification, ...), my understanding is that yes, severely unbalanced group membership would indeed affect results. A guess is that Bayesian or other methods that can explicitly model the prior membership probabilities would do better. To make it clear why, suppose that there was a 99.9% preference for "dog" and 0.05% for each of the others. Then your dataset would have almost no information on how the covariates could distinguish the classes, and the best classifier would be to call everything a "dog" no matter what values the covariates had.

I presume experts will have more and better to say about this.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box
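
The prior-probability argument in point 2 can be sketched numerically. This is only an illustration under stated assumptions: a single covariate, Gaussian class distributions with a common standard deviation, and made-up class means. The names `MEANS`, `SKEWED`, `EQUAL`, `posteriors` and `classify` are hypothetical and do not come from any package. Bayes' rule weights each class likelihood by its prior, so with a 99.9% prior on "dog" the posterior favours "dog" even at a covariate value that is typical of cats.

```python
import math

# Hypothetical class means on one covariate (assumed for illustration only).
MEANS = {"dog": 0.0, "cat": 2.0, "fish": -2.0}

# Priors from the example in the text: 99.9% dog, 0.05% each for the others.
SKEWED = {"dog": 0.999, "cat": 0.0005, "fish": 0.0005}

# Equal priors, for comparison.
EQUAL = {name: 1.0 / 3.0 for name in MEANS}


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, means, priors, sd=1.0):
    """Posterior class probabilities at x via Bayes' rule."""
    weighted = {k: priors[k] * gaussian_pdf(x, means[k], sd) for k in means}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}


def classify(x, means, priors, sd=1.0):
    """Return the class with the largest posterior probability at x."""
    post = posteriors(x, means, priors, sd)
    return max(post, key=post.get)


if __name__ == "__main__":
    # Compare the decision at a few covariate values under both sets of priors.
    for x in (-2.0, 0.0, 2.0, 5.0):
        print(x, classify(x, MEANS, EQUAL), classify(x, MEANS, SKEWED))
```

Under these assumed numbers, the skewed priors push the dog/cat decision boundary from x = 1 (equal priors) out to roughly x = 4.8, so nearly every observation is labelled "dog" unless its covariate value is extreme.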
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dan Bolser
Sent: Thursday, November 04, 2004 9:41 AM
To: R mailing list
Subject: [R] highly biased PCA data?

Hello,

Supposing that I have two or three clear categories for my data, let's say pet preference across fish, cat, dog. Let's say most people rate their preference as being mostly one of the categories.

I want to do PCA on the data to see three 'groups' of people, one group for fish, one for cat and one for dog. I would like to see the odd person who likes both or all three in the (appropriate) middle of the other main groups.

Will my data be affected by the fact that I have interviewed 1000 dog owners, 100 cat owners and 10 fish owners? (Assuming that each scale of preference has an equal range.)

Cheers,
Dan.