Question about PCA with prcomp
To all ...,

Bill's "lateral" wisdom is almost certainly a better solution. So thanks for the advice (and everything else that went before it [Bill: apropos of termplot, what happened to tplot?]).

And I will [almost] desist from asking the obvious: and if there were 10 000 observations?

BestR,
Mark.
Bill.Venables wrote:
...but with 500 variables and only 20 'entities' (observations) you will have 481 PCs with dead zero eigenvalues. How small is 'smaller' and how many is "a few"?

Everyone who has responded to this seems to accept the idea that PCA is the way to go here, but that is not clear to me at all. There is a 2-sample structure in the 20 observations that you have. If you simply ignore that in doing your PCA you are making strong assumptions about sampling that would seem to me unlikely to be met. If you allow for the structure and project orthogonal to it then you are probably throwing the baby out with the bathwater - you want to choose variables which maximise separation between the 2 samples (and now you are up to 482 zero principal variances, if that matters...).

I think this problem probably needs a bit of a re-think. Some variant on singular LDA, for example, may be a more useful way to think about it.

Bill Venables.

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ravi Varadhan
Sent: Monday, 2 July 2007 1:29 PM
To: 'Patrick Connolly'
Cc: r-help at stat.math.ethz.ch; 'Mark Difford'
Subject: Re: [R] Question about PCA with prcomp

The PCs that are associated with the smaller eigenvalues.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------

-----Original Message-----
From: Patrick Connolly [mailto:p_connolly at ihug.co.nz]
Sent: Monday, July 02, 2007 4:23 PM
To: Ravi Varadhan
Cc: 'Mark Difford'; r-help at stat.math.ethz.ch
Subject: Re: [R] Question about PCA with prcomp

On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote:

|> Mark,
|>
|> What you are referring to deals with the selection of covariates, since PC
|> doesn't do dimensionality reduction in the sense of covariate selection.
|> But what Mark is asking for is to identify how much each data point
|> contributes to individual PCs. I don't think that Mark's query makes much
|> sense, unless he meant to ask: which individuals have high/low scores
|> on PC1/PC2. Here are some comments that may be tangentially related to Mark's
|> question:
|>
|> 1. If one is worried about a few data points contributing heavily to
|>    the estimation of PCs, then one can use robust PCA, for example,
|>    using robust covariance matrices. MASS has some tools for this.
|> 2. The "biplot" for the first 2 PCs can give some insights
|> 3. PCs, especially, the last few PCs, can be used to identify "outliers".

What is meant by "last few PCs"?

--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___    Patrick Connolly
 {~._.~}                  Great minds discuss ideas
 _( Y )_                  Middle minds discuss events
(:_~*~_:)                 Small minds discuss people
 (_)-(_)                  ..... Anon
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
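[Editorial illustration, not part of the original posts.] The eigenvalue counts mentioned in the thread (481 zero principal variances for 20 observations on 500 variables, and 482 once the two-sample structure is projected out) can be checked numerically. The sketch below is only that: the real data were never posted, so it uses simulated normal data with an assumed split of 10 observations per group, and the names X, group, count_zero and scores are made up for the example. It also pulls out the PC1/PC2 scores, which is one reading of the "which individuals have high/low scores" question.

```r
# Simulated stand-in for the data discussed in the thread (an assumption:
# 20 observations, 500 variables, two groups of 10).
set.seed(1)
n <- 20
p <- 500
group <- factor(rep(c("A", "B"), each = n / 2))
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
# Shift the first 10 variables in group B so the two samples differ.
X[group == "B", 1:10] <- X[group == "B", 1:10] + 2

# Hypothetical helper: count principal variances that are numerically zero,
# out of all p dimensions (prcomp only returns min(n, p) of them).
count_zero <- function(fit, p, rel_tol = 1e-8) {
  eig <- fit$sdev^2
  p - sum(eig > rel_tol * max(eig))
}

# Plain PCA: prcomp centres the columns by default, so rank is at most n - 1.
pc <- prcomp(X)
n_zero <- count_zero(pc, p)

# Remove each group's own column means first (project orthogonal to the
# two-sample structure); the rank then drops to at most n - 2.
X_within <- X - apply(X, 2, function(col) ave(col, group))
pc_within <- prcomp(X_within)
n_zero_within <- count_zero(pc_within, p)

# Scores of the 20 observations on the first two PCs, and the observations
# ordered from lowest to highest PC1 score.
scores <- pc$x[, 1:2]
pc1_order <- order(scores[, "PC1"])

cat("zero principal variances, plain PCA:   ", n_zero, "\n")
cat("zero principal variances, within-group:", n_zero_within, "\n")
```

On data like these, "the last few PCs" can only sensibly refer to the last components with non-zero variance, since everything beyond them carries no information about the 20 observations.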
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
View this message in context: http://www.nabble.com/Question-about-PCA-with-prcomp-tf4012919.html#a11402204 Sent from the R help mailing list archive at Nabble.com.