Dear all: When I have more variables than units, say a 195*10896 matrix with 10896 variables and 195 samples, prcomp gives only 195 principal components. I checked the help page, but it does not explain why this happens. Can we get more than 195 PCs in this case? Thank you very much. Best! Alan Aug-12-2005
PCA problem in R
5 messages · Brian Ripley, R.P.Clement@westminster.ac.uk, Alan Zhao +1 more
On Sat, 13 Aug 2005, Alan Zhao wrote:
When I have more variables than units, say a 195*10896 matrix with 10896 variables and 195 samples, prcomp gives only 195 principal components. I checked the help page, but it does not explain why this happens.
There is not even a definition of a PC in the help. Did you read the references? This is what they are given for!
Can we get more than 195 PCs in this case? Thank you very much.
Check out the theory in the references. You can, but all the remaining ones are constant across samples and not uniquely defined. You are likely to have trouble storing the coefficients (10701x10896 is 800Mb). It would be better to do whatever you intend to do with them without explicitly computing them.
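The reason behind Ripley's answer can be checked on a toy matrix. prcomp centers each column by default. After centering, the n sample rows sum to zero, so the data matrix has rank at most n - 1. The p x p cross-product (proportional to the sample covariance) has the same rank, so at most n - 1 directions carry nonzero variance. That is why prcomp stops at min(n, p) components, with the last one having essentially zero standard deviation. Every further direction gives the same score (zero) for every sample, and its orientation is arbitrary. The sketch below uses made-up data, 4 samples by 7 variables, and exact rational arithmetic so that no rounding tolerance is needed. The helper names are hypothetical, and this is plain Python, not R.

```python
from fractions import Fraction

# Made-up data: n = 4 samples (rows) by p = 7 variables (columns), so p > n
# as in the question, just much smaller.
X = [
    [1, 0, 2, 5, 3, 1, 4],
    [0, 1, 3, 2, 7, 1, 0],
    [2, 2, 1, 0, 1, 6, 3],
    [5, 1, 0, 1, 2, 2, 8],
]

def center_columns(rows):
    """Subtract each column mean, as prcomp does by default."""
    n = len(rows)
    means = [sum(Fraction(v) for v in col) / n for col in zip(*rows)]
    return [[Fraction(v) - m for v, m in zip(row, means)] for row in rows]

def crossprod(rows):
    """p x p matrix X^T X, proportional to the sample covariance of centered X."""
    cols = list(zip(*rows))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def rank(rows):
    """Exact rank by Gaussian elimination over the rationals (no rounding)."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r

Xc = center_columns(X)
print("rank of raw X:    ", rank(X))              # 4 = n
print("rank of centered X:", rank(Xc))            # 3 = n - 1
print("rank of 7 x 7 X'X: ", rank(crossprod(Xc)))  # 3, so only 3 nonzero variances
```

With 195 samples the same argument gives at most 194 components with nonzero variance out of 10896 variables.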
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Hi. I have two comments on this. Quoting Prof Brian Ripley <ripley at stats.ox.ac.uk>:
On Sat, 13 Aug 2005, Alan Zhao wrote:
When I have more variables than units, say a 195*10896 matrix with 10896 variables and 195 samples, prcomp gives only 195 principal components. I checked the help page, but it does not explain why this happens.
There is not even a definition of a PC in the help. Did you read the references? This is what they are given for!
I don't know if it's too simple and introductory for the OP, but I quite like Lindsay Smith's intro to PCA. http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
Can we get more than 195 PCs in this case? Thank you very much.
Check out the theory in the references. You can, but all the remaining ones are constant across samples and not uniquely defined. You are likely to have trouble storing the coefficients (10701x10896 is 800Mb). It would be better to do whatever you intend to do with them without explicitly computing them.
I've been using prcomp on data with 50 samples and 8000 variables, and it completes in acceptable time on a very modest (XP2000+/512M/rh9) machine. I note, though, that I have only about a quarter as many samples as the OP. Cheers, Ross-c
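The storage concern in this exchange is simple arithmetic. The sketch below assumes dense storage at 8 bytes per double and ignores any per-object overhead. It compares the data matrices with the loadings prcomp returns (10896 x 195) and the extra 10701 columns of coefficients discussed above. The helper name is hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: dense double storage, no object overhead

def matrix_bytes(n_rows, n_cols):
    """Bytes needed for a dense n_rows x n_cols matrix of doubles."""
    return n_rows * n_cols * BYTES_PER_DOUBLE

sizes = [
    ("OP's data, 195 x 10896", matrix_bytes(195, 10896)),
    ("loadings prcomp returns, 10896 x 195", matrix_bytes(10896, 195)),
    ("extra loadings, 10896 x 10701", matrix_bytes(10896, 10701)),
    ("Clement's data, 50 x 8000", matrix_bytes(50, 8000)),
]
for label, nbytes in sizes:
    print(f"{label}: {nbytes / 1e6:.1f} MB ({nbytes / 2**20:.1f} MiB)")
```

By this count the extra coefficients need about 933 MB (roughly 890 MiB). The data and the 195 loadings prcomp actually returns need about 17 MB each. So the size of the data alone is not the problem. The problem is materializing the full set of loadings.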
R.P.Clement at westminster.ac.uk wrote:
Hi. I have two comments on this. Quoting Prof Brian Ripley <ripley at stats.ox.ac.uk>:
On Sat, 13 Aug 2005, Alan Zhao wrote:
When I have more variables than units, say a 195*10896 matrix with 10896 variables and 195 samples, prcomp gives only 195 principal components. I checked the help page, but it does not explain why this happens.
There is not even a definition of a PC in the help. Did you read the references? This is what they are given for!
I don't know if it's too simple and introductory for the OP, but I quite like Lindsay Smith's intro to PCA. http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
It is a very good tutorial. Thank you very much for your help. Sincerely, Zheng Zhao Aug-14-2005
To add to Brian Ripley's note: all but possibly the first few (1-3, say) PCs are very likely random numbers. You need either to consult the references or to get statistical help to understand why. May I also suggest that you add Prof Ripley's book PATTERN RECOGNITION AND NEURAL NETWORKS to your reading list -- in particular, Ch. 9. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box
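The thread makes two points: only the first few components are likely to matter, and it is better not to compute all the coefficients. Together they suggest working with the n x n matrix of inner products between samples instead of the p x p covariance matrix. The two matrices share their nonzero eigenvalues, and a loading vector can be recovered from a sample-space eigenvector. The sketch below uses plain-Python power iteration to find only the leading component. This technique is swapped in for illustration; prcomp itself works from a singular value decomposition. The function names are hypothetical, and real use would call a linear-algebra library rather than these loops.

```python
import math
import random

def center_columns(rows):
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def gram(rows):
    """n x n matrix of inner products between samples (X X^T)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def first_pc(rows, iters=500, seed=0):
    """Leading PC via power iteration on the n x n Gram matrix.

    Returns (variance with n - 1 divisor, unit loading of length p, scores of length n).
    """
    xc = center_columns(rows)
    g = gram(xc)
    rng = random.Random(seed)
    u = normalize([rng.random() for _ in g])  # random start in sample space
    for _ in range(iters):
        u = normalize(mat_vec(g, u))
    lam = sum(a * b for a, b in zip(u, mat_vec(g, u)))  # Rayleigh quotient
    sigma = math.sqrt(lam)
    # Loading v = Xc^T u / sigma is a unit vector in variable space.
    loading = [sum(xc[i][j] * u[i] for i in range(len(xc))) / sigma
               for j in range(len(xc[0]))]
    scores = [sigma * ui for ui in u]  # equals Xc v
    return lam / (len(rows) - 1), loading, scores

# Toy data whose variation lies along the single direction (1, 2, 0).
data = [[t, 2 * t, 0] for t in (1, 2, 3, 4)]
var1, load1, scores1 = first_pc(data)
print(round(var1, 6))  # 8.333333 (= 25/3)
```

Only n x n and n x p quantities are formed here, so the 10896 x 10896 set of loadings is never built. The sign of the loading is arbitrary, as it is for any PC.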
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof Brian Ripley Sent: Saturday, August 13, 2005 11:26 PM To: Alan Zhao Cc: r-help at stat.math.ethz.ch Subject: Re: [R] PCA problem in R
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html