PCA problem in R

5 messages · Brian Ripley, R.P.Clement@westminster.ac.uk, Alan Zhao +1 more

#
Dear all:

When I have more variables than units, say a 195*10896 matrix with 195
samples and 10896 variables, prcomp gives only 195 principal components.
I checked the help, but it does not explain why this happens. Can we get
more than 195 PCs in this case? Thank you very much.

Best!
Alan
Aug-12-2005
#
On Sat, 13 Aug 2005, Alan Zhao wrote:

            
There is not even a definition of a PC in the help. Did you read the
references?  That is what they are given for!  Check out the theory in
the references.

You can get more than 195 PCs, but all the remaining ones are constant
across samples and not uniquely defined.  You are likely to have trouble
storing the coefficients (10701x10896 doubles is roughly 900Mb).
It would be better to do whatever you intend to do with them without
explicitly computing them.
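
A minimal R sketch of the point above, on a much smaller simulated matrix. The sizes, the seed, and the variable names are illustrative assumptions, not taken from the thread. It shows that prcomp returns min(n, p) components and that, after centring, the last of them already has essentially zero variance.

```r
# Illustrative sizes only: 20 samples, 500 variables
# (the case in the thread is 195 x 10896).
set.seed(1)
n <- 20
p <- 500
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

pc <- prcomp(x)            # centres the columns by default

n_comp   <- length(pc$sdev)      # number of components returned: min(n, p)
load_dim <- dim(pc$rotation)     # loadings matrix is p x min(n, p)

# Centring removes one degree of freedom, so the data have rank at most
# n - 1 and the last returned component has numerically zero spread.
last_ratio <- pc$sdev[n] / pc$sdev[1]

# Bytes needed to hold the remaining p - n loading vectors as doubles
# for the matrix in the thread: about 9.3e8 bytes.
extra_bytes <- (10896 - 195) * 10896 * 8

print(n_comp)
print(load_dim)
print(last_ratio)
print(extra_bytes)
```

Any further directions would be orthogonal to the centred data, so every sample would score zero on them, and any orthonormal basis of that leftover space would serve equally well.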
#
Hi. I have two comments on this.

Quoting Prof Brian Ripley <ripley at stats.ox.ac.uk>:

I don't know if it's too simple and introductory for the OP, but I quite like
Lindsay Smith's intro to PCA.

http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

I've been using prcomp on data with 50 samples and 8000 variables. That
completes in acceptable time on a very modest (XP2000+/512M/rh9) machine.
Though, I note that I only have 1/4 of the samples of the OP.

Cheers,

Ross-c
#
R.P.Clement at westminster.ac.uk wrote:
It is a very good tutorial. Thank you very much for your help.

Sincerely,
Zheng Zhao
Aug-14-2005
#
To add to Brian Ripley's note:

All but possibly the first few (1-3, say) PCs are very likely just random
noise. You need to either consult references or get statistical help to
understand why. May I also suggest that you add Prof Ripley's book
Pattern Recognition and Neural Networks to your reading list, in
particular Ch. 9.
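
As a rough illustration of this remark (a simulation under stated assumptions, not an argument from the references), the R sketch below builds data with two real underlying factors plus independent noise. The sizes, the seed, the number of factors, and all variable names are hypothetical choices for the example. In this setup only the first two standard deviations stand out, and the rest form a flat band of the kind pure noise produces.

```r
# Hypothetical setup: 50 samples, 1000 variables, 2 true factors + unit noise.
set.seed(42)
n <- 50
p <- 1000
k <- 2

scores   <- matrix(rnorm(n * k), n, k)   # factor values per sample
loadings <- matrix(rnorm(k * p), k, p)   # factor weights per variable
noise    <- matrix(rnorm(n * p), n, p)   # independent noise
x <- scores %*% loadings + noise

pc <- prcomp(x)

# Leading standard deviations: the first k should be clearly larger.
print(round(pc$sdev[1:6], 1))

# Ratio between the last "signal" PC and the first "noise" PC.
gap_signal <- pc$sdev[k] / pc$sdev[k + 1]

# Ratio between the largest and smallest non-trivial "noise" PCs
# (component n is excluded: it is numerically zero after centring).
spread_noise <- pc$sdev[k + 1] / pc$sdev[n - 1]

print(gap_signal)
print(spread_noise)
```

Real data will not separate this cleanly, which is one reason to look at the whole sequence of pc$sdev before interpreting later components.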

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box