PCA problem in R

[SNIP]
On Sat, 13 Aug 2005, Alan Zhao wrote:

When I have more variables than units, say a 195 x 10896 matrix with
10896 variables and 195 samples, prcomp gives only 195 principal
components. I checked the help page, but it does not explain why
this happens.
[SNIP]
Sincerely,
Zheng Zhao
Aug-14-2005
______________________________________________
Just yesterday I subscribed to r-help because I am planning
on learning the basics of R ... today.   :-)
Thus, I am not sure about the history of this question.

The above situation, more variables than samples,
is commonly encountered in climate studies.
Consider annual mean temperatures for 195 years
on a coarse 72 [lat] x 144 [lon] grid [72*144 = 10368
spatial variables].

Let S be the number of grid points and T be the number
of years. I think there is a theorem (?Eckart-Young?)
which states that the maximum number of unique eigenvalues
is min(S,T). In your case 195 eigenvalues is correct.
I speculate that the underlying function transposes the
input data matrix, computes the TxT [rather than SxS]
covariance matrix and solves for the eigenvalues/vectors.
It then uses a linear transformation to get the results
for the original input data matrix.

Computationally, the above is much faster and uses less memory.
You are wrong. No covariance matrix is computed. Please don't "speculate" --
read the Help file which clearly states:

"The calculation is done by a singular value decomposition of the (centered
and possibly scaled) data matrix, not by using eigen on the covariance
matrix. This is generally the preferred method for numerical accuracy. "

-- Bert Gunter
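A minimal sketch of the SVD route described in that help-file passage, written in Python with NumPy. This is an assumption-laden illustration, not R's prcomp source: the function name pca_svd is hypothetical and the random 20 x 500 matrix is a small stand-in for the 195 x 10896 data. A thin SVD of an n x p matrix has only min(n, p) singular values, so only min(n, p) components come back, and after column centering at most min(n - 1, p) of them can be nonzero.

```python
import numpy as np

def pca_svd(x):
    """Hypothetical helper: PCA by SVD of the column-centered data matrix.
    Returns component standard deviations and loadings (p x min(n, p))."""
    xc = x - x.mean(axis=0)              # center each variable (column)
    # Thin SVD: at most min(n, p) singular values, hence min(n, p) components
    u, d, vt = np.linalg.svd(xc, full_matrices=False)
    sdev = d / np.sqrt(x.shape[0] - 1)   # component standard deviations
    return sdev, vt.T

rng = np.random.default_rng(0)
n, p = 20, 500                           # fewer samples than variables
x = rng.standard_normal((n, p))
sdev, rotation = pca_svd(x)
print(len(sdev), rotation.shape)         # 20 (500, 20)
# Centering makes the rows sum to zero, so one component is numerically zero
print(np.sum(sdev > 1e-8 * sdev[0]))     # 19
```

No covariance matrix is formed at any point; the component variances are the squared singular values divided by n - 1.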

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

[SNIP]
On Sat, 13 Aug 2005, Alan Zhao wrote:

When I have more variables than units, say a 195 x 10896 matrix with
10896 variables and 195 samples, prcomp gives only 195 principal
components. I checked the help page, but it does not explain why
this happens.
[SNIP]

Sincerely,
Zheng Zhao
Aug-14-2005
______________________________________________
Just yesterday I subscribed to r-help because I am planning
on learning the basics of R ... today.   :-)
Thus, I am not sure about the history of this question.
The above situation, more variables than samples,
is commonly encountered in climate studies.
Consider annual mean temperatures for 195 years
on a coarse 72 [lat] x 144 [lon] grid [72*144 = 10368
spatial variables].
Which are variables and which are samples here?  In standard statistical 
parlance you have 195 variables at 10368 samples. In some fields there are 
the concepts of R-mode and Q-mode PCA, and you seem to be in Q-mode, which 
is why you have a transpose.
Let S be the number of grid points and T be the number
of years. I think there is a theorem (?Eckart-Young?)
which states that the maximum number of unique eigenvalues
is min(S,T). In your case 195 eigenvalues is correct.
Eigenvalues of what?  Eckart-Young is about the SVD, see e.g.

http://voteview.com/ideal_point_Eckart_Young_Theorem.htm

as Googling easily shows.  (It is used to prove some of the approximation 
properties of PCA, e.g. in

http://www.stats.ox.ac.uk/~ripley/MultAnal_MT2004/PCA.pdf)
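To make the Eckart-Young connection concrete, a small sketch in Python with NumPy (the data are random and the function name best_rank_k is hypothetical). Truncating the SVD after k terms gives the best rank-k approximation in the Frobenius norm, and the approximation error equals the root sum of squares of the discarded singular values. This is a statement about the SVD of the data matrix itself, which is the approximation property of PCA the theorem is used to prove.

```python
import numpy as np

def best_rank_k(x, k):
    """Hypothetical helper: truncated SVD, which by the Eckart-Young theorem
    is the best rank-k approximation of x in the Frobenius norm."""
    u, d, vt = np.linalg.svd(x, full_matrices=False)
    # Scale the first k left singular vectors by their singular values
    return (u[:, :k] * d[:k]) @ vt[:k, :], d

rng = np.random.default_rng(1)
x = rng.standard_normal((30, 200))
xc = x - x.mean(axis=0)                  # column-centered, as in PCA
k = 5
approx, d = best_rank_k(xc, k)
err = np.linalg.norm(xc - approx, "fro")
# The residual is the root sum of squares of the discarded singular values
print(np.isclose(err, np.sqrt(np.sum(d[k:] ** 2))))   # True
```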
I speculate that the underlying function transposes the
input data matrix, computes the TxT [rather than SxS]
covariance matrix and solves for the eigenvalues/vectors.
It then uses a linear transformation to get the results
for the original input data matrix.

Computationally, the above is much faster and uses less memory.
You speculate incorrectly, even in your Q-mode view of the world.
The real point is that it solves a different problem, which is what my
answer to the original post was about.
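One way to see why the transpose matters, sketched in Python with NumPy. This is an editorial illustration under stated assumptions (random data, with 15 x 300 as a hypothetical stand-in for 195 years x 10368 grid points), not the content of the earlier answer referred to above. With the grid points centered, the small T x T cross-product has the same nonzero eigenvalues as the large S x S covariance matrix, so that shortcut reproduces the R-mode variances. Transposing first and computing a covariance with years as the variables centers along the other axis, which gives a different (Q-mode) answer.

```python
import numpy as np

rng = np.random.default_rng(2)
n_years, n_grid = 15, 300                # hypothetical stand-in for T and S
# Random anomalies plus a fixed offset per grid point
x = rng.standard_normal((n_years, n_grid)) + rng.standard_normal(n_grid)

xc = x - x.mean(axis=0)                  # R-mode: center each grid point's series

big = xc.T @ xc / (n_years - 1)          # S x S covariance (300 x 300)
small = xc @ xc.T / (n_years - 1)        # T x T cross-product (15 x 15)
ev_big = np.sort(np.linalg.eigvalsh(big))[::-1][:n_years]
ev_small = np.sort(np.linalg.eigvalsh(small))[::-1]
print(np.allclose(ev_big, ev_small))     # True: same nonzero eigenvalues

# Q-mode: np.cov treats rows (years) as variables and centers each year instead
ev_q = np.sort(np.linalg.eigvalsh(np.cov(x)))[::-1]
print(np.allclose(ev_q, ev_small))       # False: a different problem
```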
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
It really would be a good idea to do the homework it suggests.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595