PCA problem in R

3 messages · Dennis Shea, Bert Gunter, Brian Ripley

#
[SNIP]>>
[SNIP]
Just yesterday I subscribed to r-help because I am planning
on learning the basics of R ... today.   :-)
Thus, I am not sure about the history of this question.

The above situation, more variables than samples,
is commonly encountered in climate studies.
Consider annual mean temperatures for 195 years
on a coarse 72 [lat] x 144 [lon]  grid [72*144=10368 
spatial variables]. 

Let  S be the number of grid points and T be the number
of years. I think there is a theorem (?Eckart-Young?) 
which states that the maximum number of unique eigenvalues 
is min(S,T). In your case 195 eigenvalues is correct. 
I speculate that the underlying function transposes the 
input data matrix, computes the TxT [rather than SxS]
covariance matrix, and solves for the eigenvalues/vectors. 
It then uses a linear transformation to get the results
for the original input data matrix.

Computationally, the above is much faster and uses less memory.
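
The transpose idea described above can be sketched in R as follows. This is only an illustration of the TxT trick on made-up random data; the sizes, seed, and variable names are assumptions, and it makes no claim about what any particular R function does internally.

```r
set.seed(1)
n_time  <- 6     # T: number of years (small illustrative value)
n_space <- 20    # S: number of grid points (illustrative)
X  <- matrix(rnorm(n_time * n_space), nrow = n_time, ncol = n_space)  # T x S
Xc <- scale(X, center = TRUE, scale = FALSE)   # remove the time mean at each grid point

# Small T x T problem
G  <- tcrossprod(Xc) / (n_time - 1)            # Xc %*% t(Xc) / (T - 1)
eg <- eigen(G, symmetric = TRUE)

# Large S x S covariance matrix, for comparison only
C  <- crossprod(Xc) / (n_time - 1)             # t(Xc) %*% Xc / (T - 1)
ec <- eigen(C, symmetric = TRUE)

# Centering removes one degree of freedom, so at most T - 1 eigenvalues are nonzero
k <- n_time - 1
same_values <- all.equal(eg$values[1:k], ec$values[1:k])

# Linear transformation back to spatial patterns:
# v = t(Xc) %*% u / sqrt((T - 1) * lambda)
V <- crossprod(Xc, eg$vectors[, 1:k]) %*%
  diag(1 / sqrt((n_time - 1) * eg$values[1:k]))

# Eigenvectors are defined only up to sign, so compare absolute values
max_diff <- max(abs(abs(V) - abs(ec$vectors[, 1:k])))
```

Here `same_values` and `max_diff` compare the small problem against the large one; the large SxS matrix is formed only to make that comparison possible.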
#
You are wrong. No covariance matrix is computed. Please don't "speculate" --
read the Help file which clearly states:

"The calculation is done by a singular value decomposition of the (centered
and possibly scaled) data matrix, not by using eigen on the covariance
matrix. This is generally the preferred method for numerical accuracy. "
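
A small check on made-up random data that is consistent with the Help text: the sdev values returned by prcomp agree with the singular values of the centered data matrix divided by sqrt(n - 1). The matrix sizes, seed, and variable names are assumptions for illustration.

```r
set.seed(1)
X <- matrix(rnorm(5 * 12), nrow = 5, ncol = 12)   # 5 observations, 12 variables

pc <- prcomp(X)                                   # centers the columns by default
s  <- svd(scale(X, center = TRUE, scale = FALSE)) # SVD of the centered data matrix

# Standard deviations of the components from the singular values
sdev_svd  <- s$d / sqrt(nrow(X) - 1)
same_sdev <- all.equal(pc$sdev, sdev_svd)

# Number of components returned when there are more variables than observations
n_comp <- length(pc$sdev)

# Loadings agree with the right singular vectors up to sign; the last component
# has a (numerically) zero singular value, so only the first four are compared
rot_diff <- max(abs(abs(pc$rotation[, 1:4]) - abs(s$v[, 1:4])))
```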

-- Bert Gunter
#
On Mon, 15 Aug 2005, Dennis Shea wrote:

            
Which are variables and which are samples here?  In standard statistical 
parlance you have 195 variables at 10368 samples. In some fields there are 
the concepts of R-mode and Q-mode PCA, and you seem to be in Q-mode, which 
is why you have a transpose.
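
One way to see that the transpose matters, sketched on made-up random data (sizes, seed, and names are assumptions): prcomp centers the columns of whatever matrix it is given, so running it on X and on t(X) removes different means and yields different decompositions.

```r
set.seed(1)
X <- matrix(rnorm(4 * 7), nrow = 4, ncol = 7)   # 4 rows, 7 columns

pc_x  <- prcomp(X)      # columns of X are centered
pc_tx <- prcomp(t(X))   # columns of t(X), i.e. rows of X, are centered

# Sums of squares along each component (squared singular values)
ss_x  <- pc_x$sdev^2  * (nrow(X) - 1)
ss_tx <- pc_tx$sdev^2 * (ncol(X) - 1)

# The column-centered 4 x 7 matrix has rank at most 3, so its fourth value is
# numerically zero; the column-centered 7 x 4 matrix generically has rank 4
differ <- !isTRUE(all.equal(ss_x, ss_tx))
```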
Eigenvalues of what?  Eckart-Young is about the SVD, see e.g.

http://voteview.com/ideal_point_Eckart_Young_Theorem.htm

as Googling easily shows.  (It is used to prove some of the approximation 
properties of PCA, e.g. in

http://www.stats.ox.ac.uk/~ripley/MultAnal_MT2004/PCA.pdf)
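
As a minimal illustration of the approximation property (random data, sizes, and names are assumptions): the SVD truncated to k terms gives a rank-k approximation whose Frobenius error equals the square root of the sum of the squared discarded singular values, which by Eckart-Young is the smallest error any rank-k matrix can achieve.

```r
set.seed(1)
A <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)
s <- svd(A)

k   <- 2
A_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])   # rank-k truncation

err      <- norm(A - A_k, type = "F")      # Frobenius norm of the residual
expected <- sqrt(sum(s$d[-(1:k)]^2))       # from the discarded singular values
matches  <- all.equal(err, expected)
```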
You speculate incorrectly, even in your Q-mode view of the world.
The real point is that it solves a different problem, which is what my 
answer to the original post was about.
It really would be a good idea to do the homework it suggests.