R vs SPSS output for princomp
On Tuesday, May 6, 2003, at 03:00 AM, Prof Brian Ripley wrote:
On Mon, 5 May 2003, James Howison wrote:
I am using R to do a principal components analysis for a class which is generally using SPSS, so some of my question relates to SPSS output (and this might not be the right place). I have scoured the mailing list and the web but can't get a feel for this. It is annoying because they will be marking against the SPSS output.

Basically I'm getting different values for the component loadings in SPSS and in R. I suspect that some normalization or scaling is going on that I don't understand (and there is plenty I don't understand). The scree plots (and thus the eigenvalues for each component) and the Proportion of Variance figures are identical, but the factor loadings differ by an order of magnitude: the SPSS loadings are much higher than those shown by R. Should the loadings returned by R's princomp function and the SPSS "Component Matrix" be the same?
Only if they are defined the same way. The length of a PCA loading vector is arbitrary. R's have length one (the sum of squares of the coefficients is one): how are SPSS's defined?
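[R's convention can be checked directly: each column of the loadings matrix returned by princomp is a unit-length eigenvector. A minimal sketch using the built-in USArrests data (chosen here only for illustration):]

```r
# Each column of R's loadings matrix is an eigenvector of unit length:
# the sum of squared coefficients in each column equals 1.
pc <- princomp(USArrests, cor = TRUE)  # PCA on the correlation matrix
L  <- unclass(loadings(pc))            # drop the "loadings" print class
colSums(L^2)                           # each column's sum of squares is 1
```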
I believe, based on the "Factor Score Coefficients" section of the SPSS algorithms document (am I right in thinking that R's "loadings" are also "factor score coefficients"?), that this is the calculation SPSS is using: http://www.spss.com/tech/stat/Algorithms/11.5/factor.pdf

To quote (in pseudo-LaTeX): the matrix of factor loadings based on m factors is

\Lambda_m = \Omega_m \Gamma_m^{1/2}

where

\Omega_m = (\omega_1, \omega_2, ..., \omega_m)
\Gamma_m = diag(|\gamma_1|, |\gamma_2|, ..., |\gamma_m|)

For a correlation matrix R, \gamma_1 >= \gamma_2 >= ... >= \gamma_m are the eigenvalues and \omega_i are the corresponding eigenvectors of R.

(Skipping down to the bottom of the document,) the factor score coefficients (for PC extraction without rotation, as in my example) are based on

W = \Lambda_m \Gamma_m^{-1}

where S_m is the factor structure matrix and \Lambda_m = S_m for orthogonal rotations.

I'm afraid that my mathematical skills are not up to comparing the algorithms explained in the SPSS document with the R source code :( Hopefully the difference is obvious to somebody here.
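[If that reading of the SPSS document is right, the two outputs differ only by a column scaling: SPSS's component matrix appears to be the unit eigenvectors with each column multiplied by the square root of its eigenvalue, which in princomp terms is the component standard deviation pc$sdev. A sketch of the conversion, again using USArrests for illustration:]

```r
# Rescale R's unit-length loadings to (what appears to be) SPSS's
# "Component Matrix" convention: column j is multiplied by
# sqrt(eigenvalue_j), i.e. by pc$sdev[j].
pc <- princomp(USArrests, cor = TRUE)
spss_style <- sweep(unclass(loadings(pc)), 2, pc$sdev, "*")

# With this scaling each column's sum of squares equals the
# corresponding eigenvalue rather than 1:
colSums(spss_style^2)  # approximately pc$sdev^2
```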
A subsidiary question would be: how does one approximate "Kaiser's little jiffy" test for extracting components (SPSS by default eliminates those components with eigenvalues below 1)? I've been doing this with loadings(DV.prcomped)[,1:x] after inspecting the scree plot (to set x), but is there another way?
Eigenvalues of what, exactly? The component sdev values are the square roots of the eigenvalues of the (possibly scaled) covariance matrix: maybe you intend this only for a correlation matrix?
Yes I do: I'm using only the correlation matrix. I understood that it is common (following Kaiser's suggestion) to extract only components which have eigenvalues above 1 (i.e. which explain as much variance as at least one of the input variables). I understand that this is considered statistically crude but it is still common. I guess I was expecting an interface for PCA not too dissimilar to that of factanal (as it is in other statistical packages). Perhaps there are sound statistical reasons for not wanting to hide this step from the user, but perhaps it is interesting for you to know people's expectations when using the princomp function.
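[Since princomp's sdev values are the square roots of the eigenvalues (of the correlation matrix when cor = TRUE), Kaiser's eigenvalue-greater-than-one rule can be applied directly rather than read off a scree plot. A minimal sketch, again on USArrests:]

```r
# Kaiser's rule: keep components whose eigenvalue exceeds 1, i.e. whose
# variance exceeds that of a single standardized input variable.
pc   <- princomp(USArrests, cor = TRUE)
keep <- pc$sdev^2 > 1               # sdev^2 recovers the eigenvalues
loadings(pc)[, keep, drop = FALSE]  # loadings of the retained components
```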
In R you have the source code, so if you know what you want you can find the pieces.
Apologies that this is a bit beyond me right at the moment. I do, however, appreciate your comments and the fact that the source is available.

James
Doctoral Student
School of Information Studies
Syracuse University
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595