Hi,

I am using R to do a principal components analysis for a class that generally uses SPSS, so part of my question relates to SPSS output (and this might not be the right place for it). I have scoured the mailing list and the web but can't get a feel for this. It matters because the work will be marked against the SPSS output.

I'm getting different values for the component loadings in SPSS and in R. I suspect that there is some normalization or scaling going on that I don't understand (and there is plenty I don't understand). The scree plots (and thus the eigenvalues for each component) and the Proportion of Variance figures are identical, but the loadings differ markedly in magnitude: the SPSS loadings are much larger than those shown by R.

Should the loadings returned by the R princomp function and the SPSS "Component Matrix" be the same?

A subsidiary question: how does one approximate the "Kaiser's little jiffy" test for extracting the components (SPSS by default eliminates those components with eigenvalues below 1)? I've been doing this by loadings(DV.prcomped)[,1:x] after inspecting the scree plot (to set x), but is there another way?

The full R commands and SPSS syntax follow below, along with the differing output.

Thanks,
James
http://freelancepropaganda.com

R analysis
===========

I run:

> library(mva)
> DVfmla
~webeval1 + webeval2 + webeval3 + webeval4 + webeval5 + webeval6 +
webeval7 + webeval8
> loadings(DV.pca <- princomp(DVfmla, scale=T, cor=T))

Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
webeval1 -0.357 0.258 -0.202 0.458 0.629 -0.350 0.112 -0.159
webeval2 -0.340 0.510 0.255 -0.305 0.651 0.136 -0.143
webeval3 -0.319 0.316 -0.276 -0.797 0.244 -0.145
webeval4 0.247 0.633 0.681 -0.248
webeval5 0.391 0.150 -0.357 -0.183 -0.158 -0.185 0.584 -0.513
webeval6 0.392 0.252 -0.282 0.140 -0.756 -0.334
webeval7 -0.382 0.128 -0.162 -0.651 -0.596 -0.114 0.121
webeval8 0.377 0.268 -0.428 0.158 0.143 0.746
<snip SS loadings>

> plot(DV.pca) # This is exactly the same as the SPSS scree-plot.

SPSS Analysis
=============

FACTOR
  /VARIABLES webeval1 webeval2 webeval3 webeval4
    webeval5 webeval6 webeval7 webeval8
  /MISSING LISTWISE
  /ANALYSIS webeval1 webeval2 webeval3 webeval4
    webeval5 webeval6 webeval7 webeval8
  /PRINT INITIAL EXTRACTION
  /PLOT EIGEN
  /CRITERIA FACTORS(8) ITERATE(25)
  /EXTRACTION PC
  /ROTATION NOROTATE
  /METHOD=CORRELATION .

As mentioned, the proportions of variance explained and the scree plot are identical. However, SPSS produces this "Component Matrix", which we, in class, have been calling "the loadings":

WEBEVAL1  -0.798  0.253  0.178  0.317 -0.370  0.167 -0.033 -0.037
WEBEVAL2  -0.764  0.487  0.026  0.188  0.186 -0.309 -0.108 -0.043
WEBEVAL3  -0.719  0.309  0.217 -0.564 -0.125 -0.040  0.043  0.052
WEBEVAL4   0.558  0.591 -0.563 -0.063 -0.029  0.131  0.030 -0.019
WEBEVAL5   0.864  0.161  0.313 -0.128  0.075  0.138 -0.221 -0.200
WEBEVAL6   0.876  0.252  0.237  0.100  0.008  0.017 -0.088  0.308
WEBEVAL7  -0.858  0.128  0.133  0.054  0.349  0.308  0.090  0.037
WEBEVAL8   0.847  0.256  0.316  0.111  0.000 -0.087  0.296 -0.094

Can anyone tell me why these are different? (It seems likely that this is a scaling of some kind, as the SPSS ones just look to have been made larger in some way.) Or is it that SPSS is reporting cumulatively while R is not?

Thanks in advance,
James
R vs SPSS output for princomp
6 messages · James Howison, Edgar Acuna, Brian Ripley +1 more
On Mon, 5 May 2003, James Howison wrote:
I am using R to do a principal components analysis for a class which is generally using SPSS - so some of my question relates to SPSS output (and this might not be the right place). I have scoured the mailing list and the web but can't get a feel for this. It is annoying because they will be marking to the SPSS output. Basically I'm getting different values for the component loadings in SPSS and in R - I suspect that there is some normalization or scaling going on that I don't understand (and there is plenty I don't understand). The scree-plots (and thus eigen values for each component) and Proportion of Variance figures are identical - but the factor loadings are an order of magnitude different. Basically the SPSS loadings are much higher than those shown by R. Should the loadings returned by the R princomp function and the SPSS "Component Matrix" be the same?
Only if they are defined the same. The length of a PCA loading is arbitrary. R's are of length (sum of squares of coefficients) one: how are SPSS's defined?
And subsidiary question would be: How does one approximate the "Kaiser's little jiffy" test for extracting the components (SPSS by default eliminates those components with eigen values below 1)? I've been doing this by loadings(DV.prcomped)[,1:x] after inspecting the scree plot (to set x) - but is there another way?
Eigenvalues of what exactly? The component sdev values are the square roots of the eigenvalues of the (possibly scaled) covariance matrix: maybe you intend this only for a correlation matrix? In R you have the source code, so if you know what you want you can find the pieces.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
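Both points in the reply above (unit-length loadings, and sdev as the square roots of the eigenvalues) can be checked directly in R. The following is a minimal sketch under stated assumptions: the built-in USArrests data stands in for the webeval variables, which were not posted, and princomp is called from the stats package, where it lives in current R versions (so library(mva) is not needed).

```r
# Stand-in data (assumption): USArrests replaces the unavailable webeval data.
X <- USArrests
pca <- princomp(X, cor = TRUE)

# Each column of the loadings matrix should have sum of squares equal to 1.
L <- unclass(loadings(pca))
unit_len <- colSums(L^2)

# sdev^2 should reproduce the eigenvalues of the correlation matrix.
ev_pca <- unname(pca$sdev^2)
ev_cor <- eigen(cor(X), symmetric = TRUE)$values

print(round(unit_len, 6))
print(all.equal(ev_pca, ev_cor))
```

If both checks pass, any package that reports loadings on a different scale is using a different normalization of the same eigenvectors.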
Hi, I compared R's results with those given by MINITAB and SAS and they agree. Your problem is with SPSS, which unfortunately I have never used. Edgar
On Tuesday, May 6, 2003, at 03:00 AM, Prof Brian Ripley wrote:
On Mon, 5 May 2003, James Howison wrote:
I am using R to do a principal components analysis for a class which is generally using SPSS - so some of my question relates to SPSS output (and this might not be the right place). I have scoured the mailing list and the web but can't get a feel for this. It is annoying because they will be marking to the SPSS output. Basically I'm getting different values for the component loadings in SPSS and in R - I suspect that there is some normalization or scaling going on that I don't understand (and there is plenty I don't understand). The scree-plots (and thus eigen values for each component) and Proportion of Variance figures are identical - but the factor loadings are an order of magnitude different. Basically the SPSS loadings are much higher than those shown by R. Should the loadings returned by the R princomp function and the SPSS "Component Matrix" be the same?
Only if they are defined the same. The length of a PCA loading is arbitrary. R's are of length (sum of squares of coefficients) one: how are SPSS's defined?
I believe that the calculations SPSS uses are those in the "Factor Score Coefficients" section of the SPSS algorithm document (am I right in thinking that R's "loadings" are also "factor score coefficients"?):

http://www.spss.com/tech/stat/Algorithms/11.5/factor.pdf

To quote (in pseudo-LaTeX), the matrix of factor loadings based on factor m is:

\lambda_m = \omega_m \gamma_m^{1/2}

where

\omega_m = (w_1, w_2, \dots, w_m)
\gamma_m = \mathrm{diag}(|y_1|, |y_2|, \dots, |y_m|)

For a correlation matrix, y_1 \ge y_2 \ge \dots \ge y_m are the eigenvalues and w_i are the corresponding eigenvectors of R, where R is the correlation matrix.

(Skipping down to the bottom of the document.) For PC without rotation (my example), the coefficients (loadings) are based on:

W = \lambda_m \gamma_m^{-1}

where S_m is the factor structure matrix and \lambda_m = S_m for orthogonal rotations.

I'm afraid that my mathematical skills are not up to comparing the algorithm explained in the SPSS document with the R source code :( Hopefully the difference is obvious to somebody here.
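Read literally, the quoted formula \lambda_m = \omega_m \gamma_m^{1/2} multiplies each unit-length eigenvector by the square root of its eigenvalue. The posted numbers are consistent with that: in the first column, -0.798 / -0.357 and 0.876 / 0.392 are both about 2.2. A minimal sketch of the rescaling in R follows, again assuming USArrests as stand-in data; the name component_matrix is illustrative, not an SPSS or R term.

```r
# Stand-in data (assumption): USArrests replaces the unavailable webeval data.
X <- USArrests
pca <- princomp(X, cor = TRUE)

# Unit-length eigenvectors, as reported by loadings().
L <- unclass(loadings(pca))

# Multiply column j by sdev[j], i.e. by the square root of eigenvalue j.
component_matrix <- sweep(L, 2, pca$sdev, "*")

# Column sums of squares of the rescaled matrix equal the eigenvalues.
ss <- colSums(component_matrix^2)

# For a correlation-matrix PCA, the rescaled entries are the correlations
# between each variable and each component score.
r_check <- cor(X, pca$scores)

print(round(component_matrix, 3))
```

The sign of any column is arbitrary, so two packages may also differ by a factor of -1 in whole columns.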
And subsidiary question would be: How does one approximate the "Kaiser's little jiffy" test for extracting the components (SPSS by default eliminates those components with eigen values below 1)? I've been doing this by loadings(DV.prcomped)[,1:x] after inspecting the scree plot (to set x) - but is there another way?
Eigenvalues of what exactly? The component sdev values are the square roots of the eigenvalues of the (possibly scaled) covariance matrix: maybe you intend this only for a correlation matrix?
Yes I do - I'm using only the correlation matrix. I understood that it was common (following Kaiser's suggestion) to extract only components which have eigenvalues above 1 (i.e. which explain at least as much variance as one of the input variables). I understand that this is considered statistically crude but is still common. I guess I'm expecting an interface for PCA not too dissimilar to that of factanal (as it is in other statistical packages). Perhaps there are sound statistical reasons for not wanting to hide this step from the user, but perhaps it is interesting to you to know people's expectations when using the princomp function.
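The eigenvalue-above-1 rule described here can be applied without reading x off the scree plot, since the eigenvalues are the squared sdev values. A minimal sketch, assuming USArrests as stand-in data and a correlation-matrix PCA (the rule is only meaningful in that case); the variable names are illustrative.

```r
# Stand-in data (assumption): USArrests replaces the unavailable webeval data.
X <- USArrests
pca <- princomp(X, cor = TRUE)

# Eigenvalues of the correlation matrix; they sum to the number of variables.
eigenvalues <- pca$sdev^2

# Kaiser rule: keep components whose eigenvalue exceeds 1.
keep <- eigenvalues > 1

# drop = FALSE keeps a matrix even if only one component survives.
kept_loadings <- unclass(loadings(pca))[, keep, drop = FALSE]

print(round(eigenvalues, 3))
print(kept_loadings)
```

This replaces loadings(DV.prcomped)[,1:x] with a logical index computed from the eigenvalues themselves.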
In R you have the source code, so if you know what you want you can find the pieces.
Apologies that this is a bit beyond me right at the moment. I do, however, appreciate your comments and the fact that the source is available.

James
Doctoral Student
School of Information Studies
Syracuse University
On Tue, 6 May 2003, James Howison wrote:
I guess I'm expecting an interface for PCA not too dissimilar to that of factanal (as it is in other statistical packages). Perhaps there are sound statistical reasons for not wanting to hide this step from the user, but perhaps it is interesting to you to know people's expectations when using the princomp function.
Well, many other packages confuse (hopelessly) PCA and factor analysis, including SPSS. They are separate statistical methods with very different purposes, that for factanal being quite rarely appropriate. R is not written to reproduce the mistakes of other packages, but to implement sound statistical practice.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
If you want factor analysis, you should use factanal or, more generally, MLE, true. Nonetheless, I have use for PCA as a factor extraction method in a couple of situations:

1. To replicate results from that method
2. When the covariance matrix is non-positive definite

I have written some code to do this. See:

http://home.earthlink.net/~bmagill/MyMisc.html

Find the function prinfact and associated methods and functions. This would replicate SPSS results of "factor analysis by principal components".

Another, better option might be OLS estimation for the second situation. I haven't the ability to implement this myself. Maybe a future version of R?
At 04:31 PM 5/6/2003 +0100, Prof Brian Ripley wrote:
Well, many other packages confuse (hopelessly) PCA and factor analysis, including SPSS. They are separate statistical methods with very different purposes, that for factanal being quite rarely appropriate. R is not written to reproduce the mistakes of other packages, but to implement sound statistical practice.