R vs SPSS output for princomp

6 messages · James Howison, Edgar Acuna, Brian Ripley +1 more

#
Hi,

I am using R to do a principal components analysis for a class
which is generally using SPSS - so some of my question relates to
SPSS output (and this might not be the right place).  I have
scoured the mailing list and the web but can't get a feel for this.
It is annoying because they will be marking against the SPSS output.

Basically I'm getting different values for the component loadings
in SPSS and in R - I suspect that there is some normalization or
scaling going on that I don't understand (and there is plenty I
don't understand).  The scree plots (and thus the eigenvalues for each
component) and the Proportion of Variance figures are identical, but
the factor loadings differ substantially: the SPSS loadings are much
larger in magnitude than those shown by R.

Should the loadings returned by the R princomp function and the
SPSS "Component Matrix" be the same?

A subsidiary question: how does one approximate the
"Kaiser's little jiffy" test for extracting components (by default
SPSS eliminates components with eigenvalues below 1)?
I've been doing this by loadings(DV.prcomped)[,1:x] after inspecting
the scree plot (to set x), but is there another way?

The full R commands and SPSS syntax follow below along with the
differing output.

Thanks, James
http://freelancepropaganda.com

R analysis
===========
I run:

 > library(mva)
 > DVfmla
~webeval1 + webeval2 + webeval3 + webeval4 + webeval5 + webeval6 +
     webeval7 + webeval8
 > loadings(DV.pca <- princomp(DVfmla, scale=T, cor=T))

Loadings:
          Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
webeval1 -0.357  0.258 -0.202  0.458  0.629 -0.350  0.112 -0.159
webeval2 -0.340  0.510         0.255 -0.305  0.651  0.136 -0.143
webeval3 -0.319  0.316 -0.276 -0.797  0.244        -0.145
webeval4  0.247  0.633  0.681               -0.248
webeval5  0.391  0.150 -0.357 -0.183 -0.158 -0.185  0.584 -0.513
webeval6  0.392  0.252 -0.282  0.140               -0.756 -0.334
webeval7 -0.382  0.128 -0.162        -0.651 -0.596 -0.114  0.121
webeval8  0.377  0.268 -0.428  0.158                0.143  0.746

<snip SS loadings>

 >plot(DV.pca)  # This is exactly the same as the SPSS scree-plot.

SPSS Analysis
=============

FACTOR
   /VARIABLES webeval1 webeval2 webeval3 webeval4
              webeval5 webeval6 webeval7 webeval8
   /MISSING LISTWISE
   /ANALYSIS webeval1 webeval2 webeval3 webeval4
			webeval5 webeval6 webeval7 webeval8
   /PRINT INITIAL EXTRACTION
   /PLOT EIGEN
   /CRITERIA FACTORS(8) ITERATE(25)
   /EXTRACTION PC
   /ROTATION NOROTATE
   /METHOD=CORRELATION .

As mentioned, the proportions of variance explained and the scree
plot are identical.  However, SPSS produces this "Component Matrix",
which we in class have been calling "the loadings":

Component      1      2      3      4      5      6      7      8
WEBEVAL1  -0.798  0.253  0.178  0.317 -0.370  0.167 -0.033 -0.037
WEBEVAL2  -0.764  0.487  0.026  0.188  0.186 -0.309 -0.108 -0.043
WEBEVAL3  -0.719  0.309  0.217 -0.564 -0.125 -0.040  0.043  0.052
WEBEVAL4   0.558  0.591 -0.563 -0.063 -0.029  0.131  0.030 -0.019
WEBEVAL5   0.864  0.161  0.313 -0.128  0.075  0.138 -0.221 -0.200
WEBEVAL6   0.876  0.252  0.237  0.100  0.008  0.017 -0.088  0.308
WEBEVAL7  -0.858  0.128  0.133  0.054  0.349  0.308  0.090  0.037
WEBEVAL8   0.847  0.256  0.316  0.111  0.000 -0.087  0.296 -0.094

Can anyone tell me why these are different (It seems likely that
this is a scaling of some kind as the SPSS ones just look to have
been made larger in some way).  Or is it that SPSS is reporting
cumulatively while R is not?
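A quick arithmetic check, using only the first column of each matrix as printed above (rounded to three decimals), suggests the difference is a per-component rescaling rather than anything cumulative. The variable names are made up for illustration.

```r
# First column (Comp.1 / component 1) copied from the two outputs above
r_comp1    <- c(-0.357, -0.340, -0.319, 0.247, 0.391, 0.392, -0.382, 0.377)
spss_comp1 <- c(-0.798, -0.764, -0.719, 0.558, 0.864, 0.876, -0.858, 0.847)

# A pure rescaling of the column gives a (nearly) constant ratio;
# the small spread comes from rounding to three decimals
ratio <- spss_comp1 / r_comp1
print(round(ratio, 3))

# Sum of squares down each column: about 1 for R's unit-length
# eigenvector, larger for the SPSS column
print(sum(r_comp1^2))
print(sum(spss_comp1^2))
```

The ratios all lie between about 2.21 and 2.26, and the SPSS column's sum of squares is about 5.0. If SPSS multiplies each eigenvector by the square root of its eigenvalue, that would imply a first eigenvalue of roughly 5.0; without the data this is only an inference from the rounded output. Whole columns can also flip sign (compare component 3), since the sign of an eigenvector is arbitrary.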

Thanks in advance,
James
#
On Mon, 5 May 2003, James Howison wrote:

Only if they are defined the same.  The length of a PCA loading is 
arbitrary.  R's are of length (sum of squares of coefficients) one:
how are SPSS's defined?
Eigenvalues of what, exactly?  The component sdevs are the square roots of
the eigenvalues of the (possibly scaled) covariance matrix: maybe you
intend this only for a correlation matrix?
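A minimal sketch of both points, assuming the built-in USArrests data as a stand-in for the webeval items (the original data are not available). In current R, princomp is in the stats package, so no extra package has to be loaded. Note also that princomp analyses the correlation matrix when given cor = TRUE and has no scale argument (scale. is an argument of prcomp).

```r
# Stand-in data: the built-in USArrests data set (4 numeric variables)
pca <- princomp(USArrests, cor = TRUE)   # PCA of the correlation matrix

# R's loadings are the eigenvectors, each scaled to length one
L <- unclass(loadings(pca))              # drop the "loadings" print class
print(colSums(L^2))                      # column sums of squares are all 1

# The squared component sdevs are the eigenvalues of the correlation matrix
ev <- eigen(cor(USArrests), symmetric = TRUE)$values
print(all.equal(unname(pca$sdev^2), ev))
```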

In R you have the source code, so if you know what you want you can find 
the pieces.
#
Hi,
I compared R's results with those given by MINITAB and SAS, and they
agree. Your problem is with SPSS, which unfortunately I have never used.

Edgar
On Mon, 5 May 2003, James Howison wrote:

#
On Tuesday, May 6, 2003, at 03:00 AM, Prof Brian Ripley wrote:

Based on the "Factor Score Coefficients" section of the SPSS algorithm
document, I believe this is the calculation SPSS is using (am I right in
thinking that R's "loadings" are also "factor score coefficients"?):

http://www.spss.com/tech/stat/Algorithms/11.5/factor.pdf

To quote (in pseudo-LaTeX):

The matrix of factor loadings based on m factors is:

\lambda_m = \omega_m \gamma_m^{1/2}

where

\omega_m = (w_1, w_2, \ldots, w_m)
\gamma_m = \mathrm{diag}(|y_1|, |y_2|, \ldots, |y_m|)

For a correlation matrix,

y_1 \geq y_2 \geq y_3 \geq \cdots \geq y_m are the eigenvalues and w_i are the
corresponding eigenvectors of R, where R is the correlation matrix.

(skipping down to the bottom of the document)

for PC extraction without rotation (my example), the factor score
coefficients are

W = \lambda_m \gamma_m^{-1}

where S_m is the factor structure matrix and \lambda_m = S_m for
orthogonal rotations.
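Taking the quoted SPSS formulas at face value, the "Component Matrix" is R's unit-length eigenvectors with each column multiplied by the square root of its eigenvalue, and the factor score coefficients divide that by the eigenvalue again. The sketch below builds both from a princomp fit, again assuming USArrests as stand-in data; the object names are illustrative. It also checks that \lambda_m equals the correlations between the variables and the component scores, which is what a factor structure matrix contains. On this reading, R's loadings are neither the SPSS component matrix nor its factor score coefficients: they are the w_i themselves.

```r
pca   <- princomp(USArrests, cor = TRUE)
omega <- unclass(loadings(pca))   # w_1, ..., w_m: R's unit-length loadings
gamma <- pca$sdev^2               # y_1 >= ... >= y_m: eigenvalues of R

# SPSS "Component Matrix": lambda_m = omega_m gamma_m^(1/2),
# i.e. column k of omega multiplied by sqrt(y_k)
lambda <- sweep(omega, 2, sqrt(gamma), "*")

# Factor score coefficients: W = lambda_m gamma_m^(-1),
# which works out to omega_m gamma_m^(-1/2)
W <- sweep(lambda, 2, gamma, "/")

# lambda equals the correlations between variables and component scores
print(all.equal(unname(lambda), unname(cor(USArrests, pca$scores))))

# Standardised data times W gives component scores with unit variance
z_scores <- scale(USArrests) %*% W
print(round(apply(z_scores, 2, sd), 6))
```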

I'm afraid my mathematical skills are not up to comparing the algorithm
explained in the SPSS document with the R source code :(
Hopefully the difference is obvious to somebody here.
Yes, I do: I'm using only the correlation matrix.  I understood that it
was common (following Kaiser's suggestion) to extract only components 
which have eigenvalues above 1 (i.e. explain as much variance as at 
least one of the input variables).  I understand that is considered 
statistically crude but is still common.
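A minimal sketch of Kaiser's rule on a princomp fit, again assuming USArrests as stand-in data: keep the components whose eigenvalue (pca$sdev^2) exceeds 1, rather than choosing x from the scree plot by eye. Because princomp returns components in decreasing order of variance, the retained ones are always the first k. The dashed line on the scree plot marks the cut-off.

```r
pca <- princomp(USArrests, cor = TRUE)
eig <- pca$sdev^2                 # eigenvalues of the correlation matrix

k <- sum(eig > 1)                 # Kaiser: keep eigenvalues above 1
print(k)

# Unit-length loadings (R convention) for the retained components
print(unclass(loadings(pca))[, seq_len(k), drop = FALSE])

# The same components scaled as in the SPSS "Component Matrix"
kept <- sweep(unclass(loadings(pca))[, seq_len(k), drop = FALSE],
              2, sqrt(eig[seq_len(k)]), "*")
print(kept)

# Scree plot of the eigenvalues with the Kaiser cut-off marked
screeplot(pca, type = "lines")
abline(h = 1, lty = 2)
```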

I guess I was expecting an interface for PCA not too dissimilar to that
of factanal (as it is in other statistical packages).  Perhaps there
are sound statistical reasons for not wanting to hide this step from
the user, but it may be useful to you to know what people expect
when using the princomp function.
Apologies that this is a bit beyond me right at the moment.  I do,
however, appreciate your comments and the fact that the source is
available.

James
Doctoral Student
School of Information Studies
Syracuse University
#
On Tue, 6 May 2003, James Howison wrote:

Well, many other packages confuse (hopelessly) PCA and factor analysis,
including SPSS.  They are separate statistical methods with very different
purposes, that for factanal being quite rarely appropriate.  R is not
written to reproduce the mistakes of other packages, but to implement
sound statistical practice.
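To make the distinction concrete, a minimal sketch, assuming USArrests as stand-in data and one factor chosen only for illustration: factanal fits a model in which each variable keeps its own unique variance (reported as uniquenesses) and estimates the loadings by maximum likelihood, whereas the first principal component simply re-expresses the correlation matrix and has no such term. There is no reason for the two sets of numbers to coincide.

```r
fa  <- factanal(USArrests, factors = 1)   # maximum-likelihood factor analysis
pca <- princomp(USArrests, cor = TRUE)

# ML factor loadings and the unique variance left for each variable
print(fa$loadings)
print(fa$uniquenesses)

# First component scaled as in the SPSS "Component Matrix", for comparison
print(unclass(loadings(pca))[, 1] * pca$sdev[1])
```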
#
If you want factor analysis, you should use factanal or, more generally,
MLE, true.  Nonetheless, I still have a use for PCA as a factor extraction
method in a couple of situations:

1.  To replicate results obtained with that method
2.  When the covariance matrix is not positive definite

I have written some code to do this.  See:

http://home.earthlink.net/~bmagill/MyMisc.html

Find the function prinfact and its associated methods and functions.  This
would replicate SPSS results of "factor analysis by principal components".

Another, better option for the second situation might be OLS estimation.  I
don't have the ability to implement this myself.  Maybe in a future version of R?
At 04:31 PM 5/6/2003 +0100, Prof Brian Ripley wrote: