princomp(), prcomp() and loadings() - R-help

F.Tusell

Wed, Nov 3, 2004 8:06 AM

In comparing the results of princomp and prcomp I find:

   1. The reported standard deviations are similar but differ by about
      1%, which seems well above round-off error.
   2. loadings() applied to the princomp result prints what I understand
      to be the variances and cumulative variances accounted for by each
      principal component, and they are all equal. "SS loadings" is
      always 1.
   3. The same happens after the loadings are varimax-rotated, which in
      general should alter the proportions of variance accounted for by
      each component.

It looks as if the loadings() function were expecting the eigenvectors
to be normalized to the corresponding eigenvalue.

Transcript and version information follow the signature. Thank you for
any clues.

ft.

Fernando TUSELL                                e-mail:
Departamento de Econometría y Estadística      etptupaf at bs.ehu.es
Facultad de CC.EE. y Empresariales             Tel:   (+34)94.601.3733
Avenida Lendakari Aguirre, 83                  Fax:   (+34)94.601.3754
E-48015 BILBAO  (Spain)                        Secr:  (+34)94.601.3740
----------------------------------------------------------------------




> pca.1 <- prcomp(USArrests)
> pca.1
Standard deviations:
[1] 83.732400 14.212402  6.489426  2.482790

Rotation:
                PC1         PC2         PC3         PC4
Murder   0.04170432 -0.04482166  0.07989066 -0.99492173
Assault  0.99522128 -0.05876003 -0.06756974  0.03893830
UrbanPop 0.04633575  0.97685748 -0.20054629 -0.05816914
Rape     0.07515550  0.20071807  0.97408059  0.07232502
> pca.2 <- princomp(USArrests)
> pca.2
Call:
princomp(x = USArrests)

Standard deviations:
   Comp.1    Comp.2    Comp.3    Comp.4 
82.890847 14.069560  6.424204  2.457837 

 4  variables and  50 observations.
> summary(pca.2)
Importance of components:
                           Comp.1      Comp.2      Comp.3       Comp.4
Standard deviation     82.8908472 14.06956001 6.424204055 2.4578367034
Proportion of Variance  0.9655342  0.02781734 0.005799535 0.0008489079
Cumulative Proportion   0.9655342  0.99335156 0.999151092 1.0000000000
> loadings(pca.2)

Loadings:
         Comp.1 Comp.2 Comp.3 Comp.4
Murder                         0.995
Assault  -0.995                     
UrbanPop        -0.977 -0.201       
Rape            -0.201  0.974       

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00

> varimax(pca.2$loadings[,1:3])
$loadings

Loadings:
         Comp.1 Comp.2 Comp.3
Murder                       
Assault  -0.998              
UrbanPop        -0.997       
Rape                    0.995

               Comp.1 Comp.2 Comp.3
SS loadings      1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75

$rotmat
            [,1]       [,2]       [,3]
[1,]  0.99211386 0.03604908 -0.1200439
[2,] -0.05442524 0.98664663 -0.1535132
[3,]  0.11290692 0.15883603  0.9808278

> R.Version()
$platform
[1] "i386-pc-linux-gnu"

$arch
[1] "i386"

$os
[1] "linux-gnu"

$system
[1] "i386, linux-gnu"

$status
[1] ""

$major
[1] "2"

$minor
[1] "0.0"

$year
[1] "2004"

$month
[1] "10"

$day
[1] "04"

$language
[1] "R"

Sundar Dorai-Raj

Wed, Nov 3, 2004 8:24 AM

F.Tusell wrote:

Did you read the corresponding help files?

from ?prcomp:

<quote>
Details:

      The calculation is done by a singular value decomposition of the
      (centered and possibly scaled) data matrix, not by using 'eigen'
      on the covariance matrix.  This is generally the preferred method
      for numerical accuracy.
</quote>

from ?princomp:

<quote>
Details:

      The calculation is done using 'eigen' on the correlation or
      covariance matrix, as determined by 'cor'.  This is done for
      compatibility with the S-PLUS result.  A preferred method of
      calculation is to use 'svd' on 'x', as is done in 'prcomp'.
</quote>
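
A minimal sketch of the two calculations described in these help excerpts, using only base R and the USArrests data from the transcript. The variable names are illustrative, and the divisors (n - 1 for the SVD route, n for the eigen route) are assumptions consistent with the help-page note on divisor 'N' quoted elsewhere in this thread, not a copy of either function's source.

```r
# Centre the data without scaling, as both default calls do.
X <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)
n <- nrow(X)

# SVD route: singular values of the centred data, divisor n - 1.
sdev_svd <- svd(X)$d / sqrt(n - 1)

# eigen route: eigenvalues of the covariance matrix with divisor n.
S_n <- crossprod(X) / n
sdev_eig <- sqrt(eigen(S_n, symmetric = TRUE)$values)

# Compare each hand computation with the packaged function.
all.equal(sdev_svd, prcomp(USArrests)$sdev)
all.equal(sdev_eig, unname(princomp(USArrests)$sdev))
```

Under these assumptions the two routes agree with prcomp() and princomp() respectively, so the choice of algorithm matters for numerical accuracy while the choice of divisor accounts for the visible gap.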

HTH,

--sundar

Brian Ripley

Wed, Nov 3, 2004 8:34 AM

On Wed, 3 Nov 2004, F.Tusell wrote:

That is explained on the help page!  E.g.

     Note that the default calculation uses divisor 'N' for the
     covariance matrix.

and there is even an example:

     princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)
     ## Similar, but different:
     ## The standard deviations differ by a factor of sqrt(49/50)
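
For the covariance-based calls in the original transcript, the divisor can be checked directly. This is a small sketch, not part of the help page; the object names follow the transcript and `ratio` is an illustrative name.

```r
pca.1 <- prcomp(USArrests)     # divisor n - 1
pca.2 <- princomp(USArrests)   # divisor n
n <- nrow(USArrests)           # 50 observations

# Ratio of the reported standard deviations, component by component.
ratio <- pca.1$sdev / unname(pca.2$sdev)
ratio

# Each ratio should equal sqrt(n / (n - 1)), i.e. about a 1% difference.
all.equal(ratio, rep(sqrt(n / (n - 1)), length(ratio)))
```

The first pair in the transcript already shows this: 83.732400 / 82.890847 is about 1.0102, which is sqrt(50/49).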

Hmmm.  Varimax rotation of PCA (not factor analysis) is not supported in
base R, so this is not surprising.  Please do as the posting guide asks,
and read the help page (even its title!) before posting.

The best clue is that the help pages are a very useful resource, but need 
to be read as carefully as they were written.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Bert Gunter

Wed, Nov 3, 2004 11:21 AM

My apologies: This is an R-help kvetch only. 

<kvetch>

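I would like to forcefully highlight Brian Ripley's remark:

> The best clue is that the help pages are a very useful resource, but need
> to be read as carefully as they were written.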

This is truly the case (at least for the standard R distribution packages)!!
The help pages are remarkably well written and more often than not include
very informative examples (e.g., plotmath()). Both they and the FAQs are
valuable resources for "newbies" and longtime R users with porous brains
like me. Hence, I continue to be amazed and disappointed by the number of
questions this list receives that could have been answered by carefully
reading these resources.

So may I make a suggestion: The standard script added to all postings is:
<quote>
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
</quote>

However, this requires the user to take the extra time and effort to
actually peruse the guide, where the admonition to read Help Pages is
contained. Apparently, many fail to do this. So my suggestion is: modify the
above script to:

<quote>
The Help pages provide full, carefully written documentation and numerous
illustrative examples. The FAQs contain further useful information. PLEASE
consult these resources before posting questions and PLEASE do read the
posting guide, http://www.R-project.org/posting-guide.html, so that the
questions you post can be better answered.
</quote>

I know that this is rather prolix (perhaps others can do better at it), but
maybe it will more directly engage prospective posters and thereby avoid the
need for Brian's and many others' frequent reminders. Or do I delude myself
-- again?!

</kvetch>

Cheers,
Bert Gunter

F.Tusell

Thu, Nov 4, 2004 12:52 AM

On Wed, 03-11-2004 at 17:34, Prof Brian Ripley wrote:

Mea culpa. Even without resorting to the example in the help page, I 
       should have thought that differences in the divisor for the 
       covariance matrix (or using the data matrix directly, as Sundar 
       Dorai-Raj explained) might account for the small discrepancies.
       My apologies.

Well, the title says: "Rotation Methods for Factor Analysis" and
       this was not enough to deter me from using it on PCA-generated 
       loadings. "Factor Analysis" is sometimes meant to include PCA,
       even if they are different beasts. The help page for
       "loadings" is likewise titled "Print Loadings in Factor Analysis", 
       yet it is meant to be used with PCA loadings as well, as its
       Description section later goes on to say.

       But this is a side question. What I asked in point 2) is why
       loadings(princomp(USArrests)) reports as the last four lines

                         Comp.1 Comp.2 Comp.3 Comp.4
          SS loadings      1.00   1.00   1.00   1.00
          Proportion Var   0.25   0.25   0.25   0.25
          Cumulative Var   0.25   0.50   0.75   1.00

       when the eigenvalues are different and so are the amounts of 
       variance explained by each component. I used "varimax" merely 
       to produce a different set of loadings and check that the same
       behaviour recurred.
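
A short sketch of why the behaviour recurs after rotation, assuming the default arguments of varimax() and using illustrative names (`L`, `rot`): the rotation matrix is orthogonal, so columns that start with unit sum of squares keep it.

```r
# First three columns of the princomp loadings, as in the transcript.
L <- unclass(loadings(princomp(USArrests)))[, 1:3]
rot <- varimax(L)

# The rotation matrix is orthogonal: t(rotmat) %*% rotmat is the identity.
all.equal(crossprod(rot$rotmat), diag(3))

# Hence each rotated column still has sum of squares 1, which is the
# "SS loadings" row printed for the rotated loadings.
colSums(unclass(rot$loadings)^2)
```

With unit-norm columns going in, an orthogonal rotation cannot change the printed "SS loadings" or "Proportion Var" rows.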

The first line reported by loadings, "SS loadings", is right. 
       But the loadings matrix returned by princomp has its columns 
       normalized to 1, while the loadings returned by factanal are 
       not. Hence, with the latter, the last two lines are what I 
       expected, while with the former they are not.
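
The point can be sketched in a few lines of base R. The rescaling step below (multiplying each column by its standard deviation) is a hypothetical illustration of the normalization I had expected, not something princomp or loadings does; the names `ss`, `prop_var` and `L_scaled` are mine.

```r
pca <- princomp(USArrests)
L <- unclass(loadings(pca))

# "SS loadings": column sums of squares, 1 for every component.
ss <- colSums(L^2)
ss

# "Proportion Var" as printed: SS loadings divided by the number of
# variables, hence 1/4 for each of the four components.
ss / nrow(L)

# Proportion of variance per component, as reported by summary(pca).
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
prop_var

# Hypothetical rescaling: multiply each column by its standard deviation,
# so that column sums of squares become the eigenvalues.
L_scaled <- sweep(L, 2, pca$sdev, `*`)
colSums(L_scaled^2) / sum(L_scaled^2)   # matches prop_var
```

After this rescaling, the proportions derived from the sums of squares agree with those given by summary().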

       Perhaps the loadings matrices returned by princomp and factanal
       should be made of a different class, so loadings (or 
       print.loadings) treats them differently?

       ft.

Fernando TUSELL                                e-mail:
Departamento de Econometría y Estadística      etptupaf at bs.ehu.es
Facultad de CC.EE. y Empresariales             Tel:   (+34)94.601.3733
Avenida Lendakari Aguirre, 83                  Fax:   (+34)94.601.3754
E-48015 BILBAO  (Spain)                        Secr:  (+34)94.601.3740