Different PCA results under Windows and Linux

6 messages · Steven McKinney, Peter Dalgaard, jathine +1 more

#
I ran the following R script under both Linux and Windows, and got two
different results.
Linux R version 2.7.1 and Windows R version 2.7.2.
Contents of the freqtest.txt file:
M1 M2 M3 M4 M5 M6 M7 M8
-1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 -1 -1 1 1
-1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 -1 -1 1 1

I also tried mean(xrcc2) and sd(xrcc2) on both machines, and the results are
the same.
Please explain.
#
Not likely that anyone can explain, as
there is not enough information in your
email.

Including the contents of the freqtest.txt file
was a good idea, as the posting guide suggests
(the posting guide is that clearly labeled bit
at the bottom that looks like this:
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
Check it out! It is cool.)

Additionally, include the command 
  sessionInfo() 
and its output from all machines you refer to
so maintainers know which versions of software
you are running.  Also, include the output you obtained
from your code (with your code being a self-contained
and reproducible set of R commands).

Finally, describe what the difference is and why
the difference is problematic (i.e. don't report
machine precision differences, or sign differences
for PCA results - PCA vector directions are arbitrary
modulo 180 degrees).
The R maintainers do an amazing job of creating
numerically stable platform-independent software,
so you get the same results almost everywhere.
(Thank you R core!)
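
The point about sign differences can be sketched in a few lines of R. This is only an illustration under stated assumptions: same_up_to_sign() is a hypothetical helper (it is not part of base R or FactoMineR), the data are made up, and prcomp() from base R stands in for whatever PCA call the original script used.

```r
# Hypothetical helper: compare two loading/coordinate matrices column by
# column, treating a column and its negation as the same axis and ignoring
# differences below a tolerance.
same_up_to_sign <- function(a, b, tol = 1e-8) {
  stopifnot(identical(dim(a), dim(b)))
  all(vapply(seq_len(ncol(a)), function(j) {
    max(abs(a[, j] - b[, j])) < tol || max(abs(a[, j] + b[, j])) < tol
  }, logical(1)))
}

# Made-up data, for illustration only
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)
rot <- prcomp(x)$rotation

flipped <- rot
flipped[, 2] <- -flipped[, 2]      # same axis, opposite direction
noisy <- flipped + 1e-15           # noise of machine-precision size

same_up_to_sign(rot, noisy)        # TRUE
isTRUE(all.equal(rot, noisy))      # FALSE: a plain comparison sees the flip
```

A comparison like this separates differences that matter from the two kinds that do not (sign flips and rounding noise).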


HTH

Steve McKinney

-----Original Message-----
From: r-help-bounces at r-project.org on behalf of jathine
Sent: Tue 9/16/2008 2:19 PM
To: r-help at r-project.org
Subject: [R]  Different PCA results under Windows and Linux
 

#
Steven McKinney wrote that PCA vector directions are arbitrary modulo 180
degrees. That holds provided that the eigenvalues are distinct! Other
rotations are possible if they are not.

If the data given are the whole data matrix, then it has rank 2. The
remaining components can be rotated arbitrarily in 6-dimensional space.
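
A minimal sketch of the rank argument, assuming the four rows shown in the thread are the complete data set. The matrix is typed in by hand rather than read from freqtest.txt, and only base R (qr, cov, eigen) is used, not FactoMineR, so this illustrates the degeneracy rather than reproducing the poster's call. Note that centering the columns removes one further dimension, so the covariance matrix of these data has a single nonzero eigenvalue.

```r
# The four data rows from freqtest.txt, typed in by hand
x <- matrix(c(-1, -1, -1, -1, -1, -1, -1, -1,
               0,  0,  0,  0, -1, -1,  1,  1,
              -1, -1, -1, -1, -1, -1, -1, -1,
               0,  0,  0,  0, -1, -1,  1,  1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, paste0("M", 1:8)))

qr(x)$rank                       # 2: there are only two distinct rows

# Eigenvalues of the 8 x 8 covariance matrix. All but the first are
# numerically zero, so they are tied and their eigenvectors are not unique.
ev <- eigen(cov(x), symmetric = TRUE)$values
sum(ev > 1e-12)                  # 1: a single eigenvalue clearly above zero
```

Any orthonormal basis of the space belonging to the tied (zero) eigenvalues is an equally valid answer, which is why two platforms may legitimately return different vectors for the trailing dimensions.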

  
    
#
Thank you for your reply.
Here is some more information; I hope it explains the problem a bit more
clearly.
Why does PCA give different results on the two platforms?

Contents of the freqtest.txt file:
M1 M2 M3 M4 M5 M6 M7 M8
-1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 -1 -1 1 1
-1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 -1 -1 1 1

******Linux R script result and sessionInfo()
$coord
   Dim.1         Dim.2         Dim.3
M1     1 -3.925231e-16 -2.287663e-48
M2     1  7.850462e-17 -3.600641e-32
M3     1  7.850462e-17  9.001602e-33
M4     1  7.850462e-17  9.001602e-33
M5     0  0.000000e+00  0.000000e+00
M6     0  0.000000e+00  0.000000e+00
M7     1  7.850462e-17  9.001602e-33
M8     1  7.850462e-17  9.001602e-33

$cor
   Dim.1         Dim.2         Dim.3
M1     1 -3.925231e-16 -2.287663e-48
M2     1  7.850462e-17 -3.600641e-32
M3     1  7.850462e-17  9.001602e-33
M4     1  7.850462e-17  9.001602e-33
M5   NaN           NaN           NaN
M6   NaN           NaN           NaN
M7     1  7.850462e-17  9.001602e-33
M8     1  7.850462e-17  9.001602e-33

$cos2
   Dim.1        Dim.2        Dim.3
M1     1 1.540744e-31 5.233404e-96
M2     1 6.162976e-33 1.296462e-63
M3     1 6.162976e-33 8.102884e-65
M4     1 6.162976e-33 8.102884e-65
M5   NaN          NaN          NaN
M6   NaN          NaN          NaN
M7     1 6.162976e-33 8.102884e-65
M8     1 6.162976e-33 8.102884e-65

$contrib
      Dim.1     Dim.2        Dim.3
M1 16.66667 83.333333 3.229346e-31
M2 16.66667  3.333333 8.000000e+01
M3 16.66667  3.333333 5.000000e+00
M4 16.66667  3.333333 5.000000e+00
M5  0.00000  0.000000 0.000000e+00
M6  0.00000  0.000000 0.000000e+00
M7 16.66667  3.333333 5.000000e+00
M8 16.66667  3.333333 5.000000e+00

R version 2.7.1 (2008-06-23)
x86_64-redhat-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] FactoMineR_1.09

******Windows R script result and sessionInfo()
$coord
   Dim.1         Dim.2         Dim.3
M1     1  2.458061e-16 -4.590163e-49
M2     1 -4.916122e-17 -4.750455e-32
M3     1 -4.916122e-17  1.187614e-32
M4     1 -4.916122e-17  1.187614e-32
M5     0  0.000000e+00  0.000000e+00
M6     0  0.000000e+00  0.000000e+00
M7     1 -4.916122e-17  1.187614e-32
M8     1 -4.916122e-17  1.187614e-32

$cor
   Dim.1         Dim.2         Dim.3
M1     1  2.458061e-16 -4.590163e-49
M2     1 -4.916122e-17 -4.750455e-32
M3     1 -4.916122e-17  1.187614e-32
M4     1 -4.916122e-17  1.187614e-32
M5   NaN           NaN           NaN
M6   NaN           NaN           NaN
M7     1 -4.916122e-17  1.187614e-32
M8     1 -4.916122e-17  1.187614e-32

$cos2
   Dim.1        Dim.2        Dim.3
M1     1 6.042064e-32 2.106959e-97
M2     1 2.416826e-33 2.256682e-63
M3     1 2.416826e-33 1.410426e-64
M4     1 2.416826e-33 1.410426e-64
M5   NaN          NaN          NaN
M6   NaN          NaN          NaN
M7     1 2.416826e-33 1.410426e-64
M8     1 2.416826e-33 1.410426e-64

$contrib
      Dim.1     Dim.2        Dim.3
M1 16.66667 83.333333 7.469228e-33
M2 16.66667  3.333333 8.000000e+01
M3 16.66667  3.333333 5.000000e+00
M4 16.66667  3.333333 5.000000e+00
M5  0.00000  0.000000 0.000000e+00
M6  0.00000  0.000000 0.000000e+00
M7 16.66667  3.333333 5.000000e+00
M8 16.66667  3.333333 5.000000e+00

R version 2.7.2 (2008-08-25)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] FactoMineR_1.09

        
#
Hi Jathine,
What is amazing is how nearly identical the two sets of results are, not
that they begin to differ at around the 16th decimal place. To assuage your
concerns, do the following on the results from your two trials:

round(p1$var$coord, 15)
?round

## And read the famous FAQ on floating point arithmetic
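
As a rough illustration of the rounding advice, the sketch below uses the Dim.2 coordinates for M1 to M3 copied from the Linux and Windows output earlier in the thread. The variable names linux and windows are introduced here for illustration only; the p1 object from the original script is not reconstructed.

```r
# Dim.2 coordinates for M1, M2, M3 as printed in the thread
linux   <- c(-3.925231e-16,  7.850462e-17,  7.850462e-17)
windows <- c( 2.458061e-16, -4.916122e-17, -4.916122e-17)

.Machine$double.eps                                     # about 2.2e-16

# Every value is smaller than two machine epsilons, i.e. rounding noise
# around a true value of zero.
all(abs(c(linux, windows)) < 2 * .Machine$double.eps)   # TRUE

# Rounded to 15 decimal places, the two platforms agree exactly.
all(round(linux, 15) == round(windows, 15))             # TRUE
```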

It also isn't a very good idea to be doing PCAs on 0s and 1s.

Regards, Mark.
#
Hi Jathine,

And then to see things more clearly still, you can do something like this on
your test results:

format(formatC(p1$var$coord, digits=15, format="f"), justify="right")

and

format(formatC(p1$var$coord, digits=16, format="f"), justify="right")

Though I do hope that the second command doesn't begin to concern you even
more.
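
A small sketch of what the two formatC() calls do, assuming values of the size seen in the thread. The vector x below is made up for illustration (one exact 1 plus two of the Dim.2 values); it is not the p1$var$coord object from the original script.

```r
# One "real" coordinate and two values of rounding-noise size
x <- c(1, 7.850462e-17, -3.925231e-16)

# With 15 decimal places the noise prints as zero ...
f15 <- formatC(x, digits = 15, format = "f")
# ... and with 16 decimal places its first nonzero digit becomes visible.
f16 <- formatC(x, digits = 16, format = "f")

print(format(f15, justify = "right"))
print(format(f16, justify = "right"))
```

The numbers stored in memory are the same in both calls; only the number of digits shown changes.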

Regards, Mark.