Prev 2725 / 12125 Next

[R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

Thu, May 17, 2018 2:28 PM

But why would you want to tie your tests to specific platforms, if
mathematically all those results are equivalent?  You could compare the
orthogonal complements from a full rank matrix (say the identity) to each
expected eigenspace.  E.g., for the example I gave above, where e2 and e1
gave different basis for 2 eigenspaces:

+   residuals(lm.fit(y=diag(5), e1$vectors[,1,drop=FALSE])),
+   residuals(lm.fit(y=diag(5), e2$vectors[,1,drop=FALSE])))
[1] "Mean relative difference: 4.791489e-16"

+   residuals(lm.fit(y=diag(5), e1$vectors[,2:3,drop=FALSE])),
+   residuals(lm.fit(y=diag(5), e2$vectors[,2:3,drop=FALSE])))
[1] "Mean relative difference: 1.469377e-15"

+   residuals(lm.fit(y=diag(5), e1$vectors[,4:5,drop=FALSE])),
+   residuals(lm.fit(y=diag(5), e2$vectors[,4:5,drop=FALSE])))
[1] TRUE


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, May 17, 2018 at 12:55 PM, Ben Bolker <bbolker at gmail.com> wrote:

There have been various comments in this thread (by me, and I think
Duncan Murdoch) about how you can identify the platform you're running
on (some combination of .Platform and/or R.Version()) and use it to
write conditional statements so that your tests will only be compared
with reference values that were generated on the same platform ... did
those get through?  Did they make sense?

On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
<kevin.r.coombes at gmail.com> wrote:

Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
the issue. The matrices I am using are all nonsingular, and the various
algorithms have no problem computing the eigenvalues correctly (up to
numerical errors that I can bound and thus account for on tests by

rounding

appropriately). But an eigenvalue of multiplicity M has an M-dimensional
eigenspace with no preferred basis. So, any M-dimensional  (unitary)

change

of basis is permitted. That's what give rise to the lack of

reproducibility

across architectures. The choice of basis appears to use different
heuristics on 32-bit windows than on 64-bit Windows or Linux machines.

As a

result, I can't include the tests I'd like as part of a CRAN submission.

On Thu, May 17, 2018, 2:29 PM William Dunlap <wdunlap at tibco.com> wrote:

Your explanation needs to be a bit more general in the case of identical
eigenvalues - each distinct eigenvalue has an associated subspace, whose
dimension is the number repeats of that eigenvalue and the eigenvectors

for

that eigenvalue are an orthonormal basis for that subspace.  (With no
repeated eigenvalues this gives your 'unique up to sign'.)

E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of

  > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
  > x

       [,1] [,2] [,3] [,4] [,5]
  [1,]    1    0    0    0    1
  [2,]    0    1    0    0    1
  [3,]    0    0    1    0    1
  [4,]    0    0    0    0    0
  [5,]    1    1    1    0    3
the following give valid but different (by more than sign) eigen vectors

e1 <- structure(list(values = c(4, 1, 0.999999999999999, 0,
-2.22044607159862e-16
), vectors = structure(c(-0.288675134594813, -0.288675134594813,
-0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
-0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
-0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
-0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
"vectors"), class = "eigen")
e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
    vectors = structure(c(0.288675134594813, 0.288675134594813,
    0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
    0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
    0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
    -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
    0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
), class = "eigen")

I.e.,

all.equal(crossprod(e1$vectors), diag(5), tol=0)

[1] "Mean relative difference: 1.407255e-15"

all.equal(crossprod(e2$vectors), diag(5), tol=0)

[1] "Mean relative difference: 3.856478e-15"

all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)

[1] "Mean relative difference: 1.110223e-15"

all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)

[1] "Mean relative difference: 9.069735e-16"

e1$vectors

           [,1]       [,2]          [,3] [,4] [,5]
[1,] -0.2886751  0.0000000  8.164966e-01    0 -0.5
[2,] -0.2886751  0.7071068 -4.082483e-01    0 -0.5
[3,] -0.2886751 -0.7071068 -4.082483e-01    0 -0.5
[4,]  0.0000000  0.0000000  0.000000e+00   -1  0.0
[5,] -0.8660254  0.0000000 -6.106227e-16    0  0.5

e2$vectors

          [,1]          [,2]          [,3] [,4] [,5]
[1,] 0.2886751 -7.844376e-01  2.265489e-01    0 -0.5
[2,] 0.2886751  5.884158e-01  5.660684e-01    0 -0.5
[3,] 0.2886751  1.960217e-01 -7.926173e-01    0 -0.5
[4,] 0.0000000  0.000000e+00  0.000000e+00   -1  0.0
[5,] 0.8660254  4.464109e-17 -1.112441e-16    0  0.5





Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, May 17, 2018 at 10:14 AM, Martin Maechler <
maechler at stat.math.ethz.ch> wrote:

Duncan Murdoch ....
    on Thu, 17 May 2018 12:13:01 -0400 writes:

    > On 17/05/2018 11:53 AM, Martin Maechler wrote:

    >>>>>>> Kevin Coombes ... on Thu, 17
    >>>>>>> May 2018 11:21:23 -0400 writes:

    >>    [..................]

    >> > [3] Should the documentation (man page) for "eigen" or
    >> > "mvrnorm" include a warning that the results can change
    >> > from machine to machine (or between things like 32-bit and
    >> > 64-bit R on the same machine) because of difference in
    >> > linear algebra modules? (Possibly including the statement
    >> > that "set.seed" won't save you.)

    >> The problem is that most (young?) people do not read help
    >> pages anymore.
    >>
    >> help(eigen) has contained the following text for years,
    >> and in spite of your good analysis of the problem you
    >> seem to not have noticed the last semi-paragraph:
    >>

    >>> Value:
    >>>
    >>> The spectral decomposition of ?x? is returned as a list
    >>> with components
    >>>
    >>> values: a vector containing the p eigenvalues of ?x?,
    >>> sorted in _decreasing_ order, according to ?Mod(values)?
    >>> in the asymmetric case when they might be complex (even
    >>> for real matrices).  For real asymmetric matrices the
    >>> vector will be complex only if complex conjugate pairs
    >>> of eigenvalues are detected.
    >>>
    >>> vectors: either a p * p matrix whose columns contain the
    >>> eigenvectors of ?x?, or ?NULL? if ?only.values? is
    >>> ?TRUE?.  The vectors are normalized to unit length.
    >>>
    >>> Recall that the eigenvectors are only defined up to a
    >>> constant: even when the length is specified they are
    >>> still only defined up to a scalar of modulus one (the
    >>> sign for real matrices).

    >>
    >> It's not a warning but a "recall that" .. maybe because
    >> the author already assumed that only thorough users would
    >> read that and for them it would be a recall of something
    >> they'd have learned *and* not entirely forgotten since
    >> ;-)
    >>

    > I don't think you're really being fair here: the text in
    > ?eigen doesn't make clear that eigenvector values are not
    > reproducible even within the same version of R, and
    > there's nothing in ?mvrnorm to suggest it doesn't give
    > reproducible results.

Ok, I'm sorry ... I definitely did not want to be unfair.

I've always thought the remark in eigen was sufficient,  but I'm
probably wrong and we should add text explaining that it
practically means that eigenvectors are only defined up to sign
switches (in the real case) and hence results depend on the
underlying {Lapack + BLAS} libraries and therefore are platform
dependent.

Even further, we could consider (optionally, by default FALSE)
using defining a deterministic scheme for postprocessing the current
output of eigen such that at least for the good cases where all
eigenspaces are 1-dimensional, the postprocessing would result
in reproducible signs, by e.g., ensuring the first non-zero
entry of each eigenvector to be positive.

MASS::mvrnorm()  and  mvtnorm::rmvnorm() both use "eigen",
whereas mvtnorm::rmvnorm()  *does* have  method = "chol" which
AFAIK does not suffer from such problems.

OTOH, the help page of MASS::mvrnorm() mentions the Cholesky
alternative but prefers eigen for better stability (without
saying more).

In spite of that, my personal recommendation would be to use

  mvtnorm::rmvnorm(.., method = "chol")

{ or the 2-3 lines of R code to the same thing without an extra

package,

  just using rnorm(), chol() and simple matrix operations }

because in simulations I'd expect the var-cov matrix  Sigma to
be far enough away from singular for chol() to be stable.

Martin

______________________________________________
R-package-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

______________________________________________
R-package-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Thread (13 messages)

Kevin Coombes mvrnorm, eigen, tests, and R CMD check May 17 Martin Maechler mvrnorm, eigen, tests, and R CMD check May 17 Ben Bolker mvrnorm, eigen, tests, and R CMD check May 17 Duncan Murdoch mvrnorm, eigen, tests, and R CMD check May 17 Martin Maechler mvrnorm, eigen, tests, and R CMD check May 17 William Dunlap mvrnorm, eigen, tests, and R CMD check May 17 Kevin Coombes mvrnorm, eigen, tests, and R CMD check May 17 Ben Bolker mvrnorm, eigen, tests, and R CMD check May 17 Kevin Coombes mvrnorm, eigen, tests, and R CMD check May 17 William Dunlap mvrnorm, eigen, tests, and R CMD check May 17 Jari Oksanen mvrnorm, eigen, tests, and R CMD check May 17 Martin Maechler mvrnorm, eigen, tests, and R CMD check May 18 Facundo Muñoz mvrnorm, eigen, tests, and R CMD check May 18