Covariance bug in R-1.8.0

2 messages · lederer@trium.de, Peter Dalgaard

#
R-1.8.0 seems to calculate wrong covariances when the argument of cov()
is a matrix or a data frame.
The following should produce a matrix of zeroes and NaNs:

x <- matrix(c(NA ,NA ,0.9068995 ,NA ,-0.3116229,
              -0.06011117 ,0.7310134 ,NA ,1.738362 ,0.6276125,
              0.6615581 ,NA ,NA ,-2.646011 ,-2.126105,
              NA ,1.081825 ,NA ,1.253795 ,1.520708,
              0.2822814 ,NA ,NA ,NA ,NA,
              0.03291028 ,NA ,NA ,NA ,NA,
              NA ,NA ,NA ,-0.5462126 ,-0.1997394,
              NA ,-0.3419413 ,-0.2675226 ,-1.000133 ,-0.1346234,
              NA ,NA ,-0.411743 ,1.301612 ,NA,
              0.922197 ,NA ,0.9513522 ,0.2357021 ,NA),
            nrow=10, ncol=5)

c1 <- cov(x, use="pairwise.complete")

c2 <- matrix(nrow=5, ncol=5)
for (i in 1:5)
{
    for (j in 1:5)
    {
        c2[i,j] <- cov(x[,i], x[,j], use="pairwise.complete")
    }
}

c2-c1

Instead, R-1.8.0 produces this result:

            [,1]        [,2]       [,3]          [,4]        [,5]
[1,]  0.00000000 -0.03053828         NA -0.0144996353 -0.03485883
[2,] -0.03053828 -0.01649857         NA  0.0137259383 -0.02960707
[3,]          NA          NA -0.1296134            NA          NA
[4,] -0.01449964  0.01372594         NA -0.0003152629  0.08717648
[5,] -0.03485883 -0.02960707         NA  0.0871764791  0.04961190

This happens under Linux (SuSE 9.1) as well as under Windows NT.

Under 1.9.1 (Linux) and 1.9.0 (Windows) I get the expected matrix of
zeroes and NaNs.

This example is not special in any way. Under R-1.8.0, cov() produced wrong
results for every random matrix I tried.
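
A minimal sketch of such a random-matrix check, for anyone who wants to repeat it. The seed, the matrix dimensions, and the number of missing values are arbitrary assumptions, not taken from the runs described above. "pairwise.complete.obs" is the full spelling of the option abbreviated as "pairwise.complete" in the example.

```r
# Illustrative check only: seed, size, and NA count are arbitrary assumptions.
set.seed(1)
x <- matrix(rnorm(50), nrow = 10, ncol = 5)
x[sample(length(x), 10)] <- NA          # knock out ten entries at random

# Covariance matrix computed in one call on the whole matrix
c1 <- cov(x, use = "pairwise.complete.obs")

# Same quantities computed one column pair at a time
p <- ncol(x)
c2 <- matrix(NA_real_, nrow = p, ncol = p)
for (i in seq_len(p)) {
    for (j in seq_len(p)) {
        c2[i, j] <- cov(x[, i], x[, j], use = "pairwise.complete.obs")
    }
}

# Largest absolute disagreement, ignoring entries that are NA in both
max_diff <- max(abs(c2 - c1), na.rm = TRUE)
same_na  <- identical(is.na(c1), is.na(c2))
print(max_diff)
print(same_na)
```

On a version without the bug, max_diff should be zero up to rounding error and same_na should be TRUE.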

Doesn't this mean that *any* result obtained under R 1.8.0 is unreliable?

By the way, I just recompiled R-1.8.0 from source under Linux and ran
'make check'. All tests passed.
Is there a more detailed set of tests that could ensure that
at least the most basic R functions work correctly?


Christian
#
lederer at trium.de writes:
...
Presumably, this is the same as PR#4646.
It means that covariances and correlations are sometimes computed
incorrectly.
Yes. We don't release versions that don't pass their own tests.
We add regression tests as we discover and fix bugs. We can't fix old
versions retroactively, though; we release patch versions (e.g. 1.8.1)
instead.
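
As an illustration of what a regression test for this kind of bug might look like, here is a sketch that compares cov() against a covariance computed directly from the definition, so the check does not rely on cov() agreeing with itself. The helper pairwise_cov_ref and the small test matrix are hypothetical, made up for this example; this is not the test that was actually added to R.

```r
# Hypothetical reference implementation: pairwise-complete covariance
# computed from the definition, without calling cov().
pairwise_cov_ref <- function(x) {
    p <- ncol(x)
    out <- matrix(NA_real_, nrow = p, ncol = p)
    for (i in seq_len(p)) {
        for (j in seq_len(p)) {
            # rows where both columns are observed
            ok <- !is.na(x[, i]) & !is.na(x[, j])
            n <- sum(ok)
            if (n >= 2) {
                a <- x[ok, i]
                b <- x[ok, j]
                out[i, j] <- sum((a - mean(a)) * (b - mean(b))) / (n - 1)
            }
        }
    }
    out
}

# Small made-up matrix with one missing value per column
x <- matrix(c(1,  2, NA, 4,  5,
              2, NA,  6, 8, 11,
              NA, 1,  0, 3,  2), nrow = 5, ncol = 3)

ref <- pairwise_cov_ref(x)
got <- cov(x, use = "pairwise.complete.obs")

# Fails loudly if the built-in result drifts from the definition
stopifnot(isTRUE(all.equal(ref, got)))
```

A check of this shape fails as soon as the matrix code path and the definition disagree, which is the discrepancy reported above.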