Size of type double in object type dist (PR#1255)

2 messages · Peter Kleiweg

#
The following problem occurs in R 1.4.0 and 1.3.1 for Windows95,
but not in R 1.2.0 for Windows95.

The problem does not occur in R 1.4.0 for Linux PC, Linux Alpha
and HP-UX.

Sometimes, the type of 'Size' of an object of type 'dist'
changes from integer into double. Running cmdscale on such a
'dist' object gives invalid results.

I don't know which should be considered the bug: the type of 'Size'
changing into 'double', or cmdscale being unable to process a 'dist'
object with a 'Size' of type 'double'. Perhaps both.
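
Since an integer 67 and a double 67 can print identically, it may help to look at the storage type directly rather than at the printed value. A minimal sketch, assuming a small stand-in 'dist' object `d` (hypothetical data, not the PA data set):

```r
# Stand-in for PA (hypothetical data): a 'dist' object over 5 random points.
set.seed(1)
d <- dist(matrix(rnorm(20), nrow = 5))

size <- attr(d, "Size")
print(size)              # the printed value alone may not reveal the type
print(typeof(size))      # storage type: "integer" or "double"
print(is.integer(size))  # TRUE only when stored as an integer
```

Running the same three lines on PA before and after each call in the transcript would show whether the storage type itself changes, or only the way the value is printed.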

I have not been able to locate a single cause of this problem.
The following example shows where it goes wrong:

    # PA is an object of class 'dist' with 'Size' 67
    > data(PA)
    > attr(PA, "Size")
    [1] 67
    >
    > MDS <- function (dif, dim)
    + {
    +     a <- cmdscale(dif, k = dim, eig = FALSE)
    +     row.names(a) <- names(dif)
    +     a
    + }
    >
    > vec <- MDS(PA, 2)
    > attr(PA, "Size")
    [1] 67
    > plot(vec, type="n", xlab="", ylab="")
    > attr(PA, "Size")
    [1] 67
    > text(vec, rownames(vec), cex=0.5)
    > attr(PA, "Size")
    [1] 67
    >
    > vec8 <- MDS(PA, 8)
    > attr(PA, "Size")
    [1] 67
    > d2 <- dist(vec)
    > attr(PA, "Size"); attr(d2, "Size")
    [1] 67
    [1] 67
    > d8 <- dist(vec8)
    > attr(PA, "Size"); attr(d2, "Size"); attr(d8, "Size")
    [1] 67
    [1] 67
    [1] 67
    > cor(PA, d2)
    [1] 0.9370243
    > attr(PA, "Size"); attr(d2, "Size"); attr(d8, "Size")
    [1] 67
    [1] 67
    [1] 67
    > cor(PA, d8)
    [1] 0.9773668
    > attr(PA, "Size"); attr(d2, "Size"); attr(d8, "Size")
    [1] 67
    [1] 67
    [1] 67
    >
    > plot(c(min(PA), max(PA)), c(min(d2, d8), max(d2, d8)),
    +      type="n", xlab="PA", ylab="dist( cmdscale( PA ) )")
    > attr(PA, "Size"); attr(d2, "Size"); attr(d8, "Size")
    [1] 67
    [1] 67
    [1] 67
    > points(PA, d2, pch=20, cex=0.5, col="black")
    # here is when it changes
    > attr(PA, "Size"); attr(d2, "Size"); attr(d8, "Size")
    [1] 67.00000
    [1] 67
    [1] 67
    > points(PA, d8, pch=20, cex=0.5, col="blue")
    > attr(PA, "Size"); attr(d2, "Size"); attr(d8, "Size")
    [1] 67.00000
    [1] 67
    [1] 67
    > legend(min(PA),max(c(d2,d8)),c("dim = 2", "dim = 8"), fill=c("black", "blue"))
    > attr(PA, "Size"); attr(d2, "Size"); attr(d8, "Size")
    [1] 67.00000
    [1] 67
    [1] 67

It gets weirder...

    > rm(PA)
    > data(PA)
    > attr(PA, "Size")
    [1] 67.00000

If I do the following, the 'Size' attribute becomes an integer,
and seems to stay an integer:

    > PA <- as.dist(PA)

...however, running cmdscale on it still produces nonsense.

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = x86
 os = Win32
 system = x86, Win32
 status =
 major = 1
 minor = 4.0
 year = 2001
 month = 12
 day = 19
 language = R

Windows 95 4.0 (build 1111)  B

Search Path:
 .GlobalEnv, package:ctest, Autoloads, package:base


#
kleiweg@let.rug.nl wrote...
I have done some further testing. It seems the storage mode of
'Size' was double from the start. Setting the storage mode to
integer does not fix the problem. (However, the problems occur
once attr(PA, "Size") no longer prints 67, but 67.00000.)
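
For reference, the coercion described above could look like the following. This is only a sketch on a stand-in 'dist' object `d` (hypothetical data); the 'Size' attribute is first forced to double so that the conversion is visible. As noted, this kind of coercion did not cure the problem on the affected system.

```r
# Stand-in 'dist' object (hypothetical data): 4 points in 3 dimensions.
d <- dist(matrix(as.numeric(1:12), nrow = 4))

# Force 'Size' to double for illustration, then record its storage type.
attr(d, "Size") <- as.double(attr(d, "Size"))
type_before <- typeof(attr(d, "Size"))

# Coerce 'Size' back to integer and record the storage type again.
attr(d, "Size") <- as.integer(attr(d, "Size"))
type_after <- typeof(attr(d, "Size"))

print(c(before = type_before, after = type_after))
```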

There seems to be something wrong with the interaction of
the 'plot' function with objects of class 'dist' and
the 'cmdscale' function. I traced cmdscale:

function (d, k = 2, eig = FALSE)
{
[snip]

    storage.mode(x) <- "double"
    Tmat <- -0.5 * .C("dblcen", x, as.integer(n), PACKAGE = "mva")[[1]]

Up to this point, all goes well. The results in Tmat seem OK.
When things go wrong, the following command takes a long time to
run:

    e <- La.eigen(Tmat, symmetric = TRUE)

[snip]

}
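
To check the matrix that reaches the eigen step, the traced lines can be approximated in plain R. This is a sketch under stated assumptions, not the code of cmdscale itself: the double centering is written out in R and is only assumed to match what the C routine "dblcen" computes, the squaring of the distances is assumed to happen in the snipped part, eigen() stands in for La.eigen(), and `d` is a hypothetical stand-in for PA.

```r
# Stand-in data (hypothetical): 10 random points in 4 dimensions.
set.seed(42)
d <- dist(matrix(rnorm(40), nrow = 10))

x <- as.matrix(d)^2   # squared distances (assumed preprocessing step)
n <- nrow(x)

# Double centering in plain R; assumed to mirror what "dblcen" computes.
J <- diag(n) - matrix(1 / n, n, n)
Tmat <- -0.5 * J %*% x %*% J

# Sanity checks on Tmat before the eigen step.
print(isTRUE(all.equal(Tmat, t(Tmat))))  # is Tmat symmetric?
print(max(abs(rowSums(Tmat))))           # rows should sum to about zero

# Time the decomposition; eigen() is used here in place of La.eigen().
timing <- system.time(e <- eigen(Tmat, symmetric = TRUE))
print(timing[["elapsed"]])
```

If Tmat passes both checks and the decomposition is still slow or wrong, that would point at the eigen routine or at corrupted memory rather than at the centering step.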

My best guess is that there is some memory problem (overflow,
null pointer assignment, whatever) when plot is applied to an
object of class 'dist', which causes another call to cmdscale on
the same object to go wrong.
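
One way to test this guess is to snapshot the object before a call and compare it afterwards. The helper below, `unchanged_after` (a hypothetical name), is a sketch that compares serialized copies of the object; it only detects changes that are visible in the serialized representation, so it may not catch every kind of memory corruption. The stand-in object `d` is hypothetical data.

```r
# Returns TRUE if calling f(obj) leaves obj's serialized form unchanged.
unchanged_after <- function(obj, f) {
  before <- serialize(obj, NULL)           # raw-vector snapshot of obj
  f(obj)                                   # run the call under suspicion
  identical(before, serialize(obj, NULL))  # compare with a fresh snapshot
}

# Stand-in 'dist' object (hypothetical data): 10 points in 3 dimensions.
set.seed(7)
d <- dist(matrix(rnorm(30), nrow = 10))

ok <- unchanged_after(d, function(z) cmdscale(z, k = 2))
print(ok)
```

The same wrapper could be placed around the plot and points calls from the first message, with a graphics device open, to see which call first alters the object.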

If someone wants to try this, the problem occurred while running the
examples of the functions 'MDS', 'ISOMDS', and 'SAMMON' in the
'iLeven' package. That package can be downloaded from:

    http://www.let.rug.nl/~kleiweg/levenshtein/R/

Just running one example (any example) once works fine. Running
that example again, or running any of the other examples fails.

R versions 1.3.1 and 1.4.0 / Windows95