
Package digest broken under R v2.4.0 devel

3 messages · Henrik Bengtsson, Dirk Eddelbuettel

#
[cc:ing to the maintainer of digest]

FYI, package 'digest' (v0.2.1 2005/11/04 04:45:53) generates the same
output regardless of input with R v2.4.0 devel (2006-07-25 r38698).
Starting a vanilla R session you get:
[1] "3416a75f4cea9109507cacd8e2f2aefc"
[1] "3416a75f4cea9109507cacd8e2f2aefc"
[1] "3416a75f4cea9109507cacd8e2f2aefc"

It works as expected with R v2.3.1 patched (2006-07-25 r38698):
[1] "577e0eb2f3253fc5a8c4a287f7c10e7f"
[1] "75eb91f4559682af50c21212d0dc013b"

digest() uses serialize() internally, but it has nothing to do with
that.  I managed to track it down to the call to .Call("digest", ...).

BTW, thanks for a very useful package.

Henrik
#
Found the reason for the bug.  Patch available online;

  source("http://www.braju.com/R/patches/digest.R")

In digest() the .Call() statement takes the serialized object,
converted to a string, as its second argument:

    val <- .Call("digest", as.character(object), as.integer(algoint),
        as.integer(length), PACKAGE = "digest")

This relies on the fact that 'object' is a single character string not
a vector. Try object <- "a" and object <- c("a", "b"), and you'll get
the same result.

To generate the 'object' string, digest() first calls serialize().
In R v2.3.1, serialize(input, connection=NULL, ascii=TRUE) returns a
single string, but in R v2.4.0 it returns a raw vector.  This is [of
course ;)] documented.  The R v2.3.1 help page says:

For 'serialize', 'NULL' unless 'connection=NULL', when the result is
stored in the first element of a character vector (but is not a normal
character string unless 'ascii = TRUE'

whereas the R v2.4.0 help page says:

For serialize, NULL unless connection=NULL, when the result is stored
in a raw vector.
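
The difference is easy to check interactively. What follows is only a minimal sketch, assuming R v2.4.0 or later; the inputs "a" and c("a", "b") are the ones suggested above, and the variable names x and y are made up for illustration. It also suggests why a routine that reads only the first element of as.character(object) would see identical input for every object:

```r
# Serialize two different objects in ASCII mode without a connection.
x <- serialize("a", connection = NULL, ascii = TRUE)
y <- serialize(c("a", "b"), connection = NULL, ascii = TRUE)

# In R >= 2.4.0 the result is a raw vector with one element per byte,
# not a single character string.
is.raw(x)
length(x) > 1

# as.character() on a raw vector gives one two-digit hex string per byte.
# The ASCII format always starts with the byte for "A", so the first
# element is the same whatever was serialized.
as.character(x)[1]
identical(as.character(x)[1], as.character(y)[1])
```

If the C code indeed looks only at the first element, as the test with "a" and c("a", "b") indicates, this would explain the constant digest.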

So the quick and dirty fix of digest() is to do:

 object <- serialize(object, connection=NULL, ascii=TRUE)
 object <- paste(object, collapse="")

This should work in either R version.  I've made this patch available
online. Just call:

  source("http://www.braju.com/R/patches/digest.R")
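
The effect of the collapse step can be sketched on its own, without the digest package. The helper name to_single_string is hypothetical and is not part of digest or of the patch; it merely wraps the two lines of the fix so that they can be tried out:

```r
# Hypothetical helper mirroring the two-line fix: serialize in ASCII
# mode, then collapse the result to exactly one character string.
to_single_string <- function(object) {
  s <- serialize(object, connection = NULL, ascii = TRUE)
  paste(s, collapse = "")
}

a <- to_single_string("a")
b <- to_single_string(c("a", "b"))

length(a)        # 1: a single string, whatever serialize() returned
identical(a, b)  # FALSE: different inputs give different strings
```

Note that when serialize() returns a raw vector, paste() goes through as.character(), so the collapsed string is the hex encoding of the bytes rather than the ASCII text itself.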

It's possible that it is faster to serialize to a 'textConnection'.
However, it might be even faster if your internal code, i.e.
.Call("digest", ...), accepted vectors, so that this does not have to
be done at the R level?

Cheers

Henrik
#
On 28 July 2006 at 08:52, Henrik Bengtsson wrote:
| Found the reason for the bug.  Patch available online;
| 
|   source("http://www.braju.com/R/patches/digest.R")

Splendid -- thank you for both the bug report, and the patch.  The new
revision digest_0.2.2 includes this patch. I also added unit tests in
directory tests/. The new version is now in CRAN's incoming/ directory.

Thanks, Dirk