Skip to content

Hashed environments of size <5 never grow

2 messages · Duncan Garmonsway, Martin Maechler

#
Hello,

Hashed environments that begin with a (non-default) size of 4 or less, will
never grow, which is very detrimental to performance.  For example,

```
n <- 10000
l <- vector("list", n)
l <- setNames(l, seq_len(n))

# Takes a second, and nchains remains 1.
e1 <- list2env(l, hash = TRUE, size = 1)
env.profile(e1)$nchains
# [1] 1

# Returns instantly, and nchains grows to 6950
e2 <- list2env(l, hash = TRUE, size = 5)
env.profile(e2)$nchains
# [1] 6950
```

The cause is that, when calling the growth function, the new size is
truncated to an integer.  See src/main/envir.c line 440, or
https://github.com/wch/r-source/blob/d9b9d00b6d2764839f229bf011dda8d027aae227/src/main/envir.c#L440

Given the hard-coded growth rate of 1.2, any size of 4 or less will be
truncated back to itself.

(int) (1 * 1.2 ) = 1
(int) (2 * 1.2) = 1
(int) (3 * 1.2) = 1
(int) (4 * 1.2) = 1
(int) (5 * 1.2) = 6

This is a rare case, and I couldn't find any examples in CRAN packages of
the `size` argument being used at all, let alone so small.  Even so, it
tripped me up, and could be fixed by using `ceil()` in src/main/envir.c
line 440 as follows.

new_table = R_NewHashTable((int)(ceil(HASHSIZE(table) *
HASHTABLEGROWTHRATE)))

Kind regards,
Duncan Garmonsway
2 days later
#
> Hello,

    > Hashed environments that begin with a (non-default) size
    > of 4 or less, will never grow, which is very detrimental
    > to performance.  For example,

    > ```
    > n <- 10000
    > l <- vector("list", n)
    > l <- setNames(l, seq_len(n))

    > # Takes a second, and nchains remains 1.
    > e1 <- list2env(l, hash = TRUE, size = 1)
    > env.profile(e1)$nchains
    > # [1] 1

    > # Returns instantly, and nchains grows to 6950
    > e2 <- list2env(l, hash = TRUE, size = 5)
    > env.profile(e2)$nchains
    > # [1] 6950
    > ```

    > The cause is that, when calling the growth function, the new size is
    > truncated to an integer.  See src/main/envir.c line 440, or
    > https://github.com/wch/r-source/blob/d9b9d00b6d2764839f229bf011dda8d027aae227/src/main/envir.c#L440

    > Given the hard-coded growth rate of 1.2, any size of 4 or less will be
    > truncated back to itself.

    > (int) (1 * 1.2 ) = 1
    > (int) (2 * 1.2) = 1
    > (int) (3 * 1.2) = 1
    > (int) (4 * 1.2) = 1
    > (int) (5 * 1.2) = 6

Yes.  I'm convinced this has been oversight and should be
corrected.

    > This is a rare case, and I couldn't find any examples in CRAN packages of
    > the `size` argument being used at all, let alone so small.  Even so, it
    > tripped me up, and could be fixed by using `ceil()` in src/main/envir.c
    > line 440 as follows.

    > new_table = R_NewHashTable((int)(ceil(HASHSIZE(table) *
    > HASHTABLEGROWTHRATE)))

Indeed, this bug would surface very very rarely,
but I agree a fix such as your proposition  is appropriate.

I'll do so, also adding a regression test.

Martin Maechler
ETH Zurich   and  R Core team


    > Kind regards,
    > Duncan Garmonsway