Improvement: function cut

While it is not explicitly mentioned anywhere in the documentation for
.bincode, I suspect 'include.lowest = FALSE' is the default to keep the
definitions of the bins consistent. For example:


x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)
cbind(
    .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
    .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
)


With 'include.lowest = TRUE' and different final breaks, you can get
inconsistent behaviour: x = 16 lands in bin 4 in the first column but in
bin 5 in the second. While this probably wouldn't be an issue with 'real'
data, it seems like something you'd want to avoid by default. The
definitions of the bins are


[0, 4)
[4, 8)
[8, 12)
[12, 16]


and


[0, 4)
[4, 8)
[8, 12)
[12, 16)
[16, 20]


so you can see where the inconsistent behaviour comes from. You might be
able to get R-core to add a 'warn' argument, but probably not to change the
default of 'include.lowest'. I hope this helps.
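
P.S. A quick sketch to check the point above, assuming the same x, breaks1
and breaks2 as in the example. The names b1, b2, c1, c2, differs and both
are only mine for illustration. It picks out the values binned differently
under 'include.lowest = TRUE', then checks that with 'include.lowest =
FALSE' the two sets of breaks agree wherever both return a bin:

```r
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

# include.lowest = TRUE: the last bin is closed on the right,
# so its definition depends on where the breaks happen to end
b1 <- .bincode(x, breaks1, right = FALSE, include.lowest = TRUE)
b2 <- .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)

# values that get a bin from both sets of breaks, but a different one
differs <- !is.na(b1) & !is.na(b2) & b1 != b2
x[differs]
# [1] 16

# include.lowest = FALSE: every bin is half-open, [a, b)
c1 <- .bincode(x, breaks1, right = FALSE, include.lowest = FALSE)
c2 <- .bincode(x, breaks2, right = FALSE, include.lowest = FALSE)

# wherever both return a bin, the bins are the same
both <- !is.na(c1) & !is.na(c2)
all(c1[both] == c2[both])
# [1] TRUE
```

The price of the consistent default is that x = 16 is NA under breaks1 and
x = 20 is NA under breaks2, since neither falls in a half-open bin.
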
On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.mada at syonic.eu> wrote: