Improvement: function cut
Hello Andrew,
I am adding this information as a follow-up (so other users can get a better understanding):
If we want to perform a survival analysis, then the intervals should be closed on the right, but we should also include the first time point (as per Intention-to-Treat):
[0, 4] (4, 8] (8, 12] (12, 16]
[0, 4] (4, 8] (8, 12] (12, 16] (16, 20]
So the series can be extended to the right without any errors! But the 1st interval (which is the same in both series) is different from the other intervals: [0, 4].
I feel that this should have been the default behaviour for cut().
Note: I was led to think about a different situation in my previous message, as you constructed intervals open on the right and also extended them to the right. But survival analysis should work as described in this mail, and this should probably be the default.
Sincerely,
Leonard
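In base R, the intervals Leonard describes correspond to `right = TRUE` combined with `include.lowest = TRUE`. The following is a minimal sketch that reuses the breaks from Andrew's examples and checks that extending the breaks to 20 does not move any value that was already binned. The names `f1`, `f2` and `covered` are illustrative only.

```r
# Survival-style bins: closed on the right, with the first bin also
# closed on the left (right = TRUE, include.lowest = TRUE).
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

f1 <- cut(x, breaks1, right = TRUE, include.lowest = TRUE)
f2 <- cut(x, breaks2, right = TRUE, include.lowest = TRUE)

levels(f1)  # "[0,4]" "(4,8]" "(8,12]" "(12,16]"
levels(f2)  # "[0,4]" "(4,8]" "(8,12]" "(12,16]" "(16,20]"

# Every value binned by breaks1 gets the same bin index under breaks2,
# so extending the breaks to the right does not reassign any value.
covered <- !is.na(f1)
all(as.integer(f1)[covered] == as.integer(f2)[covered])  # TRUE
```

Only the values 17 to 20 are NA under `breaks1`; they simply gain a new bin when the breaks are extended.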
On 9/18/2021 1:29 AM, Andrew Simmons wrote:
I disagree; I don't really think it's too long or ugly, but if you think it is, you could abbreviate it as 'i'.
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)
# 'i' is partially matched to the 'include.lowest' argument of cut.default
data.frame(
    cut(x, breaks1, right = FALSE, i = TRUE),
    cut(x, breaks2, right = FALSE, i = TRUE),
    check.names = FALSE
)
I hope this helps.
On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada <leo.mada at syonic.eu> wrote:
Hello Andrew,
But "cut" generates factors. In most cases with real data, one expects the ends of the interval to be included as well: the argument "include.lowest" is both ugly and too long.
[The test code on the ftable thread contains this error! I have run into this error a couple of times.]
The only real situation that I can imagine being problematic:
- if the interval goes to +Inf (or -Inf): I do not know whether including +Inf (or -Inf) would have any effect.
Leonard
On 9/18/2021 1:14 AM, Andrew Simmons wrote:
While it is not explicitly mentioned anywhere in the
documentation for .bincode, I suspect 'include.lowest = FALSE' is
the default to keep the definitions of the bins consistent. For
example:
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)
cbind(
    .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
    .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
)
By having 'include.lowest = TRUE' with different ends, you can
get inconsistent behaviour. While this probably wouldn't be an
issue with 'real' data, it seems like something you'd want
to avoid by default. The definitions of the bins are
[0, 4)
[4, 8)
[8, 12)
[12, 16]
and
[0, 4)
[4, 8)
[8, 12)
[12, 16)
[16, 20]
so you can see where the inconsistent behaviour comes from. You
might be able to get R-core to add an argument 'warn', but probably
not to change the default of 'include.lowest'. I hope this helps.
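To pinpoint which value changes bins in the example above, one can compare the two code vectors wherever the shorter break vector covers the value. This is a small sketch using the same `x`, `breaks1` and `breaks2`. The names `b1`, `b2` and `changed` are illustrative, not from the original thread.

```r
# Left-closed bins (right = FALSE): include.lowest = TRUE closes only the
# *last* bin on the right, so its meaning depends on where the breaks end.
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

b1 <- .bincode(x, breaks1, right = FALSE, include.lowest = TRUE)
b2 <- .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)

# Values covered by both break vectors but assigned to different bins
# (NA entries of b1 are excluded by the !is.na() guard)
changed <- x[!is.na(b1) & b1 != b2]
changed  # 16: bin 4 ([12,16]) under breaks1, bin 5 ([16,20]) under breaks2
```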
On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.mada at syonic.eu> wrote:
Thank you, Andrew.
Is there any reason not to make include.lowest = TRUE the
default?
Regarding the NA:
the user still has to suspect that some values were not
included, and then run that test.
Leonard
On 9/18/2021 12:53 AM, Andrew Simmons wrote:
Regarding your first point, argument 'include.lowest'
already handles this specific case; see ?.bincode.
As for your second point, it could be helpful, but since both
'cut.default' and '.bincode' return NA if a value isn't
within a bin, you could write something like this yourself.
It might be worth pitching to R-bugs as a wishlist item.
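Building such a check yourself could look roughly like the following minimal sketch. It relies only on the fact that `cut()` returns NA for values outside every bin. `cut_warn()` is a hypothetical helper name, not part of base R, and its `warn` argument mirrors Leonard's proposal rather than any existing API.

```r
# Hypothetical helper (not in base R): cut() plus a warning when
# non-missing input values fall outside every interval.
cut_warn <- function(x, breaks, ..., warn = TRUE) {
  f <- cut(x, breaks, ...)
  # NA in the result but not in the input means "not in any interval"
  dropped <- is.na(f) & !is.na(x)
  if (warn && any(dropped)) {
    warning(sprintf("%d value(s) not included in any interval", sum(dropped)))
  }
  f
}

x <- 0:20
# Bins [0,4) ... [12,16): values 16..20 fall outside, so this warns
f_short <- cut_warn(x, seq.int(0, 16, 4), right = FALSE)
# Last bin closed as [16,20]: every value is covered, no warning
f_full <- cut_warn(x, seq.int(0, 20, 4), right = FALSE, include.lowest = TRUE)
```

Because `warn` comes after `...`, it must be spelled out in full, so it cannot be confused with a partially matched argument of `cut()`.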
On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
<r-help at r-project.org> wrote:
Hello list members,
the following improvements would be useful for the function
cut (and .bincode):
1.) Argument: include extremes
extremes = TRUE
if(right == FALSE) {
    # also include the right end of the last interval;
} else {
    # also include the left end of the first interval;
}
2.) Argument: warn = TRUE
Warn if any values are not included in the intervals.
Motivation:
- reduce the risk of errors when using the function cut().
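For reference, the behaviour requested in point 1.) matches what base R's `include.lowest` already does, as Andrew notes in his reply. With `right = FALSE` it closes the last interval on the right, and with `right = TRUE` it closes the first interval on the left. This is a short sketch. `cut_extremes()` is a hypothetical wrapper that only renames the argument and flips its default, not an existing function.

```r
# Base R: include.lowest closes the outermost end of the break range.
breaks <- c(0, 8, 16)

cut(c(0, 16), breaks, right = FALSE)                         # 16 is NA
cut(c(0, 16), breaks, right = FALSE, include.lowest = TRUE)  # 16 in [8,16]
cut(c(0, 16), breaks, right = TRUE)                          # 0 is NA
cut(c(0, 16), breaks, right = TRUE, include.lowest = TRUE)   # 0 in [0,8]

# Hypothetical wrapper (not in base R): the proposed 'extremes' argument
# is include.lowest under another name, defaulting to TRUE.
cut_extremes <- function(x, breaks, right = TRUE, extremes = TRUE, ...) {
  cut(x, breaks, right = right, include.lowest = extremes, ...)
}

cut_extremes(c(0, 16), breaks, right = FALSE)  # [0,8) [8,16]
```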
Sincerely,
Leonard