Hi all, I think I have discovered a bug in the conversion from POSIXlt to Date that has been introduced in r-devel. It affects lubridate, but surprisingly didn't cause test failures there. Instead it caused test failures in users of lubridate, like slider, arrow, and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and at least in slider I have been asked by CRAN to correct this issue before 2022-10-16. In r-devel we get the following: ``` data <- list( sec = 0, min = 0L, hour = 0L, mday = 31L, mon = c(0L, NA, 2L), year = 113L, wday = 4L, yday = 30L, isdst = 0L ) x <- .POSIXlt(xx = data, tz = "UTC") x #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" # Looks right as.POSIXct(x) #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC" # Weird, where is the `NA`? as.Date(x) #> [1] "2013-01-31" "1970-01-01" "2013-03-31" ``` The POSIXlt object is length 3, but is only partially filled out. The other elements are all recycled to length 3 upon conversion to POSIXct or Date. But when converting to Date, we lose the `NA` value. I think the `as.Date()` conversion seems inconsistent with the `as.POSIXct()` conversion. It looks like this comes up because the conversion to Date now defaults to using `sec` if any of the date-like fields are `NA_INTEGER`, but this means the `NA` in the `mon` field is ignored. https://github.com/wch/r-source/blob/e10a971dee6a0ab851279c183cc21954d66b3be4/src/main/datetime.c#L1293-L1295 Thanks all, Davis Vaughan
A potential POSIXlt->Date bug introduced in r-devel
8 messages · Davis Vaughan, Berwin A Turlach, Brian Ripley +1 more
Davis Vaughan
on Wed, 5 Oct 2022 17:04:11 -0400 writes:
> Hi all,
> I think I have discovered a bug in the conversion from POSIXlt to Date that
> has been introduced in r-devel.
> It affects lubridate, but surprisingly didn't cause test failures there.
> Instead it caused test failures in users of lubridate, like slider, arrow,
> and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and
> at least in slider I have been asked by CRAN to correct this issue before
> 2022-10-16.
> In r-devel we get the following:
> ```
> data <- list(
> sec = 0,
> min = 0L,
> hour = 0L,
> mday = 31L,
> mon = c(0L, NA, 2L),
> year = 113L,
> wday = 4L,
> yday = 30L,
> isdst = 0L
> )
> x <- .POSIXlt(xx = data, tz = "UTC")
> x
> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC"
> # Looks right
> as.POSIXct(x)
> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC"
> # Weird, where is the `NA`?
> as.Date(x)
> #> [1] "2013-01-31" "1970-01-01" "2013-03-31"
> ```
I agree that the above is wrong, i.e., a bug in current R-devel.
> The POSIXlt object is length 3, but is only partially filled out.
> The other elements are all recycled to length 3 upon
> conversion to POSIXct or Date.
> But when converting to Date, we lose the `NA` value. I think the
> `as.Date()` conversion seems inconsistent with the `as.POSIXct()`
> conversion.
Yes. There was another very much relatd conversation here on R-devel,
initiated by Suharto Anggono just a few days ago.
This subject, i.e., "partially filled out" POSIXlt objects, was
one of the topics, too.
See my reply there, notably at the end:
https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html
I do mention that "recycling" of partially filled POSIXlt
objects has only partially been implemented in R more generally
and was actually asking for comments and further discussion.
> It looks like this comes up because the conversion to Date now defaults to
> using `sec` if any of the date-like fields are `NA_INTEGER`,
yes, because only that allows to also deal with +/- Inf etc,
as was recently added as new feature, see the NEWS of R 4.2.0
? Not strictly fixing a bug, format()ing and print()ing of
non-finite Date and POSIXt values NaN and +/-Inf no longer show
as NA but the respective string, e.g., Inf, for consistency with
numeric vector's behaviour, fulfilling the wish of PR#18308.
i.e., see also R's bugzilla
https://bugs.r-project.org/show_bug.cgi?id=18308
which actually *also* mentioned an NA problem in Date/Time objects.
> but this means the `NA` in the `mon` field is ignored.
which I agree is bogous and we'll fix.
Still, I did not get any feedback on asking about documentation
etc on POSIXlt objects ... and I *had* mentioned I agreed that
the current partial implementation of "partially filled" i.e. recycling of
POSIXlt components should probably be made part of the
"definition" of POSIXlt.
Have I overlooked an existing definition / contract about these?
Martin
--
Martin M?chler
ETH Zurich and R Core team
G'day all, On Thu, 6 Oct 2022 10:15:29 +0200
Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Davis Vaughan
on Wed, 5 Oct 2022 17:04:11 -0400 writes:
> # Weird, where is the `NA`?
> as.Date(x)
> #> [1] "2013-01-31" "1970-01-01" "2013-03-31"
> ```
I agree that the above is wrong, i.e., a bug in current R-devel.
I have no intention of hijacking this thread, but I wonder whether this is a good opportunity to mention that the 32 bit build of R-devel falls over on my machine since 25 September. It fails one of the regression tests in reg-tests-1d.R. The final lines of reg-tests-1d.Rout.fail are:
tools::Rd2txt(rd, out <- textConnection(NULL, "w"), fragment = TRUE) stopifnot(any(as.character(rd) != "\n"),
+ identical(textConnectionValue(out)[2L], "LaTeX")); close(out)
## empty output in R <= 4.2.x ## as.POSIXlt(<very large Date>) gave integer overflow stopifnot(as.POSIXlt(.Date(2^31 + 10))$year == 5879680L)
Error: as.POSIXlt(.Date(2^31 + 10))$year == 5879680L is not TRUE Execution halted I should have reported this earlier, but somehow did not find the time to do so. So I thought I mention it here. :) Cheers, Berwin
On 06/10/2022 09:41, Berwin A Turlach wrote:
G'day all, On Thu, 6 Oct 2022 10:15:29 +0200 Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Davis Vaughan
on Wed, 5 Oct 2022 17:04:11 -0400 writes:
> # Weird, where is the `NA`?
> as.Date(x)
> #> [1] "2013-01-31" "1970-01-01" "2013-03-31"
> ```
I agree that the above is wrong, i.e., a bug in current R-devel.
I have no intention of hijacking this thread, but I wonder whether this is a good opportunity to mention that the 32 bit build of R-devel falls over on my machine since 25 September. It fails one of the regression tests in reg-tests-1d.R. The final lines of reg-tests-1d.Rout.fail are:
tools::Rd2txt(rd, out <- textConnection(NULL, "w"), fragment = TRUE) stopifnot(any(as.character(rd) != "\n"),
+ identical(textConnectionValue(out)[2L], "LaTeX")); close(out)
## empty output in R <= 4.2.x
Yes, known for a few days on the R-core list. I am in the middle of an OS upgrade on that machine and won't have time to do more than report until that (and all the re-building and re-checking) is complete.
## as.POSIXlt(<very large Date>) gave integer overflow stopifnot(as.POSIXlt(.Date(2^31 + 10))$year == 5879680L)
Error: as.POSIXlt(.Date(2^31 + 10))$year == 5879680L is not TRUE Execution halted I should have reported this earlier, but somehow did not find the time to do so. So I thought I mention it here. :) Cheers, Berwin
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk Emeritus Professor of Applied Statistics, University of Oxford
1 day later
Martin Maechler
on Thu, 6 Oct 2022 10:15:29 +0200 writes:
Davis Vaughan
on Wed, 5 Oct 2022 17:04:11 -0400 writes:
>> Hi all,
>> I think I have discovered a bug in the conversion from POSIXlt to Date that
>> has been introduced in r-devel.
>> It affects lubridate, but surprisingly didn't cause test failures there.
>> Instead it caused test failures in users of lubridate, like slider, arrow,
>> and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and
>> at least in slider I have been asked by CRAN to correct this issue before
>> 2022-10-16.
>> In r-devel we get the following:
>> ```
>> data <- list(
>> sec = 0,
>> min = 0L,
>> hour = 0L,
>> mday = 31L,
>> mon = c(0L, NA, 2L),
>> year = 113L,
>> wday = 4L,
>> yday = 30L,
>> isdst = 0L
>> )
>> x <- .POSIXlt(xx = data, tz = "UTC")
>> x
>> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC"
>> # Looks right
>> as.POSIXct(x)
>> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC"
>> # Weird, where is the `NA`?
>> as.Date(x)
>> #> [1] "2013-01-31" "1970-01-01" "2013-03-31"
>> ```
> I agree that the above is wrong, i.e., a bug in current R-devel.
>> The POSIXlt object is length 3, but is only partially filled out.
>> The other elements are all recycled to length 3 upon
>> conversion to POSIXct or Date.
>> But when converting to Date, we lose the `NA` value. I think the
>> `as.Date()` conversion seems inconsistent with the `as.POSIXct()`
>> conversion.
> Yes. There was another very much relatd conversation here on R-devel,
> initiated by Suharto Anggono just a few days ago.
> This subject, i.e., "partially filled out" POSIXlt objects, was
> one of the topics, too.
> See my reply there, notably at the end:
> https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html
> I do mention that "recycling" of partially filled POSIXlt
> objects has only partially been implemented in R more generally
> and was actually asking for comments and further discussion.
>> It looks like this comes up because the conversion to Date now defaults to
>> using `sec` if any of the date-like fields are `NA_INTEGER`,
> yes, because only that allows to also deal with +/- Inf etc,
> as was recently added as new feature, see the NEWS of R 4.2.0
> ? Not strictly fixing a bug, format()ing and print()ing of
> non-finite Date and POSIXt values NaN and +/-Inf no longer show
> as NA but the respective string, e.g., Inf, for consistency with
> numeric vector's behaviour, fulfilling the wish of PR#18308.
> i.e., see also R's bugzilla
> https://bugs.r-project.org/show_bug.cgi?id=18308
> which actually *also* mentioned an NA problem in Date/Time objects.
>> but this means the `NA` in the `mon` field is ignored.
> which I agree is bogous and we'll fix.
> Still, I did not get any feedback on asking about documentation
> etc on POSIXlt objects ... and I *had* mentioned I agreed that
> the current partial implementation of "partially filled" i.e. recycling of
> POSIXlt components should probably be made part of the
> "definition" of POSIXlt.
> Have I overlooked an existing definition / contract about these?
> Martin
I'm still waiting for comments.
Note that "partially filled" POSIXlt do not work correctly in
any version of R. I mentioned that even length(.) may easily
fail; but there is much more.
While I can relatively easily fix Davis' case above,
the following example behaves wrongly in current and previous
released versions of R and in R-devel:
dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi,
# "out of range", non-finite, fractions
min = 45L, hour = c(21L, 3L, NA, 4L),
mday = 6L, mon = c(11L, NA, 3L),
year = 116L, wday = 2L, yday = 340L, isdst = 1L))
Of course that's constructed to be particularly unpleasant.
You can try some of the following "checks" in your version(s) of
R to see that some of the things are misbehaving with it in all (*)
versions of R.
--
*) so I claim boldly
dct <- as.POSIXct(dlt)
(n <- length(dct))
dD <- as.Date(dlt)
dDc <- as.Date(dct)
dltN <- as.POSIXlt(dct) # "normalized POSIXlt" (with *lost* accuracy):
data.frame(unclass(dltN))
.POSIXltNormalize <- function(x) {
stopifnot(is.numeric(s <- x$sec))
x <- as.POSIXlt(as.POSIXct(x)) # and restore the precise seconds :
ifin <- is.finite(s) & is.finite(x$sec) # (maybe recycling already)
x$sec[ifin] <- s[ifin] %% 60
x
}
dlt2 <- .POSIXltNormalize(dlt) # normalized POSIXlt - with accuracy kept
all.equal(dlt2$sec, dltN$sec, tolerance = 0) # .. small (2e-9) difference
stopifnot(all.equal(dlt2, dltN),
identical(as.POSIXct(dlt2), as.POSIXct(dltN)))
## First show (in a way it also works for older R), then check :
oldR <- getRversion() < "4.2.2"
print(width = 101,
data.frame(dlt, dltN, asCT = dct, asDateCT = dDc,
asDate = if(oldR) rep_len(dD, n) else dD ,
na = is.na(dlt),
fin = if(oldR) rep_len(NA, n) else is.finite(dlt))
)
--
Martin M?chler
ETH Zurich and R Core team
Martin, FWIW, I scoured the docs using GitHub's new code search preview but can't seem to find any reference to the fact that POSIXlt fields are internally recycled (even though lubridate seems to have been relying on this for quite some time). -Davis On Fri, Oct 7, 2022 at 8:52 AM Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Martin Maechler
on Thu, 6 Oct 2022 10:15:29 +0200 writes:
Davis Vaughan
on Wed, 5 Oct 2022 17:04:11 -0400 writes:
>> Hi all,
>> I think I have discovered a bug in the conversion from POSIXlt to
Date that
>> has been introduced in r-devel.
>> It affects lubridate, but surprisingly didn't cause test failures
there.
>> Instead it caused test failures in users of lubridate, like slider,
arrow,
>> and admiral (see https://github.com/tidyverse/lubridate/issues/1069),
and
>> at least in slider I have been asked by CRAN to correct this issue
before
>> 2022-10-16.
>> In r-devel we get the following:
>> ```
>> data <- list(
>> sec = 0,
>> min = 0L,
>> hour = 0L,
>> mday = 31L,
>> mon = c(0L, NA, 2L),
>> year = 113L,
>> wday = 4L,
>> yday = 30L,
>> isdst = 0L
>> )
>> x <- .POSIXlt(xx = data, tz = "UTC")
>> x
>> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC"
>> # Looks right
>> as.POSIXct(x)
>> #> [1] "2013-01-31 UTC" NA "2013-03-31 UTC"
>> # Weird, where is the `NA`?
>> as.Date(x)
>> #> [1] "2013-01-31" "1970-01-01" "2013-03-31"
>> ```
> I agree that the above is wrong, i.e., a bug in current R-devel.
>> The POSIXlt object is length 3, but is only partially filled out.
>> The other elements are all recycled to length 3 upon
>> conversion to POSIXct or Date.
>> But when converting to Date, we lose the `NA` value. I think the
>> `as.Date()` conversion seems inconsistent with the `as.POSIXct()`
>> conversion.
> Yes. There was another very much relatd conversation here on
R-devel,
> initiated by Suharto Anggono just a few days ago.
> This subject, i.e., "partially filled out" POSIXlt objects, was
> one of the topics, too.
> See my reply there, notably at the end:
> I do mention that "recycling" of partially filled POSIXlt
> objects has only partially been implemented in R more generally
> and was actually asking for comments and further discussion.
>> It looks like this comes up because the conversion to Date now
defaults to
>> using `sec` if any of the date-like fields are `NA_INTEGER`,
> yes, because only that allows to also deal with +/- Inf etc,
> as was recently added as new feature, see the NEWS of R 4.2.0
> ? Not strictly fixing a bug, format()ing and print()ing of
> non-finite Date and POSIXt values NaN and +/-Inf no longer show
> as NA but the respective string, e.g., Inf, for consistency with
> numeric vector's behaviour, fulfilling the wish of PR#18308.
> i.e., see also R's bugzilla
> https://bugs.r-project.org/show_bug.cgi?id=18308
> which actually *also* mentioned an NA problem in Date/Time objects.
>> but this means the `NA` in the `mon` field is ignored.
> which I agree is bogous and we'll fix.
> Still, I did not get any feedback on asking about documentation
> etc on POSIXlt objects ... and I *had* mentioned I agreed that
> the current partial implementation of "partially filled" i.e.
recycling of
> POSIXlt components should probably be made part of the
> "definition" of POSIXlt.
> Have I overlooked an existing definition / contract about these?
> Martin
I'm still waiting for comments.
Note that "partially filled" POSIXlt do not work correctly in
any version of R. I mentioned that even length(.) may easily
fail; but there is much more.
While I can relatively easily fix Davis' case above,
the following example behaves wrongly in current and previous
released versions of R and in R-devel:
dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi,
# "out of range", non-finite,
fractions
min = 45L, hour = c(21L, 3L, NA, 4L),
mday = 6L, mon = c(11L, NA, 3L),
year = 116L, wday = 2L, yday = 340L, isdst = 1L))
Of course that's constructed to be particularly unpleasant.
You can try some of the following "checks" in your version(s) of
R to see that some of the things are misbehaving with it in all (*)
versions of R.
--
*) so I claim boldly
dct <- as.POSIXct(dlt)
(n <- length(dct))
dD <- as.Date(dlt)
dDc <- as.Date(dct)
dltN <- as.POSIXlt(dct) # "normalized POSIXlt" (with *lost* accuracy):
data.frame(unclass(dltN))
.POSIXltNormalize <- function(x) {
stopifnot(is.numeric(s <- x$sec))
x <- as.POSIXlt(as.POSIXct(x)) # and restore the precise seconds :
ifin <- is.finite(s) & is.finite(x$sec) # (maybe recycling already)
x$sec[ifin] <- s[ifin] %% 60
x
}
dlt2 <- .POSIXltNormalize(dlt) # normalized POSIXlt - with accuracy kept
all.equal(dlt2$sec, dltN$sec, tolerance = 0) # .. small (2e-9) difference
stopifnot(all.equal(dlt2, dltN),
identical(as.POSIXct(dlt2), as.POSIXct(dltN)))
## First show (in a way it also works for older R), then check :
oldR <- getRversion() < "4.2.2"
print(width = 101,
data.frame(dlt, dltN, asCT = dct, asDateCT = dDc,
asDate = if(oldR) rep_len(dD, n) else dD ,
na = is.na(dlt),
fin = if(oldR) rep_len(NA, n) else is.finite(dlt))
)
--
Martin M?chler
ETH Zurich and R Core team
3 days later
Davis Vaughan
on Fri, 7 Oct 2022 11:00:23 -0400 writes:
> Martin,
> FWIW, I scoured the docs using GitHub's new code search preview but can't
> seem to find any reference to the fact that POSIXlt fields are internally
> recycled (even though lubridate seems to have been relying on this for
> quite some time).
Thank you, Davis, for searching and confirming that this has
never been documented. And so, in that sense there's no bug,
because only non-documented behaviour has changed.
Still, with svn rev 83062 I have finally committed a version of
the C code underlying as.Date.POSIXlt() which (I think/hope)
does deal correctly with such partially filled POSIXlt objects
(it does with your relatively benign example and with the more
nasty one I had mentioned).
BTW, at least length() has been working correctly for
these, for a bit less than a day now, too, since svn rev 83056 .
These are still only some of several steps to correctly support
such objects.
Notably the `[` and `[<-` methods will need a substantial
update, too (as you have noticed for the latter in your other
R-devel post on Oct 6, Subject "Bug with `[<-.POSIXlt` on specific OSes").
Note that I'd be quite grateful for well documented platform
dependent behavior. In addition to a typical sessionInfo() this
would also need to report the current setting of a "TZ" environment variable,
where I have noticed [on Linux, Fedora 36] that there may be
different behavior between TZ entirely undefined and a TZ="" setting
(which I intuitively assumed would work equivalently).
Best,
Martin
> -Davis
> On Fri, Oct 7, 2022 at 8:52 AM Martin Maechler <maechler at stat.math.ethz.ch>
> wrote:
>> >>>>> Martin Maechler
>> >>>>> on Thu, 6 Oct 2022 10:15:29 +0200 writes:
>>
>> >>>>> Davis Vaughan
>> >>>>> on Wed, 5 Oct 2022 17:04:11 -0400 writes:
>>
>> >> Hi all,
>>
>> >> I think I have discovered a bug in the conversion from POSIXlt to
>> Date that has been introduced in r-devel.
>>
>> >> It affects lubridate, but surprisingly didn't cause test failures
>> there.
[..................]
1 day later
Confirmed on Fedora 36 which has a 32-bit time_t for an i686 compile. I was a bit surprised that has not been changed, but gather Linux distros are preferring to drop ix86 than fix it. There is a simple workaround, to configure R with --with-internal-tzcode, which always uses a 64-bit time_t. Given that 2038 is not that far away, avoiding 32-bit time_t is generally a very good idea (not just for people working with dates in 5881580!). That test should not really be run on platforms with 32-bit time_t, but that is not currently known at R level.
On 06/10/2022 13:38, Prof Brian Ripley wrote:
On 06/10/2022 09:41, Berwin A Turlach wrote:
G'day all, On Thu, 6 Oct 2022 10:15:29 +0200 Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Davis Vaughan ???? on Wed, 5 Oct 2022 17:04:11 -0400 writes:
???? > # Weird, where is the `NA`? ???? > as.Date(x) ???? > #> [1] "2013-01-31" "1970-01-01" "2013-03-31" ???? > ``` I agree that the above is wrong, i.e., a bug in current? R-devel.
I have no intention of hijacking this thread, but I wonder whether this is a good opportunity to mention that the 32 bit build of R-devel falls over on my machine since 25 September.? It fails one of the regression tests in reg-tests-1d.R.? The final lines of reg-tests-1d.Rout.fail are:
tools::Rd2txt(rd, out <- textConnection(NULL, "w"), fragment = TRUE) stopifnot(any(as.character(rd) != "\n"),
+?????????? identical(textConnectionValue(out)[2L], "LaTeX")); close(out)
## empty output in R <= 4.2.x
Yes, known for a few days on the R-core list. I am in the middle of an OS upgrade on that machine and won't have time to do more than report until that (and all the re-building and re-checking) is complete.
## as.POSIXlt(<very large Date>)? gave integer overflow stopifnot(as.POSIXlt(.Date(2^31 + 10))$year == 5879680L)
Error: as.POSIXlt(.Date(2^31 + 10))$year == 5879680L is not TRUE Execution halted I should have reported this earlier, but somehow did not find the time to do so.? So I thought I mention it here. :) Cheers, ????Berwin
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk Emeritus Professor of Applied Statistics, University of Oxford