Skip to content

A potential POSIXlt->Date bug introduced in r-devel

8 messages · Davis Vaughan, Berwin A Turlach, Brian Ripley +1 more

#
Hi all,

I think I have discovered a bug in the conversion from POSIXlt to Date that
has been introduced in r-devel.

It affects lubridate, but surprisingly didn't cause test failures there.
Instead it caused test failures in users of lubridate, like slider, arrow,
and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and
at least in slider I have been asked by CRAN to correct this issue before
2022-10-16.

In r-devel we get the following:

```
data <- list(
  sec = 0,
  min = 0L,
  hour = 0L,
  mday = 31L,
  mon = c(0L, NA, 2L),
  year = 113L,
  wday = 4L,
  yday = 30L,
  isdst = 0L
)

x <- .POSIXlt(xx = data, tz = "UTC")
x
#> [1] "2013-01-31 UTC" NA               "2013-03-31 UTC"

# Looks right
as.POSIXct(x)
#> [1] "2013-01-31 UTC" NA               "2013-03-31 UTC"

# Weird, where is the `NA`?
as.Date(x)
#> [1] "2013-01-31" "1970-01-01" "2013-03-31"
```

The POSIXlt object is length 3, but is only partially filled out. The other
elements are all recycled to length 3 upon conversion to POSIXct or Date.
But when converting to Date, we lose the `NA` value. I think the
`as.Date()` conversion seems inconsistent with the `as.POSIXct()`
conversion.

It looks like this comes up because the conversion to Date now defaults to
using `sec` if any of the date-like fields are `NA_INTEGER`, but this means
the `NA` in the `mon` field is ignored.
https://github.com/wch/r-source/blob/e10a971dee6a0ab851279c183cc21954d66b3be4/src/main/datetime.c#L1293-L1295

Thanks all,
Davis Vaughan
#
> Hi all,

    > I think I have discovered a bug in the conversion from POSIXlt to Date that
    > has been introduced in r-devel.

    > It affects lubridate, but surprisingly didn't cause test failures there.
    > Instead it caused test failures in users of lubridate, like slider, arrow,
    > and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and
    > at least in slider I have been asked by CRAN to correct this issue before
    > 2022-10-16.

    > In r-devel we get the following:

    > ```
    > data <- list(
    > sec = 0,
    > min = 0L,
    > hour = 0L,
    > mday = 31L,
    > mon = c(0L, NA, 2L),
    > year = 113L,
    > wday = 4L,
    > yday = 30L,
    > isdst = 0L
    > )

    > x <- .POSIXlt(xx = data, tz = "UTC")
    > x
    > #> [1] "2013-01-31 UTC" NA               "2013-03-31 UTC"

    > # Looks right
    > as.POSIXct(x)
    > #> [1] "2013-01-31 UTC" NA               "2013-03-31 UTC"



    > # Weird, where is the `NA`?
    > as.Date(x)
    > #> [1] "2013-01-31" "1970-01-01" "2013-03-31"
    > ```

I agree that the above is wrong, i.e., a bug in current  R-devel.

    > The POSIXlt object is length 3, but is only partially filled out. 

    > The other elements are all recycled to length 3 upon
    > conversion to POSIXct or Date. 

    > But when converting to Date, we lose the `NA` value. I think the
    > `as.Date()` conversion seems inconsistent with the `as.POSIXct()`
    > conversion.

Yes.  There was another very much relatd conversation here on R-devel,
initiated by Suharto Anggono just a few days ago.

This subject, i.e., "partially filled out" POSIXlt objects, was
one of the topics, too.

See my reply there, notably at the end:

  https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html
    
I do mention that "recycling" of partially filled POSIXlt
objects has only partially been implemented in R more generally
and was actually asking for comments and further discussion.


    > It looks like this comes up because the conversion to Date now defaults to
    > using `sec` if any of the date-like fields are `NA_INTEGER`,

yes, because only that allows to also deal with +/- Inf  etc,
as was recently added as new feature, see the NEWS of R 4.2.0

    ? Not strictly fixing a bug, format()ing and print()ing of
      non-finite Date and POSIXt values NaN and +/-Inf no longer show
      as NA but the respective string, e.g., Inf, for consistency with
      numeric vector's behaviour, fulfilling the wish of PR#18308.

i.e., see also R's bugzilla
      https://bugs.r-project.org/show_bug.cgi?id=18308

which actually *also* mentioned an NA problem in Date/Time objects.



    >  but this means  the `NA` in the `mon` field is ignored.

which I agree is bogous and we'll fix.

Still, I did not get any feedback on asking about documentation
etc on  POSIXlt objects ... and I *had* mentioned I agreed that
the current partial implementation of  "partially filled" i.e. recycling of
POSIXlt components should probably be made part of the
"definition" of POSIXlt.

Have I overlooked an existing definition / contract about these?

Martin

--
Martin M?chler
ETH Zurich   and  R Core team
#
G'day all,

On Thu, 6 Oct 2022 10:15:29 +0200
Martin Maechler <maechler at stat.math.ethz.ch> wrote:

            
I have no intention of hijacking this thread, but I wonder whether this
is a good opportunity to mention that the 32 bit build of R-devel falls
over on my machine since 25 September.  It fails one of the regression
tests in reg-tests-1d.R.  The final lines of reg-tests-1d.Rout.fail
are:
+           identical(textConnectionValue(out)[2L], "LaTeX"));
close(out)
Error: as.POSIXlt(.Date(2^31 + 10))$year == 5879680L is not TRUE
Execution halted


I should have reported this earlier, but somehow did not find the time
to do so.  So I thought I mention it here. :)

Cheers,

	Berwin
#
On 06/10/2022 09:41, Berwin A Turlach wrote:
Yes, known for a few days on the R-core list. I am in the middle of an 
OS upgrade on that machine and won't have time to do more than report 
until that (and all the re-building and re-checking) is complete.

  
    
1 day later
#
>> Hi all,

    >> I think I have discovered a bug in the conversion from POSIXlt to Date that
    >> has been introduced in r-devel.

    >> It affects lubridate, but surprisingly didn't cause test failures there.
    >> Instead it caused test failures in users of lubridate, like slider, arrow,
    >> and admiral (see https://github.com/tidyverse/lubridate/issues/1069), and
    >> at least in slider I have been asked by CRAN to correct this issue before
    >> 2022-10-16.

    >> In r-devel we get the following:

    >> ```
    >> data <- list(
    >> sec = 0,
    >> min = 0L,
    >> hour = 0L,
    >> mday = 31L,
    >> mon = c(0L, NA, 2L),
    >> year = 113L,
    >> wday = 4L,
    >> yday = 30L,
    >> isdst = 0L
    >> )

    >> x <- .POSIXlt(xx = data, tz = "UTC")
    >> x
    >> #> [1] "2013-01-31 UTC" NA               "2013-03-31 UTC"

    >> # Looks right
    >> as.POSIXct(x)
    >> #> [1] "2013-01-31 UTC" NA               "2013-03-31 UTC"



    >> # Weird, where is the `NA`?
    >> as.Date(x)
    >> #> [1] "2013-01-31" "1970-01-01" "2013-03-31"
    >> ```

    > I agree that the above is wrong, i.e., a bug in current  R-devel.

    >> The POSIXlt object is length 3, but is only partially filled out. 

    >> The other elements are all recycled to length 3 upon
    >> conversion to POSIXct or Date. 

    >> But when converting to Date, we lose the `NA` value. I think the
    >> `as.Date()` conversion seems inconsistent with the `as.POSIXct()`
    >> conversion.

    > Yes.  There was another very much relatd conversation here on R-devel,
    > initiated by Suharto Anggono just a few days ago.

    > This subject, i.e., "partially filled out" POSIXlt objects, was
    > one of the topics, too.

    > See my reply there, notably at the end:

    > https://stat.ethz.ch/pipermail/r-devel/2022-October/082072.html
    
    > I do mention that "recycling" of partially filled POSIXlt
    > objects has only partially been implemented in R more generally
    > and was actually asking for comments and further discussion.


    >> It looks like this comes up because the conversion to Date now defaults to
    >> using `sec` if any of the date-like fields are `NA_INTEGER`,

    > yes, because only that allows to also deal with +/- Inf  etc,
    > as was recently added as new feature, see the NEWS of R 4.2.0

    > ? Not strictly fixing a bug, format()ing and print()ing of
    > non-finite Date and POSIXt values NaN and +/-Inf no longer show
    > as NA but the respective string, e.g., Inf, for consistency with
    > numeric vector's behaviour, fulfilling the wish of PR#18308.

    > i.e., see also R's bugzilla
    > https://bugs.r-project.org/show_bug.cgi?id=18308

    > which actually *also* mentioned an NA problem in Date/Time objects.


    >> but this means  the `NA` in the `mon` field is ignored.

    > which I agree is bogous and we'll fix.

    > Still, I did not get any feedback on asking about documentation
    > etc on  POSIXlt objects ... and I *had* mentioned I agreed that
    > the current partial implementation of  "partially filled" i.e. recycling of
    > POSIXlt components should probably be made part of the
    > "definition" of POSIXlt.

    > Have I overlooked an existing definition / contract about these?

    > Martin

I'm still waiting for comments.

Note that  "partially filled" POSIXlt do not work correctly in
any version of R.  I mentioned that even length(.) may easily
fail; but there is much more.

While I can relatively easily fix Davis' case above,
the following example behaves wrongly in current and previous
released versions of R and in R-devel:

dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi,
                                        # "out of range", non-finite, fractions
                     min = 45L, hour = c(21L, 3L, NA, 4L),
                     mday = 6L, mon  = c(11L, NA, 3L),
                     year = 116L, wday = 2L, yday = 340L, isdst = 1L))


Of course that's constructed to be particularly unpleasant.
You can try some of the following "checks" in your version(s) of
R to see that some of the things are misbehaving with it in all (*)
versions of R.
--
*) so I claim boldly


dct <- as.POSIXct(dlt)
(n <- length(dct))
dD  <- as.Date(dlt)
dDc <- as.Date(dct)
dltN <- as.POSIXlt(dct) # "normalized POSIXlt" (with *lost* accuracy):
data.frame(unclass(dltN))
.POSIXltNormalize <- function(x) {
    stopifnot(is.numeric(s <- x$sec))
    x <- as.POSIXlt(as.POSIXct(x)) # and restore the precise seconds :
    ifin <- is.finite(s) & is.finite(x$sec) # (maybe recycling already)
    x$sec[ifin] <- s[ifin] %% 60
    x
}
dlt2 <- .POSIXltNormalize(dlt) # normalized POSIXlt - with accuracy kept
all.equal(dlt2$sec, dltN$sec, tolerance = 0) # .. small (2e-9) difference
stopifnot(all.equal(dlt2, dltN),
          identical(as.POSIXct(dlt2), as.POSIXct(dltN)))
## First show (in a way it also works for older R), then check :
oldR <- getRversion() < "4.2.2"
print(width = 101,
data.frame(dlt, dltN, asCT = dct, asDateCT = dDc,
           asDate = if(oldR) rep_len(dD,  n) else dD ,
           na = is.na(dlt),
           fin = if(oldR) rep_len(NA, n) else is.finite(dlt))
)




--
Martin M?chler
ETH Zurich   and  R Core team
#
Martin,

FWIW, I scoured the docs using GitHub's new code search preview but can't
seem to find any reference to the fact that POSIXlt fields are internally
recycled (even though lubridate seems to have been relying on this for
quite some time).

-Davis

On Fri, Oct 7, 2022 at 8:52 AM Martin Maechler <maechler at stat.math.ethz.ch>
wrote:

  
  
3 days later
#
> Martin,
    > FWIW, I scoured the docs using GitHub's new code search preview but can't
    > seem to find any reference to the fact that POSIXlt fields are internally
    > recycled (even though lubridate seems to have been relying on this for
    > quite some time).

Thank you, Davis, for searching  and confirming that this has
never been documented.  And so,  in that sense there's no bug,
because only non-documented behaviour has changed.

Still, with svn rev 83062  I have finally committed a version of
the C code underlying  as.Date.POSIXlt()  which (I think/hope)
does deal correctly with such partially filled POSIXlt objects
(it does with your relatively benign example and with the more
 nasty one I had mentioned).

BTW,  at least length() has been working correctly for
these, for a bit less than a day now, too,  since svn rev 83056  .

These are still only some of several steps to correctly support
such objects.
Notably the `[` and  `[<-` methods will need a substantial
update, too (as you have noticed for the latter in your other
R-devel post on Oct 6, Subject "Bug with `[<-.POSIXlt` on specific OSes").

Note that I'd be quite grateful for well documented platform
dependent behavior.  In addition to a typical sessionInfo() this
would also need to report the current setting of a  "TZ" environment variable,
where I have noticed [on Linux, Fedora 36] that there may be
different behavior between TZ entirely undefined and a TZ="" setting
(which I intuitively assumed would work equivalently).


Best,
Martin


    > -Davis

    > On Fri, Oct 7, 2022 at 8:52 AM Martin Maechler <maechler at stat.math.ethz.ch>
> wrote:
>> >>>>> Martin Maechler
    >> >>>>>     on Thu, 6 Oct 2022 10:15:29 +0200 writes:
    >> 
    >> >>>>> Davis Vaughan
    >> >>>>>     on Wed, 5 Oct 2022 17:04:11 -0400 writes:
    >> 
    >> >> Hi all,
    >> 
    >> >> I think I have discovered a bug in the conversion from POSIXlt to
    >> Date that has been introduced in r-devel.
    >> 
    >> >> It affects lubridate, but surprisingly didn't cause test failures
    >> there.

   [..................]
1 day later
#
Confirmed on Fedora 36 which has a 32-bit time_t for an i686 compile.  I 
was a bit surprised that has not been changed, but gather Linux distros 
are preferring to drop ix86 than fix it.

There is a simple workaround, to configure R with 
--with-internal-tzcode, which always uses a 64-bit time_t.  Given that 
2038 is not that far away, avoiding 32-bit time_t is generally a very 
good idea (not just for people working with dates in 5881580!).

That test should not really be run on platforms with 32-bit time_t, but 
that is not currently known at R level.
On 06/10/2022 13:38, Prof Brian Ripley wrote: