I understand that `as.character.POSIXt()` had an overhaul in R 4.3 (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b), and I have come across a new behaviour and I wonder if it is unintended?

When calling `as.character.POSIXt()` on a vector that contains elements where the time component is midnight (00:00:00), it drops the time component of that element in the resulting character vector. Previously the time component was retained:

In R 4.2.3:

```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tc <- as.character(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
```

In R 4.3.1:

```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tc <- as.character(t))
#> [1] "1975-01-01"          "1975-01-01 15:27:00"
```

This has consequences when round-tripping from POSIXt -> character -> POSIXt, since `as.POSIXct.character()` drops the time component from the entire vector if any element does not have a time component:

In R 4.2.3:

```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tc <- as.character(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

as.POSIXct(tc)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```

In R 4.3.1:

```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tc <- as.character(t))
#> [1] "1975-01-01"          "1975-01-01 15:27:00"

as.POSIXct(tc)
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```

`format.POSIXt()` retains its old behaviour in R 4.3:

```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tf <- format(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

as.POSIXct(tf)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```

```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"

(t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"

(tf <- format(t))
#> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"

as.POSIXct(tf)
#> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
```

And finally, the behaviour of `as.POSIXct.character()` has not changed (it previously did, and still does, drop the time component from all elements when any element has no time):

```
R.version$version.string
#> [1] "R version 4.2.3 (2023-03-15)"

as.POSIXct(c("1975-01-01", "1975-01-01 15:27:00"))
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```

```
R.version$version.string
#> [1] "R version 4.3.1 (2023-06-16)"

as.POSIXct(c("1975-01-01", "1975-01-01 15:27:00"))
#> [1] "1975-01-01 PST" "1975-01-01 PST"
```

I don't know if this is a bug/regression in `as.character.POSIXt()`, or intended behaviour. If it is intended, I think it would benefit from some more comprehensive documentation.

Thanks very much,
Andy Teucher
R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time
5 messages · Andy Teucher, Martin Maechler, Tim Taylor
2 days later
Andy Teucher
on Fri, 11 Aug 2023 16:07:36 -0700 writes:
> I understand that `as.character.POSIXt()` had an overhaul in R 4.3 (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b), and I have come across a new behaviour and I wonder if it is unintended?

Well, as the NEWS entry says
(partly visible in the url above -- which only shows one part of
the several changes for R 4.3) :

  • as.character(<POSIXt>) now behaves more in line with the methods
    for atomic vectors such as numbers, and is no longer influenced
    by options(). Ditto for as.character(<Date>). The
    as.character() method gets arguments digits and OutDec with
    defaults _not_ depending on options(). Use of as.character(*,
    format = .) now warns.

It was "inconsistent" to have as.character(.) basically use format(.) for
these date-time objects.
as.character(x) for basic R types such as numbers, strings, logicals,...
fulfills the important property

    as.character(x)[j] === as.character(x[j])

whereas that is very much different for format() where indeed,
the formatting of x[1] may quite a bit depend on the other
x[j]'s values:
as.character(c(1, pi, pi/2^20))
[1] "1" "3.14159265358979" "2.99605622633914e-06"
format(c(1, pi, pi/2^20))
[1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
format(c(1, pi))
[1] "1.000000" "3.141593"
format(c(1, 10))
[1] " 1" "10"
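The property above can be checked directly. The following is only a small sketch (the names `elementwise_chr` and `elementwise_fmt` are made up for illustration): it converts each element on its own and compares the result with converting the whole vector at once.

```r
# Compare "convert the whole vector" with "convert each element separately".
x <- c(1, pi, pi / 2^20)

elementwise_chr <- vapply(seq_along(x), function(j) as.character(x[j]), character(1))
elementwise_fmt <- vapply(seq_along(x), function(j) format(x[j]), character(1))

# as.character(): each string depends only on its own element
identical(as.character(x), elementwise_chr)

# format(): a common format is chosen for the whole vector,
# so the two results differ for this x
identical(format(x), elementwise_fmt)
```

For this `x` the first comparison gives TRUE and the second FALSE, since `format(1)` alone is just "1".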
> When calling `as.character.POSIXt()` on a vector that contains elements where the time component is midnight (00:00:00), it drops the time component of that element in the resulting character vector. Previously the time component was retained:
> In R 4.2.3:
> ```
> R.version$version.string
> #> [1] "R version 4.2.3 (2023-03-15)"
> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> (tc <- as.character(t))
> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
> ```
> In R 4.3.1:
> ```
> R.version$version.string
> #> [1] "R version 4.3.1 (2023-06-16)"
> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> (tc <- as.character(t))
> #> [1] "1975-01-01" "1975-01-01 15:27:00"
> ```
You should have used format() here or at least should do so now.
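As a rough illustration of the format() route (a sketch only; the fixed time zone "UTC" and the explicit format string are assumptions made here so that the result does not depend on the local settings, they are not part of the original report):

```r
# Assumption: tz = "UTC" so the example behaves the same on every machine.
t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# An explicit format string keeps the time part for every element,
# including the one at midnight.
tf <- format(t, format = "%Y-%m-%d %H:%M:%S")
tf

# Parsing with the same explicit format gives back the same instants.
t2 <- as.POSIXct(tf, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
identical(as.numeric(t), as.numeric(t2))
```

Giving the format explicitly on both sides also avoids relying on the `tryFormats` guessing when converting back.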
> This has consequences when round-tripping from POSIXt ->
> character -> POSIXt,
Well, I'd argue that such a "round trip" is not a "good idea"
anyway, as there are quite a few platform (local timezone for
one) issues, and precision is lost, notably for POSIXlt which
may be more precise than you typically get, etc.
> since `as.POSIXct.character()` drops the time component from the entire vector if any element does not have a time component:
Well, there *is* no as.POSIXct.character() {but we understand what you mean}:
If you look at the help page you'd see that there's as.POSIXlt.character()
{which is called from as.POSIXct.default()}
with a 3rd argument 'format' and a 4th argument 'tryFormats'
{and a lot more information -- the whole topic is far from trivial}.
Now, indirectly you would want R to be "smart", i.e. the
as.POSIXlt.character() method to "guess better" about what the
user wants. ...
... and I agree that is not an unreasonable expectation, e.g.,
for your example of wanting
c("1975-01-01", "1975-01-01 15:27:00")
to "work".
as.POSIXlt.character() is well documented to be trying all of
the `tryFormats` in order, until it finds one that works for all
vector components (or fail / use NA if none works);
and here it's only a format which drops the time that works for
all (i.e. both, in the example).
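The "first format that works for all elements" rule can be made visible with a short sketch. It only mimics the documented search with strptime() on a shortened list of formats (`try_formats` and `fits_all` are illustrative names, and tz = "UTC" is an assumption to keep the output platform independent):

```r
ch <- c("1975-01-01", "1975-01-01 15:27:00")

# Default conversion: only the date-only format fits both strings,
# so both results end up at midnight.
p_all <- as.POSIXct(ch, tz = "UTC")
format(p_all, "%H:%M:%S")

# Which candidate formats parse *every* element?
# (A subset of the default tryFormats, in the same order.)
try_formats <- c("%Y-%m-%d %H:%M:%OS", "%Y-%m-%d %H:%M", "%Y-%m-%d")
fits_all <- vapply(try_formats,
                   function(f) all(!is.na(strptime(ch, f, tz = "UTC"))),
                   logical(1))
fits_all
```

Only the last entry of `fits_all` is TRUE: the date-time formats fail on "1975-01-01", while "%Y-%m-%d" parses both strings because strptime() ignores the trailing time.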
{ Even though its behavior is well documented,
one could even argue that by default you'd want a warning in
such a case where "so much" is lost.
I think however that introducing such a warning may trip too
much current code relying .. also, the extra *checking* may be
somewhat costly .. (?) .... anyway that's an interesting side topic
}
Instead what you want here is for each string (element of the
character vector) to try the `tryFormats` and use the best
available *individually* {smart R users ==> "think lapply(.)"} :
Currently, this would be "something like" unlist(lapply(x, as.POSIXlt))
well, and then you need to jump a hoop additionally.
If you want POSIXct, like this :
.POSIXct(unlist(lapply( * , as.POSIXct)))
For your example
ch <- c("1975-01-01", "1975-01-01 15:27:00")
str(.POSIXct(unlist(lapply(ch, as.POSIXct))))
POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
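The same per-element strategy can be wrapped in a small helper. This is a sketch under assumptions, not anything that exists in base R: `as_POSIXct_each` is a hypothetical name, and passing `tz` through is an addition so that the result carries an explicit time zone.

```r
# Hypothetical helper: parse every string on its own, so each element
# gets the best matching format individually.
as_POSIXct_each <- function(x, tz = "") {
  secs <- vapply(x,
                 function(s) as.numeric(as.POSIXct(s, tz = tz)),
                 numeric(1), USE.NAMES = FALSE)
  # Rebuild a POSIXct vector from the seconds since the epoch.
  .POSIXct(secs, tz = tz)
}

ch <- c("1975-01-01", "1975-01-01 15:27:00")
res <- as_POSIXct_each(ch, tz = "UTC")
format(res, "%Y-%m-%d %H:%M:%S")
```

Here the second element keeps its time of 15:27:00. Note that a string matching none of the default formats would still raise an error, one element at a time.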
---
After all that, yes, I agree that we should consider making
this much easier. E.g., by adding an optional argument to
as.POSIXlt.character() say, `each` with default FALSE such
that as.POSIXlt(*, each=TRUE)
{and also as.POSIXct(*, each=TRUE) } would follow the above
strategy.
?
Martin
--
Martin Maechler
ETH Zurich and R Core team
Martin,

Thank you. Everything you have written is helpful and I admit I am likely guilty of using as.character() instead of format() in the past.

Ignoring the above though, one thing I'm still unclear on is the special handling of zero (or rather non-zero time) seconds in the method. Is the motivation that as.character() outputs the minimum necessary information? It is clearly a very deliberate choice but the reasoning is still going a little over my head.

Best
Tim
On 14 Aug 2023, at 09:52, Martin Maechler <maechler at stat.math.ethz.ch> wrote: [...]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Tim Taylor
on Mon, 14 Aug 2023 12:26:51 +0100 writes:
> Martin,
> Thank you. Everything you have written is helpful and I admit I am likely guilty of using as.character() instead of format() in the past.
> Ignoring the above though, one thing I'm still unclear on is the special handling of zero (or rather non-zero time) seconds in the method. Is the motivation that as.character() outputs the minimum necessary information? It is clearly a very deliberate choice but the reasoning is still going a little over my head.
> Best
> Tim
Hmm, I really don't understand what you don't understand.
Here's some annotated R code exemplifying that indeed now,
as.character(x)[j] === as.character(x[j])
but previously that was not fulfilled {when as.character() was
the same as format() for POSIXct or POSIXlt}:
##-----------------------------------------------------------------------------
x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
t0 <- as.POSIXct(x0)
str(t0) # POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
t0 # "1975-01-01 00:00:00 CET" "1975-01-01 15:27:00 CET"
t0[1] # "1975-01-01 CET" <-- yes, *no* 00:00:00 in any version of R
## In R <= 4.2.x as.character() was using format() for POSIX{ct,lt} :
as.character(t0) # "1975-01-01 00:00:00" "1975-01-01 15:27:00" << for R <= 4.2.x
as.character(t0) # "1975-01-01" "1975-01-01 15:27:00" << for R >= 4.3.0
as.character(t0[1]) # "1975-01-01" {in all versions of R}
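The annotated code above can also be turned into a direct check. This is just a sketch (tz = "UTC" and the names `each_chr` / `each_fmt` are assumptions for illustration); the as.character() comparison depends on the R version as described in this thread.

```r
t0 <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")

# Convert each element separately.
each_chr <- vapply(seq_along(t0), function(j) as.character(t0[j]), character(1))
each_fmt <- vapply(seq_along(t0), function(j) format(t0[j]), character(1))

# Per this thread: TRUE for R >= 4.3.0, FALSE for R <= 4.2.x
identical(as.character(t0), each_chr)

# format() picks one format for the whole vector, so this is FALSE:
# format(t0[1]) alone drops the 00:00:00
identical(format(t0), each_fmt)
```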
Note that indeed as.character() does drop redundant trailing 0s :
> as.character(c(0.5, 0.75, pi))
[1] "0.5" "0.75" "3.14159265358979"
whereas format() does not (ensuring resulting strings of the same nchar(.)):
> format( c(0.5, 0.75, pi))
[1] "0.500000" "0.750000" "3.141593"
Many thanks Martin! I was completely overlooking the behaviour for a length 1 vector with 00:00:00. More coffee needed for me I think.

Best
Tim
On 15/08/2023 08:58, Martin Maechler wrote: [...]