duplicated.data.frame() and POSIXct with DST shift

3 messages · Tobias Gauster, David Winsemius

#
Hi,

I noticed that the duplicated() method for data frames gives "false positives" when a column of class POSIXct spans the clock shift from DST to standard time.

time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60)
time
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET" 

df <- data.frame(time, text="foo")
duplicated(df)
[1] FALSE  TRUE

This is because the time zone is lost when the columns are pasted together:
do.call(paste, c(df, sep = "\r"))
[1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"
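
To make the mechanism concrete, here is a minimal sketch showing that the two timestamps are different instants that merely share the same default text rendering. It builds the values from UTC (an assumption made here so the result does not depend on how a platform resolves the ambiguous local time 02:00) and assumes the "Europe/Vienna" zone is available on the system.

```r
# Two instants one hour apart, constructed in UTC and displayed in Vienna time.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 3600)
attr(time, "tzone") <- "Europe/Vienna"

secs <- as.numeric(time)   # seconds since the epoch, independent of time zone
txt  <- format(time)       # default rendering, without the zone abbreviation

diff(secs)                 # difference between the instants, in seconds
txt[1] == txt[2]           # the text renderings collide
duplicated(secs)           # comparing the numeric values finds no duplicate
```

Any comparison that goes through the default character representation cannot tell the two rows apart, whereas a comparison of the underlying numbers can.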

I can't really figure out whether this behavior is intended. If it is, a short warning in ?duplicated would be helpful. The help page describes how duplicated.data.frame() works, but I didn't find a hint on how to handle POSIXct objects properly.

My particular problem was casting a data.frame like this one with cast() (which calls reshape1(), which in turn calls duplicated()):

df2 <- data.frame(time, time1=as.numeric(time),
                  lab=rep(1:3, each=2), value=101:106, 
                  text=rep(c("foo", "bar"), each=3))

library(reshape)  # cast() and reshape1() live in reshape, not reshape2

Using the column of class POSIXct as a variable in the formula gives:
cast(lab*time~text, data=df2, value="value")
Aggregation requires fun.aggregate: length used as default
  lab                time bar foo
1   1 2012-10-28 02:00:00   0   2
2   2 2012-10-28 02:00:00   1   1
3   3 2012-10-28 02:00:00   2   0

Converting to numeric, casting, and converting back works as expected, although the time zone is not visible because print.data.frame() calls format.POSIXct() with usetz = FALSE:
y <- cast(lab*time1~text, data=df2, value="value")
y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)

Can anyone suggest a more elegant solution? 
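
One possibly tidier way to do the back-conversion, sketched under assumptions: the workaround above appears to rely on the session's local time zone being one hour ahead of UTC, whereas as.POSIXct() on a numeric vector accepts an explicit origin and tz. The vector `secs` below is a hypothetical stand-in for the numeric column y$time1.

```r
# Hypothetical stand-in for y$time1: epoch seconds of the two instants.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 3600)
secs <- as.numeric(time)

# Convert back with an explicit origin and display zone, so the result does
# not depend on the time zone of the R session.
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Vienna")

format(back, usetz = TRUE)
```

The round trip keeps the two instants distinct, and the zone abbreviation shows up once usetz = TRUE is requested explicitly.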

Best,
Tobias
#
On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:

In this instance I suspect the problem arises when 'paste' coerces to character:

 > as.character(time)
[1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"

I think the as.character step is easy to miss because 'paste' performs
the coercion internally.

 > as.character(time, usetz=TRUE)
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
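
The same distinction can be obtained by calling format() directly, which takes usetz as a documented argument. The sketch below also shows the "%z" conversion, which prints the numeric UTC offset instead of the abbreviation; the time vector is rebuilt from UTC here as an assumption, to sidestep the ambiguous local time.

```r
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 3600)
attr(time, "tzone") <- "Europe/Vienna"

# Zone abbreviation appended to the default format.
with_abbrev <- format(time, usetz = TRUE)

# Numeric UTC offset, which stays unambiguous even where abbreviations repeat.
with_offset <- format(time, "%Y-%m-%d %H:%M:%S %z")

print(with_abbrev)
print(with_offset)
```

Either rendering gives strings that differ across the DST boundary, so they can serve as a key wherever a character representation is required.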
#
On Dec 13, 2012, at 5:01 PM, David Winsemius wrote:

            
This would work as intended if you pre-processed the argument to duplicated with:
                      time text
1 2012-10-28 02:00:00 CEST  foo
2  2012-10-28 02:00:00 CET  foo
[1] FALSE FALSE
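
A minimal sketch of what such a pre-processing step could look like. The helper name duplicated_tz is hypothetical and not part of base R or of this thread; it formats every POSIXct column with usetz = TRUE before handing the data frame to duplicated().

```r
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 3600)
attr(time, "tzone") <- "Europe/Vienna"
df <- data.frame(time, text = "foo")

# Hypothetical helper: replace each POSIXct column by its character form
# including the zone abbreviation, then test the rows for duplicates.
duplicated_tz <- function(x) {
  x[] <- lapply(x, function(col) {
    if (inherits(col, "POSIXct")) format(col, usetz = TRUE) else col
  })
  duplicated(x)
}

res <- duplicated_tz(df)
print(res)
```

Because the zone abbreviation is part of each string, the two rows no longer collide. Rows that are true duplicates, with the same instant and the same text, are still flagged.
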
David Winsemius
Alameda, CA, USA