Dear all,
The maintainer of the RPostgreSQL package suggested to post this on R-sig-DB,
because of Linus' Law.
My apologies for a long post, but I hope the explanation makes life easy on the
eyeballs.
While doing a dbReadTable on a PostgreSQL database, I have noticed that some
columns with type "timestamp with time zone" behave differently from what I
expect: in R there seems to be a time zone related offset in the data values.
I have reduced the problem to the following code:
library("RPostgreSQL")
Sys.setenv(TZ="UTC") # Just to level the playing field, not crucial
databaseConnection = dbConnect(dbDriver("PostgreSQL"), dbname="template1",
user="postgres", password="*****")
# The input: any day, midnight, in UTC
query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
result = dbGetQuery(databaseConnection, query)
# The problem:
strftime(result$timestamptz, usetz=TRUE) # prints:"1999-12-31 01:00:00 UTC"
as.numeric(result$timestamptz) # prints: 946602000
The value of strftime is 1 am instead of midnight. The numeric value is 3600
seconds higher than I expect, so it is not *just* a display problem.
First, I have checked my expectations with the date command in a Bash shell:
date --date='1999-12-31 00:00:00 UTC' +%s
Then I have checked my database with the psql command line client:
SELECT EXTRACT(epoch FROM TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC');
Both show: 946598400, 3600 less than what R shows.
To verify that this is indeed the correct value, I have used the online tool
from epochconverter dot com, and the human readable version of this time is, as
expected, midnight: Fri, 31 Dec 1999 00:00:00 GMT
This was all done on:
OS: OpenSUSE 11.2
R: R version 2.10.1 Patched (2010-03-07 r51225)
DBI: 0.2-5
RPostgreSQL: 0.1-6
PostgreSQL: 8.4.2
I have repeated this procedure on a Windows XP machine. To get the RPostgreSQL
package working, I have manually copied some .dll files to the system32
directory, but I think this is not relevant.
The result on Windows may actually shed some light on what is going on:
Warning messages:
1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
unknown timezone '%Y-%m-%d %H:%M:%S'
2: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"), :
unknown timezone '%Y-%m-%d %H:%M:%S'
3: In strptime(x, f, tz = tz) : unknown timezone '%Y-%m-%d %H:%M:%S'
4: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"), :
unknown timezone '%Y-%m-%d %H:%M:%S'
It is clear that '%Y-%m-%d %H:%M:%S' is a date format, and not a time zone.
My guess would be that this is simple copy-paste mistake in the code, unintended
recycling, or an off-by-one in a C char**.
I don't have sufficient expertise to test this against other databases (afaics
MySQL doesn't even have a data type that includes the time zone) so I don't know
whether this is a problem specific to RPostgreSQL, or it encompasses all DBI
implementations.
Could any of the RPostgreSQL developers please have a look at this, and pass it
on to DBI and/or R core devs if necessary?
Thanks in advance,
Koen
Royal Meteorological Institute of Belgium
Timestamp with time zone type conversion from PostgreSQL to R
3 messages · Koen Stegen, Gabor Grothendieck, Andrew Piskorski
14 days later
This works for me on Windows Vista with the development version of the RpgSQL package's DBI/RJDBC driver. The CRAN version of the RpgSQL driver gives problems here too and if we use Sys.setenv(TZ = "GMT") or Sys.setenv(TZ = "UTC") we also have problems even with the development version of the RpgSQL package's driver.
library(RpgSQL)
Sys.setenv(TZ="") # NOTE
# may need to modify next statement based on your setup
databaseConnection = dbConnect(dbDriver("pgSQL"), dbname = "mydb")
# The input: any day, midnight, in UTC
query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
result = dbGetQuery(databaseConnection, query)
strftime(result$timestamptz, usetz=TRUE) # prints:"1999-12-31 01:00:00 UTC"
[1] "1999-12-30 19:00:00 EST"
as.numeric(result$timestamptz)
[1] 946598400
format(result, tz = "GMT")
timestamptz 1 1999-12-31
dbGetQuery(databaseConnection, "select version()")
version 1 PostgreSQL 8.4.2, compiled by Visual C++ build 1400, 32-bit
R.version.string
[1] "R version 2.10.1 Patched (2010-03-10 r51276)"
win.version()
[1] "Windows Vista (build 6002) Service Pack 2"
On Tue, Mar 9, 2010 at 4:56 AM, Koen Stegen <koen.stegen at oma.be> wrote:
Dear all,
The maintainer of the RPostgreSQL package suggested to post this on R-sig-DB,
because of Linus' Law.
My apologies for a long post, but I hope the explanation makes life easy on the
eyeballs.
While doing a dbReadTable on a PostgreSQL database, I have noticed that some
columns with type "timestamp with time zone" behave differently from what I
expect: in R there seems to be a time zone related offset in the data values.
I have reduced the problem to the following code:
? ?library("RPostgreSQL")
? ?Sys.setenv(TZ="UTC") ? ? ? ?# Just to level the playing field, not crucial
? ?databaseConnection = dbConnect(dbDriver("PostgreSQL"), dbname="template1",
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? user="postgres", password="*****")
? ?# The input: any day, midnight, in UTC
? ?query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
? ?result = dbGetQuery(databaseConnection, query)
? ?# The problem:
? ?strftime(result$timestamptz, usetz=TRUE) ?# prints:"1999-12-31 01:00:00 UTC"
? ?as.numeric(result$timestamptz) ? ? ? ? ? ?# prints: 946602000
The value of strftime is 1 am instead of midnight. The numeric value is 3600
seconds higher than I expect, so it is not *just* a display problem.
First, I have checked my expectations with the date command in a Bash shell:
?date --date='1999-12-31 00:00:00 UTC' +%s
Then I have checked my database with the psql command line client:
?SELECT EXTRACT(epoch FROM TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC');
Both show: 946598400, 3600 less than what R shows.
To verify that this is indeed the correct value, I have used the online tool
from epochconverter dot com, and the human readable version of this time is, as
expected, midnight: ?Fri, 31 Dec 1999 00:00:00 GMT
This was all done on:
? ?OS: OpenSUSE 11.2
? ?R: R version 2.10.1 Patched (2010-03-07 r51225)
? ?DBI: 0.2-5
? ?RPostgreSQL: 0.1-6
? ?PostgreSQL: 8.4.2
I have repeated this procedure on a Windows XP machine. To get the RPostgreSQL
package working, I have manually copied some .dll files to the system32
directory, but I think this is not relevant.
The result on Windows may actually shed some light on what is going on:
Warning messages:
1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
?unknown timezone '%Y-%m-%d %H:%M:%S'
2: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"), ?:
?unknown timezone '%Y-%m-%d %H:%M:%S'
3: In strptime(x, f, tz = tz) : unknown timezone '%Y-%m-%d %H:%M:%S'
4: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"), ?:
?unknown timezone '%Y-%m-%d %H:%M:%S'
It is clear that '%Y-%m-%d %H:%M:%S' is a date format, and not a time zone.
My guess would be that this is simple copy-paste mistake in the code, unintended
recycling, or an off-by-one in a C char**.
I don't have sufficient expertise to test this against other databases (afaics
MySQL doesn't even have a data type that includes the time zone) so I don't know
whether this is a problem specific to RPostgreSQL, or it encompasses all DBI
implementations.
Could any of the RPostgreSQL developers please have a look at this, and pass it
on to DBI and/or R core devs if necessary?
Thanks in advance,
Koen
Royal Meteorological Institute of Belgium
_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-db
2 days later
On Tue, Mar 09, 2010 at 09:56:40AM +0100, Koen Stegen wrote:
# The input: any day, midnight, in UTC
query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
result = dbGetQuery(databaseConnection, query)
# The problem:
strftime(result$timestamptz, usetz=TRUE) # prints:"1999-12-31 01:00:00 UTC"
as.numeric(result$timestamptz) # prints: 946602000
The value of strftime is 1 am instead of midnight. The numeric value is 3600
seconds higher than I expect, so it is not *just* a display problem.
Koen, I don't know if it's related to your problem or not, but R, without any use of PostgreSQL at all, seems to sometimes have trouble correctly formatting POSIX time values. Note: > as.integer(Sys.time()) [1] 1269544425 > format(Sys.time() ,'%Y-%m-%d %T %z') [1] "2010-03-25 15:13:45 +0000" > format(Sys.time() ,'%Y-%m-%d %T %Z') [1] "2010-03-25 15:13:45 EDT" That's with: R 2.9.2 (Patched), 2009-09-24, svn.rev 50183, x86_64-unknown-linux-gnu I am currently in the EDT (Eastern Daylight Time) timezone, and the output of the "%Z" format above is correct. However, the "%z" is wrong, EDT is UTC -4, not +0.
Andrew Piskorski <atp at piskorski.com> http://www.piskorski.com/