Skip to content

Timestamp with time zone type conversion from PostgreSQL to R

3 messages · Koen Stegen, Gabor Grothendieck, Andrew Piskorski

#
Dear all,

The maintainer of the RPostgreSQL package suggested to post this on R-sig-DB,
because of Linus' Law.

My apologies for a long post, but I hope the explanation makes life easy on the
eyeballs.

While doing a dbReadTable on a PostgreSQL database, I have noticed that some
columns with type "timestamp with time zone" behave differently from what I
expect: in R there seems to be a time zone related offset in the data values.

I have reduced the problem to the following code:

    library("RPostgreSQL")
    Sys.setenv(TZ="UTC")        # Just to level the playing field, not crucial

    databaseConnection = dbConnect(dbDriver("PostgreSQL"), dbname="template1",
                                   user="postgres", password="*****")

    # The input: any day, midnight, in UTC
    query = "SELECT TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC'"
    result = dbGetQuery(databaseConnection, query)

    # The problem:
    strftime(result$timestamptz, usetz=TRUE)  # prints:"1999-12-31 01:00:00 UTC"
    as.numeric(result$timestamptz)            # prints: 946602000


The value of strftime is 1 am instead of midnight. The numeric value is 3600
seconds higher than I expect, so it is not *just* a display problem.

First, I have checked my expectations with the date command in a Bash shell:
  date --date='1999-12-31 00:00:00 UTC' +%s

Then I have checked my database with the psql command line client:
  SELECT EXTRACT(epoch FROM TIMESTAMP WITH TIME ZONE '1999-12-31 00:00:00 UTC');

Both show: 946598400, 3600 less than what R shows.

To verify that this is indeed the correct value, I have used the online tool
from epochconverter dot com, and the human readable version of this time is, as
expected, midnight:  Fri, 31 Dec 1999 00:00:00 GMT

This was all done on:

    OS: OpenSUSE 11.2
    R: R version 2.10.1 Patched (2010-03-07 r51225)
    DBI: 0.2-5
    RPostgreSQL: 0.1-6
    PostgreSQL: 8.4.2


I have repeated this procedure on a Windows XP machine. To get the RPostgreSQL
package working, I have manually copied some .dll files to the system32
directory, but I think this is not relevant.

The result on Windows may actually shed some light on what is going on:


Warning messages:
1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  unknown timezone '%Y-%m-%d %H:%M:%S'
2: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"),  :
  unknown timezone '%Y-%m-%d %H:%M:%S'
3: In strptime(x, f, tz = tz) : unknown timezone '%Y-%m-%d %H:%M:%S'
4: In structure(.Internal(as.POSIXct(x, tz)), class = c("POSIXt", "POSIXct"),  :
  unknown timezone '%Y-%m-%d %H:%M:%S'


It is clear that '%Y-%m-%d %H:%M:%S' is a date format, and not a time zone.

My guess would be that this is simple copy-paste mistake in the code, unintended
recycling, or an off-by-one in a C char**.


I don't have sufficient expertise to test this against other databases (afaics
MySQL doesn't even have a data type that includes the time zone) so I don't know
whether this is a problem specific to RPostgreSQL, or it encompasses all DBI
implementations.

Could any of the RPostgreSQL developers please have a look at this, and pass it
on to DBI and/or R core devs if necessary?


Thanks in advance,
Koen
Royal Meteorological Institute of Belgium
14 days later
#
This works for me on Windows Vista with the development version of the
RpgSQL package's DBI/RJDBC driver.

The CRAN version of the RpgSQL driver gives problems here too and if
we use Sys.setenv(TZ = "GMT") or Sys.setenv(TZ = "UTC") we also have
problems even with the development version of the RpgSQL package's
driver.
[1] "1999-12-30 19:00:00 EST"
[1] 946598400
timestamptz
1  1999-12-31
version
1 PostgreSQL 8.4.2, compiled by Visual C++ build 1400, 32-bit
[1] "R version 2.10.1 Patched (2010-03-10 r51276)"
[1] "Windows Vista (build 6002) Service Pack 2"
On Tue, Mar 9, 2010 at 4:56 AM, Koen Stegen <koen.stegen at oma.be> wrote:
2 days later
#
On Tue, Mar 09, 2010 at 09:56:40AM +0100, Koen Stegen wrote:

            
Koen, I don't know if it's related to your problem or not, but R,
without any use of PostgreSQL at all, seems to sometimes have trouble
correctly formatting POSIX time values.  Note:

  > as.integer(Sys.time()) 
  [1] 1269544425 
  > format(Sys.time() ,'%Y-%m-%d %T %z') 
  [1] "2010-03-25 15:13:45 +0000" 
  > format(Sys.time() ,'%Y-%m-%d %T %Z') 
  [1] "2010-03-25 15:13:45 EDT" 

That's with:
  R 2.9.2 (Patched), 2009-09-24, svn.rev 50183, x86_64-unknown-linux-gnu

I am currently in the EDT (Eastern Daylight Time) timezone, and the
output of the "%Z" format above is correct.  However, the "%z" is
wrong, EDT is UTC -4, not +0.