Skip to content

RPostgreSQL Row Inserts on Remote Servers

4 messages · Whit Armstrong, Gabor Grothendieck, McGehee, Robert

#
Hello,
An issue discussed before on this list server is difficulty using
dbWriteTable from the RPostgreSQL/DBI package to insert rows when
either:
1) the R client and PostgreSQL server are on different computers (or at
least don't share a common filesystem), and thus bulk copy is
unavailable.
2) the postgres user does not have read permission for the user's file
(See https://stat.ethz.ch/pipermail/r-sig-db/2009q4/000741.html, for
example)

To address this, I wrote a helper function called dbInsert() that some
of you may find useful. The function converts a data frame into a single
INSERT query transaction that inserts the data frame into an existing
table. My hope is that this is a relatively fast way to insert rows when
bulk copy is unavailable.

If any of you have already discovered a faster way to insert data
without using bulk copy, or find ways of improving this code, please
share. I'm also CC:ing the RPostgreSQL package maintainer in case he
thinks this functionality should be merged into the postgresWriteTable
function for when bulk copy is not possible.

Cheers, 
Robert

dbInsert <- function(con, name, value, row.names = TRUE, ...) {
    xx <- dbSendQuery(con, paste("select * from", name, "LIMIT 1;"))
    cols <- dbColumnInfo(xx)$name
    dbClearResult(xx)
    if (row.names) {
        if (!"row_names" %in% cols) stop("row_names column missing from
", sQuote(name))
        value[["row_names", exact=TRUE]] <- rownames(value)
    }
    if (length(setdiff(names(value), cols)))
      stop("names of 'value' do not match columns of ", sQuote(name))
    
    cdt  <- which(sapply(value, inherits, c("Date", "POSIXt")))
    ctxt <- which(sapply(value, postgresqlDataType)=="text")
    for (i in cdt)
      value[[i]] <- ifelse(is.na(value[[i]]), "NULL",
sQuote(format(value[[i]])))
    for (i in setdiff(ctxt, cdt))
      value[[i]] <- ifelse(is.na(value[[i]]), "NULL",
sQuote(value[[i]]))

    m <- as.matrix(value)
    class(m) <- "character"
    m[is.na(m)] <- "NULL"

    q1 <- paste("BEGIN; INSERT INTO", name, "(", paste(names(value),
collapse=", "), ") VALUES")
    q2 <- apply(m, 1, function(x) paste("(", paste(x, collapse=","),
")", sep=""))
    q3 <- "; COMMIT;"
    qry <- paste(q1, paste(q2, collapse=","), q3)
    dbGetQuery(con, qry)
}

Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Tel: 617/392-8396    Fax:617/476-6389
mailto:robert.mcgehee at geodecapital.com


This e-mail, and any attachments hereto, are intended fo...{{dropped:12}}
#
you can use my driver. which will someday make it to cran.  it uses a
binary connection to write the data.

http://github.com/armstrtw/unifieddbi

depending on what you are doing, you can get between a 10x and 50x speedup.

-Whit



On Tue, Jul 20, 2010 at 11:37 AM, McGehee, Robert
<Robert.McGehee at geodecapital.com> wrote:
#
On Tue, Jul 20, 2010 at 11:37 AM, McGehee, Robert
<Robert.McGehee at geodecapital.com> wrote:
I haven't tested the speed relative to other drivers but I have a
PostgreSQL driver on CRAN,
RpgSQL which supports PostgreSQL via a layer over RJDBC.

It was mainly developed for use with the sqldf package (which now
supports sqlite, H2 and
PostgreSQL databases) since it needed functionality not supported by
other CRAN drivers.
#
Thanks to Gabor and Whit for replies.

I thought I'd give these packages a try, starting with RpgSQL on my Mac.
However, I'm running into installation troubles. If any of you are
feeling generous, and the below RpgSQL error message means something to
you, please let me know what I'm doing wrong.

Here are the RpgSQL installation steps I tried:

First I downloaded the latest JDBC4 driver and saved it here on my Mac:
/Library/Java/Extensions/postgresql-8.4-701.jdbc4.jar

And then per installation instructions I exported the environment
variable RpgSQL_JAR to point to the above file.

Next, I successfully tested the JDBC installation by compiling and
running the quick JDBC program found here:
http://www.fankhausers.com/postgresql/jdbc/
to show that I could create, drop, and populate a table using JDBC.

Last, I successfully installed the latest RpgSQL and all dependencies (I
didn't specify any config options here). 

However, I can't seem to connect:
Error in pgSQL() :
  Could not find Postgres JDBC driver
on/Library/Java/Extensions/postgresql-8.4-701.jdbc4.jar./Library/Framewo
rks/Python.framework/Versions/2.6/bin:/Users/a347549/Bin:/Users/a347549/
bin:/opt/bin:/usr/local/bin:/usr/local/pgsql/bin:/usr/bin:/bin:/usr/sbin
:/sbin:/usr/local/bin:/usr/texbin:/usr/X11/bin/usr/local/pgsql/share/jav
a

The error message indicates to me that the JDBC driver is not where I
put it--but it is. The file is also world readable/executable, so there
shouldn't be any permission problems. Any ideas? Note I already have
RPostgreSQL installed and working, so it's (presumably) not a problem
with PostgreSQL.

Thanks, Robert

Some possibly important information:
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-9M3165)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)
Darwin BOSA871443.fmr.com 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15
16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
BINDIR = /usr/local/pgsql/bin
DOCDIR = /usr/local/pgsql/share/doc
HTMLDIR = /usr/local/pgsql/share/doc
INCLUDEDIR = /usr/local/pgsql/include
PKGINCLUDEDIR = /usr/local/pgsql/include
INCLUDEDIR-SERVER = /usr/local/pgsql/include/server
LIBDIR = /usr/local/pgsql/lib
PKGLIBDIR = /usr/local/pgsql/lib
LOCALEDIR = /usr/local/pgsql/share/locale
MANDIR = /usr/local/pgsql/share/man
SHAREDIR = /usr/local/pgsql/share
SYSCONFDIR = /usr/local/pgsql/etc
PGXS = /usr/local/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-thread-safety' '--with-bonjour' 'CFLAGS=-m64 -arch
x86_64'
CC = gcc -no-cpp-precomp
CPPFLAGS =
CFLAGS = -m64 -arch x86_64 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv
CFLAGS_SL =
LDFLAGS = -Wl,-dead_strip_dylibs
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lm
VERSION = PostgreSQL 8.4.2
_
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status         Patched
major          2
minor          11.0
year           2010
month          05
day            11
svn rev        51984
language       R
version.string R version 2.11.0 Patched (2010-05-11 r51984)

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com] 
Sent: Tuesday, July 20, 2010 12:16 PM
To: McGehee, Robert
Cc: r-sig-db at stat.math.ethz.ch
Subject: Re: [R-sig-DB] RPostgreSQL Row Inserts on Remote Servers

On Tue, Jul 20, 2010 at 11:37 AM, McGehee, Robert
<Robert.McGehee at geodecapital.com> wrote:
at
single
when
I haven't tested the speed relative to other drivers but I have a
PostgreSQL driver on CRAN,
RpgSQL which supports PostgreSQL via a layer over RJDBC.

It was mainly developed for use with the sqldf package (which now
supports sqlite, H2 and
PostgreSQL databases) since it needed functionality not supported by
other CRAN drivers.