Hello,

An issue discussed before on this list server is difficulty using
dbWriteTable from the RPostgreSQL/DBI package to insert rows when either:

1) the R client and PostgreSQL server are on different computers (or at
   least don't share a common filesystem), and thus bulk copy is
   unavailable.
2) the postgres user does not have read permission for the user's file
   (see https://stat.ethz.ch/pipermail/r-sig-db/2009q4/000741.html, for
   example).

To address this, I wrote a helper function called dbInsert() that some of
you may find useful. The function converts a data frame into a single
INSERT query transaction that inserts the data frame into an existing
table. My hope is that this is a relatively fast way to insert rows when
bulk copy is unavailable. If any of you have already discovered a faster
way to insert data without using bulk copy, or find ways of improving this
code, please share. I'm also CC:ing the RPostgreSQL package maintainer in
case he thinks this functionality should be merged into the
postgresWriteTable function for when bulk copy is not possible.

Cheers, Robert

dbInsert <- function(con, name, value, row.names = TRUE, ...)
{
    ## fetch the column names of the target table
    xx <- dbSendQuery(con, paste("SELECT * FROM", name, "LIMIT 1;"))
    cols <- dbColumnInfo(xx)$name
    dbClearResult(xx)
    if (row.names) {
        if (!"row_names" %in% cols)
            stop("row_names column missing from ", sQuote(name))
        value[["row_names"]] <- rownames(value)
    }
    if (length(setdiff(names(value), cols)))
        stop("names of 'value' do not match columns of ", sQuote(name))
    ## columns that need quoting: dates/times and text
    cdt  <- which(sapply(value, inherits, c("Date", "POSIXt")))
    ctxt <- which(sapply(value, postgresqlDataType) == "text")
    for (i in cdt)
        value[[i]] <- ifelse(is.na(value[[i]]), "NULL",
                             sQuote(format(value[[i]])))
    for (i in setdiff(ctxt, cdt))
        value[[i]] <- ifelse(is.na(value[[i]]), "NULL", sQuote(value[[i]]))
    ## coerce everything to character; remaining NAs become SQL NULLs
    m <- as.matrix(value)
    class(m) <- "character"
    m[is.na(m)] <- "NULL"
    ## assemble one multi-row INSERT inside a transaction
    q1 <- paste("BEGIN; INSERT INTO", name,
                "(", paste(names(value), collapse = ", "), ") VALUES")
    q2 <- apply(m, 1, function(x)
                paste("(", paste(x, collapse = ","), ")", sep = ""))
    q3 <- "; COMMIT;"
    qry <- paste(q1, paste(q2, collapse = ","), q3)
    dbGetQuery(con, qry)
}

Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Tel: 617/392-8396  Fax: 617/476-6389
mailto:robert.mcgehee at geodecapital.com
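To see what dbInsert() builds without a database connection, here is a
minimal sketch of the same string assembly on a toy data frame; the table
name `mytable` and the data are hypothetical. (One caveat worth noting:
sQuote() only produces the plain ASCII quotes SQL expects when
options(useFancyQuotes = FALSE) is set, so this sketch quotes by hand.)

```r
## Toy data frame with an NA in each column
df <- data.frame(x = c(1L, NA), y = c("a", NA), stringsAsFactors = FALSE)

## Quote the text column by hand; NAs become SQL NULLs
df$y <- ifelse(is.na(df$y), "NULL", paste0("'", df$y, "'"))

## Coerce everything to a character matrix, mapping remaining NAs to NULL
m <- as.matrix(df)
m[is.na(m)] <- "NULL"

## One "(v1,v2)" tuple per row, joined into a single multi-row INSERT
rows <- apply(m, 1, function(r) paste0("(", paste(r, collapse = ","), ")"))
qry <- paste("INSERT INTO mytable (x, y) VALUES",
             paste(rows, collapse = ","))
cat(qry, "\n")
```

The resulting string contains one VALUES tuple per data frame row, which
is what keeps the round-trip count down to a single query.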
RPostgreSQL Row Inserts on Remote Servers
4 messages · Whit Armstrong, Gabor Grothendieck, McGehee, Robert
You can use my driver, which will someday make it to CRAN. It uses a
binary connection to write the data. http://github.com/armstrtw/unifieddbi
Depending on what you are doing, you can get between a 10x and 50x
speedup. -Whit

On Tue, Jul 20, 2010 at 11:37 AM, McGehee, Robert
<Robert.McGehee at geodecapital.com> wrote:
_______________________________________________
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-db
On Tue, Jul 20, 2010 at 11:37 AM, McGehee, Robert
<Robert.McGehee at geodecapital.com> wrote:
I haven't tested the speed relative to other drivers, but I have a
PostgreSQL driver on CRAN, RpgSQL, which supports PostgreSQL via a layer
over RJDBC. It was mainly developed for use with the sqldf package (which
now supports SQLite, H2, and PostgreSQL databases) since it needed
functionality not supported by other CRAN drivers.
Thanks to Gabor and Whit for replies. I thought I'd give these packages a
try, starting with RpgSQL on my Mac. However, I'm running into
installation troubles. If any of you are feeling generous, and the below
RpgSQL error message means something to you, please let me know what I'm
doing wrong. Here are the RpgSQL installation steps I tried:

1) I downloaded the latest JDBC4 driver and saved it here on my Mac:
   /Library/Java/Extensions/postgresql-8.4-701.jdbc4.jar
2) Per the installation instructions, I exported the environment variable
   RpgSQL_JAR to point to the above file.
3) I successfully tested the JDBC installation by compiling and running
   the quick JDBC program found at
   http://www.fankhausers.com/postgresql/jdbc/ to show that I could
   create, drop, and populate a table using JDBC.
4) I successfully installed the latest RpgSQL and all dependencies (I
   didn't specify any config options).

However, I can't seem to connect:
p <- dbDriver("pgSQL")
Error in pgSQL() : Could not find Postgres JDBC driver on
/Library/Java/Extensions/postgresql-8.4-701.jdbc4.jar./Library/Frameworks/Python.framework/Versions/2.6/bin:/Users/a347549/Bin:/Users/a347549/bin:/opt/bin:/usr/local/bin:/usr/local/pgsql/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/texbin:/usr/X11/bin/usr/local/pgsql/share/java

The error message indicates to me that the JDBC driver is not where I put
it--but it is. The file is also world readable/executable, so there
shouldn't be any permission problems. Any ideas? Note I already have
RPostgreSQL installed and working, so it's (presumably) not a problem with
PostgreSQL.

Thanks, Robert

Some possibly important information:
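One thing worth checking (a guess at the failure mode, not a diagnosis):
an environment variable exported in a shell profile is not always visible
to R, particularly when R is launched from the GUI rather than a terminal.
A quick way to see what RpgSQL has to work with from inside R, using the
jar path given above:

```r
## Check, from within R, whether the environment variable is actually set
jar <- Sys.getenv("RpgSQL_JAR")
cat("RpgSQL_JAR =", jar, "\n")

## Confirm the jar file itself is visible from this R process
file.exists("/Library/Java/Extensions/postgresql-8.4-701.jdbc4.jar")

## If RpgSQL_JAR came back empty, set it for this session and retry
Sys.setenv(RpgSQL_JAR = "/Library/Java/Extensions/postgresql-8.4-701.jdbc4.jar")
```

If Sys.getenv("RpgSQL_JAR") returns an empty string, the driver search
would fall back to the other directories listed in the error message,
which would explain the behavior seen above.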
java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-9M3165)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)
uname -a
Darwin BOSA871443.fmr.com 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
pg_config
BINDIR = /usr/local/pgsql/bin
DOCDIR = /usr/local/pgsql/share/doc
HTMLDIR = /usr/local/pgsql/share/doc
INCLUDEDIR = /usr/local/pgsql/include
PKGINCLUDEDIR = /usr/local/pgsql/include
INCLUDEDIR-SERVER = /usr/local/pgsql/include/server
LIBDIR = /usr/local/pgsql/lib
PKGLIBDIR = /usr/local/pgsql/lib
LOCALEDIR = /usr/local/pgsql/share/locale
MANDIR = /usr/local/pgsql/share/man
SHAREDIR = /usr/local/pgsql/share
SYSCONFDIR = /usr/local/pgsql/etc
PGXS = /usr/local/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-thread-safety' '--with-bonjour' 'CFLAGS=-m64 -arch x86_64'
CC = gcc -no-cpp-precomp
CPPFLAGS =
CFLAGS = -m64 -arch x86_64 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL =
LDFLAGS = -Wl,-dead_strip_dylibs
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lm
VERSION = PostgreSQL 8.4.2
R.version
               _
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status         Patched
major          2
minor          11.0
year           2010
month          05
day            11
svn rev        51984
language       R
version.string R version 2.11.0 Patched (2010-05-11 r51984)

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
Sent: Tuesday, July 20, 2010 12:16 PM
To: McGehee, Robert
Cc: r-sig-db at stat.math.ethz.ch
Subject: Re: [R-sig-DB] RPostgreSQL Row Inserts on Remote Servers

On Tue, Jul 20, 2010 at 11:37 AM, McGehee, Robert
<Robert.McGehee at geodecapital.com> wrote: