In trying to figure out how the best way to access a postgresql database from R on Mac OS X. Currently my database has 88 million rows (9 GB) and is growing so I am looking for a stable interface. The following options seem to be available: 1. RdbiPgSQL_1.18.1 2. RPostgreSQL_0.1-4 3. TSPostgreSQL_2009.3-2 These packages seemed to be obsolete and no longer supported/ recommended: 4. RPgSQL 1.0 5. Rdbi 0.1.2 Results ====== #1 relies on #5 and has an older connection syntax. #2 appears relatively new and has not reached 1.0 status yet. For someone used to libpq the interface is very familiar. #3 relies on #2 and not sure what it adds as the docs did not install correctly, nor are they accessible from CRAN. All 3 packages compiled and installed on Mac OS X even though #2 and #3 reported failures in CRAN. Have not tried to actually use them yet. Conclusion ========= Use #2 even though it does not seem very mature. Questions ======== Is this a good summary of the state of accessing postgresql from R or have I missed something? Is RPostgreSQL being used anyplace and is it stable? Neil Chicago
Postgresql Options for Mac OS X
8 messages · Paul Gilbert, Brian Ripley, Dirk Eddelbuettel +3 more
Neil Tiffin wrote:
In trying to figure out how the best way to access a postgresql database from R on Mac OS X. Currently my database has 88 million rows (9 GB) and is growing so I am looking for a stable interface. The following options seem to be available: 1. RdbiPgSQL_1.18.1 2. RPostgreSQL_0.1-4 3. TSPostgreSQL_2009.3-2 These packages seemed to be obsolete and no longer supported/recommended: 4. RPgSQL 1.0 5. Rdbi 0.1.2 Results ====== #1 relies on #5 and has an older connection syntax. #2 appears relatively new and has not reached 1.0 status yet. For someone used to libpq the interface is very familiar. #3 relies on #2 and not sure what it adds as the docs did not install correctly, nor are they accessible from CRAN. All 3 packages compiled and installed on Mac OS X even though #2 and #3 reported failures in CRAN. Have not tried to actually use them yet. Conclusion ========= Use #2 even though it does not seem very mature.
Yes. (#3 is specific to time series dbs. ) Another option is odbc. I'm not sure why you had problems with the docs. I think the testing problems on CRAN may have to do with the testing machine (not having a working Postgres?). Paul
Questions ======== Is this a good summary of the state of accessing postgresql from R or have I missed something? Is RPostgreSQL being used anyplace and is it stable? Neil Chicago
_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-db
====================================================================================
La version fran?aise suit le texte anglais.
------------------------------------------------------------------------------------
This email may contain privileged and/or confidential in...{{dropped:26}}
On Mon, 11 May 2009, Neil Tiffin wrote:
In trying to figure out how the best way to access a postgresql database from R on Mac OS X. Currently my database has 88 million rows (9 GB) and is growing so I am looking for a stable interface. The following options seem to be available: 1. RdbiPgSQL_1.18.1 2. RPostgreSQL_0.1-4 3. TSPostgreSQL_2009.3-2
You have omitted RODBC, since PostgreSQL has a mature ODBC driver. 3) Is a special-purpose layer over 2).
These packages seemed to be obsolete and no longer supported/recommended: 4. RPgSQL 1.0 5. Rdbi 0.1.2 Results ====== #1 relies on #5 and has an older connection syntax. #2 appears relatively new and has not reached 1.0 status yet. For someone used to libpq the interface is very familiar. #3 relies on #2 and not sure what it adds as the docs did not install correctly, nor are they accessible from CRAN. All 3 packages compiled and installed on Mac OS X even though #2 and #3 reported failures in CRAN.
Have you looked at the failures? RPostgreSQL failed to install because the build machine does not have PostgreSQL installed.
Have not tried to actually use them yet. Conclusion ========= Use #2 even though it does not seem very mature. Questions ======== Is this a good summary of the state of accessing postgresql from R or have I missed something?
The second.
Is RPostgreSQL being used anyplace and is it stable? Neil Chicago
_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-db
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Thanks, I am looking into RODBC. So which approach is considered the most used?
On May 12, 2009, at 12:09 AM, Prof Brian Ripley wrote:
On Mon, 11 May 2009, Neil Tiffin wrote:
In trying to figure out how the best way to access a postgresql database from R on Mac OS X. Currently my database has 88 million rows (9 GB) and is growing so I am looking for a stable interface. The following options seem to be available: 1. RdbiPgSQL_1.18.1 2. RPostgreSQL_0.1-4 3. TSPostgreSQL_2009.3-2
You have omitted RODBC, since PostgreSQL has a mature ODBC driver. 3) Is a special-purpose layer over 2).
These packages seemed to be obsolete and no longer supported/ recommended: 4. RPgSQL 1.0 5. Rdbi 0.1.2 Results ====== #1 relies on #5 and has an older connection syntax. #2 appears relatively new and has not reached 1.0 status yet. For someone used to libpq the interface is very familiar. #3 relies on #2 and not sure what it adds as the docs did not install correctly, nor are they accessible from CRAN. All 3 packages compiled and installed on Mac OS X even though #2 and #3 reported failures in CRAN.
Have you looked at the failures? RPostgreSQL failed to install because the build machine does not have PostgreSQL installed.
Have not tried to actually use them yet. Conclusion ========= Use #2 even though it does not seem very mature. Questions ======== Is this a good summary of the state of accessing postgresql from R or have I missed something?
The second.
Is RPostgreSQL being used anyplace and is it stable? Neil Chicago
Neil,
On 12 May 2009 at 06:09, Prof Brian Ripley wrote:
| On Mon, 11 May 2009, Neil Tiffin wrote:
| | > In trying to figure out how the best way to access a postgresql database from | > R on Mac OS X. Currently my database has 88 million rows (9 GB) and is | > growing so I am looking for a stable interface. The following options seem | > to be available: | > | > 1. RdbiPgSQL_1.18.1 | > 2. RPostgreSQL_0.1-4 | > 3. TSPostgreSQL_2009.3-2 | | You have omitted RODBC, since PostgreSQL has a mature ODBC driver. Yes. I packaged it for Debian more than six years ago, and have supported it ever since, precisely because I wanted to talk to Pg on my systems. RODBC is stable and mature --- thanks to a lot of work by Brian Ripley (building on the earlier work by Michael Lapsley). That said, I also find ODBC and its setup mechanism with DSNs etc somewhat tedious and quite frankly like the DBI interface better. Which is why I suggested and mentored the DBI-compliant RPostgreSQL implementation as a Google Summer of Code project last year. People seem to use and like it, and you remove one layer of possible breakage. The DBI abstraction also allows you to switch to other backends should the need arise. So you have choices, which is a good thing. | Have you looked at the failures? RPostgreSQL failed to install | because the build machine does not have PostgreSQL installed. IIRC at least the CRAN Linux testing boxen (being Debian) have it. And Sameer and I also worked with Uwe last year to get Windows builds. So yes, the builds fails _on Mac OS X only_ --- presumably because nobody volunteered to provide Pg for the build machine. Maybe Neil can help here.... Dirk
Three out of two people have difficulties with fractions.
1 day later
Hi,
On Mon, May 11, 2009 at 6:18 PM, Neil Tiffin <neilt at neiltiffin.com> wrote:
In trying to figure out how the best way to access a postgresql database from R on Mac OS X. Currently my database has 88 million rows (9 GB) and is growing so I am looking for a stable interface. ?The following options seem to be available: 1. RdbiPgSQL_1.18.1 2. RPostgreSQL_0.1-4 3. TSPostgreSQL_2009.3-2 These packages seemed to be obsolete and no longer supported/recommended: 4. RPgSQL 1.0 5. Rdbi 0.1.2 Results ====== #1 relies on #5 and has an older connection syntax. #2 appears relatively new and has not reached 1.0 status yet. ?For someone used to libpq the interface is very familiar. #3 relies on #2 and not sure what it adds as the docs did not install correctly, nor are they accessible from CRAN. All 3 packages compiled and installed on Mac OS X even though #2 and #3 reported failures in CRAN. Have not tried to actually use them yet. Conclusion ========= Use #2 even though it does not seem very mature. Questions ======== Is this a good summary of the state of accessing postgresql from R or have I missed something? Is RPostgreSQL being used anyplace and is it stable?
Yes, I do use it on a regular basis. Though there are some quirks here and there with the functions, they are usable. Often I have found myself writing small wrapper functions around the ones provided with RPostgreSQL for my special needs. I have never tried the other options so no comments on them. But I remember, some time back, reading the documentations and then finally settling on RPostgreSQL for its clarity in implementation and function usage. I use it completely under Linux to Linux connections, so any Mac OSX specific issues will be hidden from me. Finally, I should also mention that my usage is primarily like this: extract relevant/subset of records (and fields) from the database and work inside R with that. I have never tried any complicated/fancy database coding, though, in principle it should work, Dirk might confirm.
Prasenjit
On May 14, 2009, at 12:13 AM, Prasenjit Kapat wrote:
Hi, On Mon, May 11, 2009 at 6:18 PM, Neil Tiffin <neilt at neiltiffin.com> wrote:
In trying to figure out how the best way to access a postgresql database from R on Mac OS X. Currently my database has 88 million rows (9 GB) and is growing so I am looking for a stable interface. The following options seem to be available: 1. RdbiPgSQL_1.18.1 2. RPostgreSQL_0.1-4 3. TSPostgreSQL_2009.3-2 These packages seemed to be obsolete and no longer supported/ recommended: 4. RPgSQL 1.0 5. Rdbi 0.1.2 Results ====== #1 relies on #5 and has an older connection syntax. #2 appears relatively new and has not reached 1.0 status yet. For someone used to libpq the interface is very familiar. #3 relies on #2 and not sure what it adds as the docs did not install correctly, nor are they accessible from CRAN. All 3 packages compiled and installed on Mac OS X even though #2 and #3 reported failures in CRAN. Have not tried to actually use them yet. Conclusion ========= Use #2 even though it does not seem very mature. Questions ======== Is this a good summary of the state of accessing postgresql from R or have I missed something? Is RPostgreSQL being used anyplace and is it stable?
Yes, I do use it on a regular basis. Though there are some quirks here and there with the functions, they are usable. Often I have found myself writing small wrapper functions around the ones provided with RPostgreSQL for my special needs. I have never tried the other options so no comments on them. But I remember, some time back, reading the documentations and then finally settling on RPostgreSQL for its clarity in implementation and function usage. I use it completely under Linux to Linux connections, so any Mac OSX specific issues will be hidden from me. Finally, I should also mention that my usage is primarily like this: extract relevant/subset of records (and fields) from the database and work inside R with that. I have never tried any complicated/fancy database coding, though, in principle it should work, Dirk might confirm. -- Prasenjit
If you care to be more specific about the "Quirks" and why you did your wrappers I would be interested. I am working with Dirk and others to get the test environments set up on CRAN and r-forge for Mac. Maybe some tests need to be added. Also I have a lot of experience with C drivers and probably can propose some improvements. Neil
On Thu, May 14, 2009 at 9:23 AM, Neil Tiffin <neilt at neiltiffin.com> wrote:
On May 14, 2009, at 12:13 AM, Prasenjit Kapat wrote: Hi,
On Mon, May 11, 2009 at 6:18 PM, Neil Tiffin <neilt at neiltiffin.com> wrote:
In trying to figure out how the best way to access a postgresql database from R on Mac OS X. Currently my database has 88 million rows (9 GB) and is growing so I am looking for a stable interface. The following options seem to be available: 1. RdbiPgSQL_1.18.1 2. RPostgreSQL_0.1-4 3. TSPostgreSQL_2009.3-2 These packages seemed to be obsolete and no longer supported/recommended: 4. RPgSQL 1.0 5. Rdbi 0.1.2 Results ====== #1 relies on #5 and has an older connection syntax. #2 appears relatively new and has not reached 1.0 status yet. For someone used to libpq the interface is very familiar. #3 relies on #2 and not sure what it adds as the docs did not install correctly, nor are they accessible from CRAN. All 3 packages compiled and installed on Mac OS X even though #2 and #3 reported failures in CRAN. Have not tried to actually use them yet. Conclusion ========= Use #2 even though it does not seem very mature. Questions ======== Is this a good summary of the state of accessing postgresql from R or have I missed something? Is RPostgreSQL being used anyplace and is it stable?
Yes, I do use it on a regular basis. Though there are some quirks here and there with the functions, they are usable. Often I have found myself writing small wrapper functions around the ones provided with RPostgreSQL for my special needs. I have never tried the other options so no comments on them. But I remember, some time back, reading the documentations and then finally settling on RPostgreSQL for its clarity in implementation and function usage. I use it completely under Linux to Linux connections, so any Mac OSX specific issues will be hidden from me. Finally, I should also mention that my usage is primarily like this: extract relevant/subset of records (and fields) from the database and work inside R with that. I have never tried any complicated/fancy database coding, though, in principle it should work, Dirk might confirm. -- Prasenjit
If you care to be more specific about the "Quirks" and why you did your wrappers I would be interested. I am working with Dirk and others to get the test environments set up on CRAN and r-forge for Mac. Maybe some tests need to be added. Also I have a lot of experience with C drivers and probably can propose some improvements.
I'm not using the interface much at this point, so I cannot comment much more than this without devoting more time. It seems fine for everyday casual use, but there there are a couple obvious issues that could use some work. See here for an issue with schema support (not sure if any suggested changes have been adopted): https://stat.ethz.ch/pipermail/r-sig-db/2009q2/000618.html The second had to do with dbWriteTable and dbReadTable. Postgresql has a protocol for slurping up text data via the connection in a VERY fast manner; this functionality (last I checked) was not being used to deal with dbWriteTable and dbReadTable. It would be very nice to use that api for the reading and (particularly) writing of tables, since it can be an order of magnitude or more faster than inserts. Sean