Is any database particularly better at "exchanging" data with R?
Background:
Sometime during the next 12-months, I plan on configuring a new computer system on which I will primarily run "R" and a SQL database (Microsoft SQL Server, MySQL, Oracle, etc). My primary goal is to "optimize" the system for R, and for passing data to and from R and the database.
I work with large datasets, and therefore I "think" one of my most important goals should be to maximize the amount of RAM that R can utilize effectively.
I am seeking advice concerning the database, version of R, OS, processor, hard-drive/storage configuration, etc. that I should consider. (I am guessing that I should build a system with lots of RAM, and a Linux OS, but am seeking advice from the R community.) If I choose Linux, does it matter which version I use? Any opinion regarding implementing a commercially supported version from a vendor such as Red Hat, Sun, etc? Is any database particularly better at "exchanging" data with R?
While cost is of course a consideration, it is probably a secondary consideration to overall performance, reliability, and ease of ongoing maintenance/support.
Thanks!
---------------------------------
Is any database particularly better at "exchanging" large datasets with R?
7 messages · Thomas Pujol, Paul Gilbert, th@ts@@iceh@tyouh@ve m@iii@g oii m@c@com +2 more
On Feb 6, 2008 2:10 PM, Thomas Pujol <thomas.pujol at yahoo.com> wrote:
Is any database particularly better at "exchanging" data with R? Background: Sometime during the next 12-months, I plan on configuring a new computer system on which I will primarily run "R" and a SQL database (Microsoft SQL Server, MySQL, Oracle, etc). My primary goal is to "optimize" the system for R, and for passing data to and from R and the database. I work with large datasets, and therefore I "think" one of my most important goals should be to maximize the amount of RAM that R can utilize effectively. I am seeking advice concerning the database, version of R, OS, processor, hard-drive/storage configuration, etc. that I should consider. (I am guessing that I should build a system with lots of RAM, and a Linux OS, but am seeking advice from the R community.) If I choose Linux, does it matter which version I use? Any opinion regarding implementing a commercially supported version from a vendor such as Red Hat, Sun, etc? Is any database particularly better at "exchanging" data with R? While cost is of course a consideration, it is probably a secondary consideration to overall performance, reliability, and ease of ongoing maintenance/support.
Hi, Thomas. As for database, you'll probably need to be more specific about what you want to do. Oracle, MySQL, and Postgresql (at least) have packages that support their use from R. Other databases can be configured to use RODBC. From the database point of view, Postgresql allows one to embed an R interpreter into the database. As for hardware requirements, that will depend on your application, so again, you will probably need to be more specific. Sean
Sean Davis wrote:
On Feb 6, 2008 2:10 PM, Thomas Pujol <thomas.pujol at yahoo.com> wrote:
Is any database particularly better at "exchanging" data with R? Background: Sometime during the next 12-months, I plan on configuring a new computer system on which I will primarily run "R" and a SQL database (Microsoft SQL Server, MySQL, Oracle, etc). My primary goal is to "optimize" the system for R, and for passing data to and from R and the database. I work with large datasets, and therefore I "think" one of my most important goals should be to maximize the amount of RAM that R can utilize effectively. I am seeking advice concerning the database, version of R, OS, processor, hard-drive/storage configuration, etc. that I should consider. (I am guessing that I should build a system with lots of RAM, and a Linux OS, but am seeking advice from the R community.) If I choose Linux, does it matter which version I use? Any opinion regarding implementing a commercially supported version from a vendor such as Red Hat, Sun, etc? Is any database particularly better at "exchanging" data with R? While cost is of course a consideration, it is probably a secondary consideration to overall performance, reliability, and ease of ongoing maintenance/support.
Hi, Thomas. As for database, you'll probably need to be more specific about what you want to do. Oracle, MySQL, and Postgresql (at least) have packages that support their use from R. Other databases can be configured to use RODBC.
Sean I thought the only maintained and working interface to Postgres was RODBC. Is there a package somewhere (other than the one for embedding R into Postgres)? Is there a package that uses DBI? Paul From the database point of view, Postgresql
allows one to embed an R interpreter into the database. As for hardware requirements, that will depend on your application, so again, you will probably need to be more specific. Sean
_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-db
==================================================================================== La version fran?aise suit le texte anglais. ------------------------------------------------------------------------------------ This email may contain privileged and/or confidential information, and the Bank of Canada does not waive any related rights. Any distribution, use, or copying of this email or the information it contains by other than the intended recipient is unauthorized. If you received this email in error please delete it immediately from your system and notify the sender promptly by email that you have done so. ------------------------------------------------------------------------------------ Le pr?sent courriel peut contenir de l'information privil?gi?e ou confidentielle. La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute diffusion, utilisation ou copie de ce courriel ou des renseignements qu'il contient par une personne autre que le ou les destinataires d?sign?s est interdite. Si vous recevez ce courriel par erreur, veuillez le supprimer imm?diatement et envoyer sans d?lai ? l'exp?diteur un message ?lectronique pour l'aviser que vous avez ?limin? de votre ordinateur toute copie du courriel re?u.
On Feb 6, 2008 4:49 PM, Paul Gilbert <pgilbert at bank-banque-canada.ca> wrote:
Sean Davis wrote:
On Feb 6, 2008 2:10 PM, Thomas Pujol <thomas.pujol at yahoo.com> wrote:
Is any database particularly better at "exchanging" data with R? Background: Sometime during the next 12-months, I plan on configuring a new computer system on which I will primarily run "R" and a SQL database (Microsoft SQL Server, MySQL, Oracle, etc). My primary goal is to "optimize" the system for R, and for passing data to and from R and the database. I work with large datasets, and therefore I "think" one of my most important goals should be to maximize the amount of RAM that R can utilize effectively. I am seeking advice concerning the database, version of R, OS, processor, hard-drive/storage configuration, etc. that I should consider. (I am guessing that I should build a system with lots of RAM, and a Linux OS, but am seeking advice from the R community.) If I choose Linux, does it matter which version I use? Any opinion regarding implementing a commercially supported version from a vendor such as Red Hat, Sun, etc? Is any database particularly better at "exchanging" data with R? While cost is of course a consideration, it is probably a secondary consideration to overall performance, reliability, and ease of ongoing maintenance/support.
Hi, Thomas. As for database, you'll probably need to be more specific about what you want to do. Oracle, MySQL, and Postgresql (at least) have packages that support their use from R. Other databases can be configured to use RODBC.
Sean I thought the only maintained and working interface to Postgres was RODBC. Is there a package somewhere (other than the one for embedding R into Postgres)? Is there a package that uses DBI?
There is RdbiPgSQL that is available from bioconductor. Parts of it are broken (the DBI parts), but connections, selecting, inserting, deleting, etc. are working last I checked. I hope there is some interest in resurrecting it as a full-fledged interface again. As an aside, I forgot to mention RSQLite. Sean
4 days later
On 6 Feb 2008, at 21:49, Paul Gilbert wrote:
I thought the only maintained and working interface to Postgres was RODBC. Is there a package somewhere (other than the one for embedding R into Postgres)? Is there a package that uses DBI?
There is - take a look here: http://www.bioconductor.org/packages/release/Software.html Install both "Rdbi" and "RdbiPgSQL". These work amazingly well for me. I put a line like this in my .Rprofile: my_db = dbConnect(psql, user="user_name", host="localhost", dbname="db_name") ...with the appropriate values substituted. Then I have defined: sqlQuery = function(query) { result = dbSendQuery(my_db, query) return(dbGetResult(result)) } This is then all in the background, so to perform a query, I use: data = sqlQuery("SELECT ....") and the result is a data frame. Cheers, Demitri
1 day later
thatsanicehatyouhave at mac.com wrote:
On 6 Feb 2008, at 21:49, Paul Gilbert wrote:
I thought the only maintained and working interface to Postgres was RODBC. Is there a package somewhere (other than the one for embedding R into Postgres)? Is there a package that uses DBI?
There is - take a look here: http://www.bioconductor.org/packages/release/Software.html Install both "Rdbi" and "RdbiPgSQL". These work amazingly well for me.
This one uses Rdbi, not DBI. AFAIK there is no DBI package for Postgres. Anybody willing to start one? Cheers, H.
On Feb 12, 2008 4:12 PM, Herve Pages <hpages at fhcrc.org> wrote:
thatsanicehatyouhave at mac.com wrote:
On 6 Feb 2008, at 21:49, Paul Gilbert wrote:
I thought the only maintained and working interface to Postgres was RODBC. Is there a package somewhere (other than the one for embedding R into Postgres)? Is there a package that uses DBI?
There is - take a look here: http://www.bioconductor.org/packages/release/Software.html Install both "Rdbi" and "RdbiPgSQL". These work amazingly well for me.
This one uses Rdbi, not DBI. AFAIK there is no DBI package for Postgres. Anybody willing to start one?
Good point, and I would love to see one, also. Sean