Skip to content

Is any database particularly better at "exchanging" large datasets with R?

7 messages · Thomas Pujol, Paul Gilbert, th@ts@@iceh@tyouh@ve m@iii@g oii m@c@com +2 more

#
Is any database particularly better at "exchanging" data with R?

Background: 
Sometime during the next 12-months, I plan on configuring a new computer system on which I will primarily run "R" and a SQL database (Microsoft SQL Server, MySQL, Oracle, etc).  My primary goal is to "optimize" the system for R, and for passing data to and from R and the database.

I work with large datasets, and therefore I "think" one of my most important goals should be to maximize the amount of RAM that R can utilize effectively.

I am seeking advice concerning the database, version of R, OS,  processor, hard-drive/storage configuration, etc. that I should consider. (I am guessing that I should build a system with lots of RAM, and a Linux OS, but am seeking advice from the R community.) If I choose Linux, does it matter which version I use? Any opinion regarding  implementing a commercially supported version from a vendor such as Red Hat, Sun, etc? Is any database particularly better at "exchanging" data with R?

While cost is of course a consideration, it is probably a secondary consideration to overall performance, reliability, and ease of ongoing maintenance/support.

Thanks!

       
---------------------------------
#
On Feb 6, 2008 2:10 PM, Thomas Pujol <thomas.pujol at yahoo.com> wrote:
Hi, Thomas.

As for database, you'll probably need to be more specific about what
you want to do.  Oracle, MySQL, and Postgresql (at least) have
packages that support their use from R.  Other databases can be
configured to use RODBC.  From the database point of view, Postgresql
allows one to embed an R interpreter into the database.  As for
hardware requirements, that will depend on your application, so again,
you will probably need to be more specific.

Sean
#
Sean Davis wrote:
Sean

I thought the only maintained and working interface to Postgres was 
RODBC.  Is there a package somewhere (other than the one for embedding R 
into Postgres)?  Is there a package that uses DBI?

Paul

 From the database point of view, Postgresql
====================================================================================

La version fran?aise suit le texte anglais.

------------------------------------------------------------------------------------

This email may contain privileged and/or confidential information, and the Bank of
Canada does not waive any related rights. Any distribution, use, or copying of this
email or the information it contains by other than the intended recipient is
unauthorized. If you received this email in error please delete it immediately from
your system and notify the sender promptly by email that you have done so. 

------------------------------------------------------------------------------------

Le pr?sent courriel peut contenir de l'information privil?gi?e ou confidentielle.
La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute diffusion,
utilisation ou copie de ce courriel ou des renseignements qu'il contient par une
personne autre que le ou les destinataires d?sign?s est interdite. Si vous recevez
ce courriel par erreur, veuillez le supprimer imm?diatement et envoyer sans d?lai ?
l'exp?diteur un message ?lectronique pour l'aviser que vous avez ?limin? de votre
ordinateur toute copie du courriel re?u.
#
On Feb 6, 2008 4:49 PM, Paul Gilbert <pgilbert at bank-banque-canada.ca> wrote:
There is RdbiPgSQL that is available from bioconductor.  Parts of it
are broken (the DBI parts), but connections, selecting, inserting,
deleting, etc. are working last I checked.  I hope there is some
interest in resurrecting it as a full-fledged interface again.

As an aside, I forgot to mention RSQLite.

Sean
4 days later
#
On 6 Feb 2008, at 21:49, Paul Gilbert wrote:

            
There is - take a look here:

http://www.bioconductor.org/packages/release/Software.html

Install both "Rdbi" and "RdbiPgSQL". These work amazingly well for me.

I put a line like this in my .Rprofile:

my_db = dbConnect(psql, user="user_name", host="localhost",  
dbname="db_name")

...with the appropriate values substituted.

Then I have defined:

sqlQuery = function(query)
{
         result = dbSendQuery(my_db, query)
         return(dbGetResult(result))
}

This is then all in the background, so to perform a query, I use:

data = sqlQuery("SELECT ....")

and the result is a data frame.

Cheers,

Demitri
1 day later
#
thatsanicehatyouhave at mac.com wrote:
This one uses Rdbi, not DBI.

AFAIK there is no DBI package for Postgres. Anybody willing to start one?

Cheers,
H.
#
On Feb 12, 2008 4:12 PM, Herve Pages <hpages at fhcrc.org> wrote:
Good point, and I would love to see one, also.

Sean