database table merging tips with R
Just a small correction: start with s <- paste(r$userid,collapse=",") and not s <- paste(r$userid,sep=",")
--- On Fri, 12/9/08, Moshe Olshansky <m_olshansky at yahoo.com> wrote:
From: Moshe Olshansky <m_olshansky at yahoo.com>
Subject: Re: [R] database table merging tips with R
To: r-help at stat.math.ethz.ch, "Avram Aelony" <aavram at mac.com>
Received: Friday, 12 September, 2008, 8:59 AM
One possibility is as follows:
If r$userid is your array of (2000) ID's then
s <- paste(r$userid,sep=",")
s<- paste("select t.userid, x, y, z from largetable
t where t.serid in (",s,")",sep="")
and finally
d <- sqlQuery(connection,s)
Regards,
Moshe.
--- On Fri, 12/9/08, Avram Aelony <aavram at mac.com>
wrote:
From: Avram Aelony <aavram at mac.com> Subject: [R] database table merging tips with R To: r-help at stat.math.ethz.ch Received: Friday, 12 September, 2008, 4:33 AM Dear R list, What is the best way to efficiently marry an R dataset
with
a very large (Oracle) database table? The goal is to only return Oracle table rows that
match IDs
present in the R dataset. I have an R data frame with 2000 user IDs analogous
to: r =
data.frame(userid=round(runif(2000)*100000,0)) ...and I need to pull data from an Oracle table only
for
these 2000 IDs. The Oracle table is quite large. Additionally, the sql query may need to join to other
tables
to bring in ancillary fields.
I currently connect to Oracle via odbc:
library(RODBC)
connection <- odbcConnect("****",
uid="****", pwd="****")
d = sqlQuery(connection, "select userid, x, y, z
from
largetable where timestamp > sysdate -7") ...allowing me to pull data from the database table
into
the R object "d" and then use the R merge function. The problem however is that if
"d" is
too large it may fail due to memory limitations or be inefficient. I would like to push the merge portion
to the
database and it would be very convenient if it were
possible
to request that the query look to the R object for the ID's to which it should restrict the output. Is there a way to do this? Something like the following fictional code: d = sqlQuery(connection, "select t.userid, x, y,
z
from largetable t where r$userid=t.userid") Would sqldf (http://code.google.com/p/sqldf/) help me
out
here? If so, how? This would be convenient and help
me
avoid needing to create a temporary table to store the
R
data, join via sql, then return the data back to R. I am using R version 2.7.2 (2008-08-25) /
i386-pc-mingw32 .
Thanks for your comments, ideas, recommendations. -Avram
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.