dbSendPreparedQuery in RSQLite

10 messages · James Bullard, Seth Falcon, Sean Davis

#
How do we send prepared queries in RSQLite? We are interested in something along the lines of:

    ps = dbSendPreparedQuery(conn, "SELECT * FROM table1 WHERE chr = ?",
                             data.frame("integer"))
    for (i in 1:2)
        rs = dbGetPreparedQuery(ps, i)

In C it would look something like this:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg31854.html

Thanks
Jim
#
On Tue, Apr 15, 2008 at 8:31 PM, James Bullard <bullard at berkeley.edu> wrote:
It looks like this is not supported for select statements, though I
could easily be wrong.  In this particular case, you could rewrite
your select as:

"SELECT * from table1 where chr in (1,2)"

Sean
#
Yes, in this example that would certainly work. However, I have a large number of queries in a tight loop, and it would be nice to avoid the overhead of parsing and compiling the query on each pass through the loop. I believe prepared queries are also not supported for SELECT statements, and I wonder whether that would be a hard addition. I also wonder whether the syntax below is correct; from other settings it seems a little off (see the link in the original email).

thanks again, jim
On Apr 15, 2008, at 7:21 PM, Sean Davis wrote:
#
* On 2008-04-15 at 20:45 -0700 James Bullard wrote:
You are correct: currently prepared queries are not supported for SELECT (I have no memory of why this is the case).  I think that supporting prepared queries for SELECT should be possible, though not trivial.

Are you sure you can't avoid your loop by sending a large query and
batching through it using fetch on the result set returned?
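For instance (a sketch, not from the original message; fetch() takes an n argument giving the number of rows per batch):

    res <- dbSendQuery(db, "SELECT * FROM table1 WHERE chr IN (1, 2)")
    while (!dbHasCompleted(res)) {
        chunk <- fetch(res, n = 500)   # pull 500 rows per round trip
        ## ... process chunk ...
    }
    dbClearResult(res)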
By "below" I think you mean the example in your first message.
Almost: as above, this is only supported for non-SELECT statements, so you might do (untested):

    df = data.frame(x=1:10, y=letters[1:10])
    sql = "INSERT INTO foo (a, b) VALUES (?, ?)"
    rs = dbSendPreparedQuery(db, sql, df)

So you pass a data.frame and the loop is internal.
You can also name the parameters so you could have:

    sql = "INSERT INTO foo (a, b) VALUES (:y, :x)"
    rs = dbSendPreparedQuery(db, sql, df)

And y, x are matched based on names(df).
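Putting it together, an end-to-end sketch (untested; the table foo and its columns are made up for illustration):

    library(RSQLite)
    db <- dbConnect(SQLite(), dbname = ":memory:")
    dbGetQuery(db, "CREATE TABLE foo (a TEXT, b INTEGER)")

    df <- data.frame(x = 1:10, y = letters[1:10], stringsAsFactors = FALSE)
    ## :y and :x are bound by matching names(df), not by position
    rs <- dbSendPreparedQuery(db, "INSERT INTO foo (a, b) VALUES (:y, :x)", df)
    dbClearResult(rs)
    dbGetQuery(db, "SELECT count(*) FROM foo")   # should report 10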

+ seth
#
On Tue, Apr 15, 2008 at 11:45 PM, James Bullard <bullard at berkeley.edu> wrote:
A couple of typical tricks for avoiding loops are to either:

1) use an "in" query, or
2) use a temporary table (CREATE TEMPORARY TABLE ... or
dbWriteTable()) and then do a join.

If you have hundreds or thousands of queries, it is quite possible
that #2 could be much faster.  Benchmarking would be useful here, but
it is definitely worth a try if the looping version is too slow.
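In code, trick #2 looks something like this (an untested sketch, assuming an open connection con; table and column names are made up):

    ## write the lookup keys as their own table, then join against it
    dbWriteTable(con, "keys", data.frame(chr = c(1, 2)),
                 row.names = FALSE, overwrite = TRUE)
    dbGetQuery(con, "SELECT table1.* FROM table1
                     JOIN keys ON keys.chr = table1.chr")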

Sean
#
Okay, I thought that what I am doing could not easily be done with one query, but since I am wrong more often than not, I'll ask and see if anyone can think of a way to combine it into one.

I have a set of regions:

    regions <- cbind(chr = c(1, 2, 3),
                     start = c(1, 230, 4950),
                     end = c(293, 320, 19200))

In the db I have the following table:

    CREATE TABLE count_table (chr, location, count);

which currently holds:

    chr location count
      1       10     1
      1      290    10
      2      245     3
      3       10     2

The query that I am using, run once per region, is:

    apply(regions, 1, function(region) {
        dbGetQuery(con, paste("SELECT * FROM count_table WHERE chr =",
                              region[1], "AND location >=", region[2],
                              "AND location <=", region[3], ";"))
    })

Which gives me back the following (which is exactly what I want):
[[1]]
  chr location count
1   1       10     1
2   1      290    10

[[2]]
  chr location count
1   2      245     3

[[3]]
data frame with 0 rows


So I don't see any obvious way to pass in multiple ranges, but maybe I am missing something about using the IN operator. Thanks in advance, and sorry for the long email.


Jim
On Apr 16, 2008, at 8:47 AM, Sean Davis wrote:
#
On Wed, Apr 16, 2008 at 1:35 PM, James Bullard <bullard at berkeley.edu> wrote:
The count_table contents:

      chr location count
    1   1       10     1
    2   1      290    10
    3   2      245     3
    4   3       10     2

The regions, written to a table:

      chr start   end
    1   1     1   293
    2   2   230   320
    3   3  4950 19200

And the result of the join:

      chr location count
    1   1       10     1
    2   1      290    10
    3   2      245     3
So, the principle of a temp table works.  Let's try some benchmarking: create a random count table of 100k rows, write it to a database, and create some indexes to speed querying.
The first few rows of the random count_table, followed by the return values of dbWriteTable() and the two index-creation statements:

      chr location count
    1  14 91737974     5
    2   5 61059218     2
    3   2 23944824     4
    4   3 82907389     9
    5  22 94658940     1

    [1] TRUE
    NULL
    NULL
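The code that produced this output was dropped by the archive; it was presumably something along these lines (a reconstruction, untested; the index names are made up):

    n <- 100000
    count_table <- data.frame(chr = sample(1:22, n, replace = TRUE),
                              location = sample(1:100000000, n, replace = TRUE),
                              count = sample(1:10, n, replace = TRUE))
    head(count_table)
    dbWriteTable(con, "count_table", count_table,
                 row.names = FALSE, overwrite = TRUE)                 # [1] TRUE
    dbGetQuery(con, "CREATE INDEX chr_idx ON count_table (chr)")      # NULL
    dbGetQuery(con, "CREATE INDEX loc_idx ON count_table (location)") # NULL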

So, to try a benchmark, I wrote a little function that creates n random regions (see function below).  In the temp table approach, the data.frame is written to a table in the database and a join is used; in the second, the apply() approach is used.  The function returns the system.time for each approach.  In my quick benchmarking, the temp table approach is about 5 times faster (including writing the temp table) with a count_table of 100k rows.  I would expect the difference to grow much larger as the count_table gets bigger, since the indexes on count_table must be scanned once for each query in the apply approach but only once in the temp table/join approach.

benchRegions <- function(n) {
    # n is the number of regions.
    # Returns a list with two sets of system times: one for the
    # temp table approach, and one for the apply approach.
    res <- list()
    starts <- sample(1:100000000, n, replace = TRUE)
    ends <- starts + sample(1:1000, n, replace = TRUE)
    regions <- data.frame(chr = sample(1:22, n, replace = TRUE),
                          start = starts,
                          end = ends)
    res[["tmptab"]] <- system.time({
        dbWriteTable(con, "regions", regions,
                     row.names = FALSE, overwrite = TRUE)
        dbGetQuery(con, "SELECT count_table.* FROM count_table
                         JOIN regions ON regions.chr = count_table.chr
                         AND count_table.location
                             BETWEEN regions.start AND regions.end")
    })
    res[["apply"]] <- system.time(apply(regions, 1, function(region) {
        dbGetQuery(con, paste("SELECT * FROM count_table WHERE chr =",
                              region[1], "AND location >=", region[2],
                              "AND location <=", region[3], ";"))
    }))
    return(res)
}
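For example (exact timings will vary):

    times <- benchRegions(1000)
    times$tmptab   # elapsed time, temp table/join approach
    times$apply    # elapsed time, per-region apply() approach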

Hope that helps.

Sean
#
Thank you Sean, this worked great. Now the cost of going to the database pales in comparison to the split that I have to do at the end to divide the returned data.frame into a list with one element per region. This is unfortunate because the data.frame returned to R is already sorted in region order.

    res <-

So then, in order to return the object in the appropriate list fashion, I do:

    split(res, res$region)

This splitting dominates the cost of the entire function. Unfortunately, split() does not take advantage of the fact that the data is already sorted according to the factor. I think I will handle this by asking the database for the number of rows in each region and then using that vector to partition the data (unless of course anyone has any smarter ideas).
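Something along these lines, perhaps (an untested sketch; it assumes res carries a region column and is sorted by it):

    ## rows per region; rle() works because res is sorted by region
    counts <- rle(as.character(res$region))$lengths
    ends   <- cumsum(counts)
    starts <- ends - counts + 1L
    ## slice the sorted data.frame directly, skipping split()'s overhead
    byRegion <- lapply(seq_along(counts),
                       function(i) res[starts[i]:ends[i], ])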

Thanks again, I really appreciate the detailed example it helped  
tremendously in sorting out the indexing.

jim
On Apr 16, 2008, at 12:21 PM, Sean Davis wrote:
#
On Thu, Apr 17, 2008 at 2:04 PM, James Bullard <bullard at berkeley.edu> wrote:
What is the final information that you want to have?  Do you actually
need the detail of the counts per region, or are you doing some
aggregate of the data in the end?  In other words, what do you do with
the data after the split?

Great.  That is what email lists are all about--we all learn from them!
#
On Apr 17, 2008, at 4:19 PM, Sean Davis wrote:
Certainly the downstream analysis usually corresponds to some summary measure of a region. I am not too familiar with SQLite, but it does appear that I have the normal sum, max, min, and avg functions, which will be nice to use when applicable; however, there is clearly a need for access to the data within each region when these functions won't suffice (fitting models, calculating variances and quantiles). That being said, your point is well taken: often the data.frame representation is more of what you want, and now I provide both interfaces.
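When an SQL aggregate does suffice, the per-region summary can be computed entirely in the database; a sketch (untested, and it assumes the regions table carries an id column, which the version above does not have):

    sql <- "SELECT regions.id AS region,
                   sum(count) AS total, avg(count) AS mean,
                   min(count) AS lo,    max(count) AS hi
            FROM count_table JOIN regions
              ON regions.chr = count_table.chr
             AND count_table.location BETWEEN regions.start AND regions.end
            GROUP BY regions.id"
    dbGetQuery(con, sql)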

As an aside, I implemented the method where I split using the bounds of each region determined via an SQL query (i.e. I got the number of rows corresponding to each region in the resulting data.frame) and then used an apply to partition those rows into the resulting list (where each element is a data.frame). This was to determine whether or not I was paying a cost for the fact that split() assumes my data is unsorted according to the factor. The results were not encouraging: the time spent sorting during the split seems to pale in comparison to the construction of all the region data.frames. In any case, for now the running time is more or less acceptable.

thanks again, jim