
RSQLite: ATTACH statement not executed when the db connection is holding a resultSet

12 messages · Hervé Pagès, Gabor Grothendieck, Seth Falcon

#
Hi Seth,

Here is how to reproduce the problem.

First create 2 databases with a single empty table in each:

  library(RSQLite)

  db1 <- dbConnect(SQLite(), "db1.sqlite")
  dbGetQuery(db1, "CREATE TABLE t1 (a integer, aa text)")

  db2 <- dbConnect(SQLite(), "db2.sqlite")
  res <- dbSendQuery(db2, "CREATE TABLE t2 (b integer, bb text)")

Note that dbSendQuery() is used instead of dbGetQuery() to create the
table in db2 so now db2 is holding a resultSet:

  > summary(db2)
  <SQLiteConnection:(28981,1)>
    User: NA
    Host: localhost
    Dbname: db2.sqlite
    Connection type: direct
    Loadable extensions: off
      1  <SQLiteResult:(28981,1,1)>

Second, try to attach db1 to db2:

  > dbGetQuery(db2, "ATTACH 'db1.sqlite' AS db1")
  NULL

No errors so it looks like it got attached but:

  1. This doesn't work:

       dbGetQuery(db2, "SELECT * FROM db1.t1")

  2. I can run this as many times as I want (normally you can't attach a
     db that's already attached):

       dbGetQuery(db2, "ATTACH 'db1.sqlite' AS db1")
       dbGetQuery(db2, "ATTACH 'db1.sqlite' AS db1")
       ... etc ...

  3. I can't detach it either:

       dbGetQuery(db2, "DETACH db1")

so everything tends to indicate that db1 was not attached in the
first place.
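For what it's worth, the deduction in point 2 matches stock SQLite behavior: on a connection where an ATTACH really took effect, re-using the same alias is an error. Here is a quick illustration of that engine-level behavior, sketched with Python's built-in sqlite3 module rather than R (same SQLite engine, different driver):

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "db1.sqlite")
sqlite3.connect(path).close()                # create an empty db1.sqlite on disk

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH '%s' AS db1" % path)    # first ATTACH succeeds
err = None
try:
    conn.execute("ATTACH '%s' AS db1" % path)    # same alias again
except sqlite3.OperationalError as e:
    err = e
print(err)    # database db1 is already in use
```

So the fact that the ATTACH can be repeated indefinitely through RSQLite is strong evidence that it never reached the real connection.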

Now if I clear 'res', things work as expected:

  dbClearResult(res)
  summary(db2)
  dbGetQuery(db2, "ATTACH 'db1.sqlite' AS db1")
  dbGetQuery(db2, "SELECT * FROM db1.t1")
  dbGetQuery(db2, "DETACH db1")

Any idea what's going on?

Thanks!
H.
R version 2.7.0 Under development (unstable) (2007-12-20 r43747)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RSQLite_0.6-4 DBI_0.2-4
#
I don't think you have established db1.sqlite as an external sqlite
database yet -- it's still only known to the db1 connection.  Try
disconnecting from db1 before attaching it in the db2 connection to
flush it out.

You can likely re-connect to db1 right after disconnecting it, even
before attaching it in db2, if you wish -- that would be optional.
On Jan 17, 2008 7:56 PM, Herve Pages <hpages at fhcrc.org> wrote:
#
Hi Herve,

A quick response now and I will try to take a closer look later on...

Herve Pages <hpages at fhcrc.org> writes:
[snip]
I'm pretty sure that SQLite only allows one result set per connection.
Hence it does not surprise me that the above does not work -- but an
error message would be nice :-)

Do you have a use-case where you really want to be able to have open
resultSets and keep doing things (which I'm not sure is possible)?  I
think the fix here is going to be to raise an error when this is
attempted.

For example, if you try to create two result sets that have pending
rows, you do get an error:

    > res <- dbSendQuery(db2, "select * from t2")
    > dbListResults(db2)
    [[1]]
    <SQLiteResult:(7183,1,4)> 
    
    > res <- dbSendQuery(db2, "select * from t2")
    Error in sqliteExecStatement(conn, statement, ...) : 
      RS-DBI driver: (connection with pending rows, close resultSet before continuing)
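As an aside, the one-open-result-set limit appears to be enforced by the RS-DBI driver rather than by the SQLite engine itself; the underlying C API (and, say, Python's sqlite3 wrapper) will happily keep several statements with pending rows open on one connection. A minimal sketch (Python used only to show the engine-level behavior):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (b integer, bb text)")
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(i, "x") for i in range(3)])

# Two cursors with pending rows, open at the same time on one connection:
cur1 = conn.cursor()
cur1.execute("SELECT b FROM t2")
cur2 = conn.cursor()
cur2.execute("SELECT b FROM t2")           # no error at the SQLite level
print(cur1.fetchone(), cur2.fetchone())    # (0,) (0,)
```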

For result sets that don't return rows, there isn't much good that you
can do with the result set.  Perhaps these should be invalidated on
return so that you can open another resultSet?

Other ideas?

+ seth
#
Hi Seth,
Seth Falcon wrote:
No use-case. But we have no way to prevent our users from making improper
use of dbSendQuery() (we'll try to advertise the use of the much safer
dbGetQuery() instead, but that's all we can do). So, yes, raising an
error that tells them that the ATTACH command failed (or could not be
sent) or, more generally, that any SQL statement failed, seems like
the right thing to have.
Yes I've seen this. And if the first result set does not correspond to a
SELECT statement, then it is silently replaced by the next result set:

  library(RSQLite)
  db <- dbConnect(SQLite(), "test.sqlite")
  dbSendQuery(db, "CREATE TABLE test (a integer, aa text)")
  dbSendQuery(db, "SELECT * FROM test")

First result has disappeared:

  > dbListResults(db)
  [[1]]
  <SQLiteResult:(31978,0,2)>

Personally I find this "automatic result set clearing" feature a little
bit confusing and it raises some concerns about compatibility with other
RDBMSs (see below).

Back to the ATTACH problem: if I try to ATTACH with dbSendQuery() instead
of dbGetQuery(), then the "automatic result set clearing" feature kicks
in:

  library(RSQLite)
  db1 <- dbConnect(SQLite(), "db1.sqlite")
  dbGetQuery(db1, "CREATE TABLE t1 (a integer, aa text)")
  db2 <- dbConnect(SQLite(), "db2.sqlite")
  dbSendQuery(db2, "CREATE TABLE t2 (b integer, bb text)")
  dbSendQuery(db2, "ATTACH 'db1.sqlite' AS db1")
  dbListResults(db2) # only 1 result set

but now it's hard to tell whether or not db1 is attached to db2:

  1. This seems to indicate that it is NOT:

       > dbGetQuery(db2, "SELECT * FROM db1.t1")
       Error in sqliteExecStatement(new.con, statement, bind.data) :
          RS-DBI driver: (error in statement: no such table: db1.t1)

  2. This seems to indicate that it is:

       > dbSendQuery(db2, "ATTACH 'db1.sqlite' AS db1")
       Error in sqliteExecStatement(conn, statement, ...) :
         RS-DBI driver: (RS_SQLite_exec: could not execute1: database db1 is already in use)

  3. And this too seems to indicate that it was attached:

       > dbGetQuery(db2, "DETACH db1")
       NULL
       > dbGetQuery(db2, "DETACH db1")
       Error in sqliteExecStatement(con, statement, bind.data) :
         RS-DBI driver: (RS_SQLite_exec: could not execute1: no such database: db1)
The problem is consistency with other RDBMSs: if the users want to be able
to write portable code, they need to have some guarantee that things behave
consistently between RSQLite, RMySQL, ROracle, etc...
In this regard I would say it's better to keep things as predictable as
possible and maybe to avoid features like the "automatic result set clearing".
Every successful call to dbSendQuery creates a new entry in the result set
table and this entry remains here until explicitly removed by the user,
whatever the nature of the SQL statement was. If SQLite allows only one
result set per connection, then an error should be raised when dbSendQuery
is called and there is already an entry in the result set table.

I can see that the "automatic result set clearing" feature adds some
convenience by discarding result sets that DO NOT correspond to a SELECT
statement (so that they never end up in the result set table) but then
everybody in the DBI family should do the same thing.

As for treating result sets that DO correspond to a SELECT statement
but contain 0 rows the same way as result sets that DO NOT correspond
to a SELECT statement, I don't really like the idea. Because the 2 types
of result sets are fundamentally different: having rows or not in the
former depends on the data found in the DB while getting a result set
of the latter type does not (it only depends on a programmatic decision
i.e. on the nature of the SQL statement that was sent).

Thanks!
H.
#
Hi Gabor,
Gabor Grothendieck wrote:
No, disconnecting from db1 doesn't change anything. When someone in my
group found this problem, he was working with real dbs i.e. dbs that have
been on disk for a long time and opened/closed many times. Using a freshly
created db1.sqlite in my example is only for the purpose of providing
reproducible code.

Anyway if ATTACH fails for the reason you suggest or for whatever reason,
then an error should be raised.

Thanks,
H.
#
Hi Herve,

Herve Pages <hpages at fhcrc.org> writes:
At present, dbGetQuery is IMO less safe, or at least more confusing
(see more below).
Almost.  If my reading of the code is correct, a ResultSet contains a
flag indicating whether or not it is complete.  For a SELECT query,
complete set to FALSE means there are more rows to fetch.  For
non-SELECT queries it is assumed that complete is TRUE right away
(there is a comment in the code that this might cause a problem if
there are asynchronous queries, but I don't think this is an issue
with SQLite).

When RSQLite attempts to execute SQL, it checks for an existing
ResultSet and _closes it automatically_ if its complete flag is TRUE.
If the existing ResultSet is not complete, an error is thrown.
I agree that this can cause confusion and that using this "feature"
defeats the notion of DBI (being able to swap RDBMS's without changing
much code).

But I didn't invent this feature and don't have any sense of whether
or not people are relying on it :-\
Yes.  But what is confusing is what dbGetQuery does.  The semantics of
dbGetQuery are not what they seem.  When a ResultSet already exists,
dbGetQuery creates a new _connection_ and executes there.  SQL
statements like ATTACH that are called essentially for their
side-effect on the connection are lost.  In your example, the ATTACH
works just fine, but occurs on a new (transient) connection.  If
instead of dbGetQuery, you had used dbSendQuery, things would have
worked.

Here's my example:

  library("RSQLite")
  db1 <- dbConnect(SQLite(), "db1.sqlite")
  data(USArrests)
  dbWriteTable(db1, "ua", USArrests, row.names=FALSE)
  
  db2 <- dbConnect(SQLite(), "db2.sqlite")
  res <- dbSendQuery(db2, "CREATE TABLE t1 (b integer, bb text)")
  
  ## this does the ATTACH on a transient connection :-(
  dbGetQuery(db2, "ATTACH 'db1.sqlite' AS db1")
  ## Hence, this fails:
  dbGetQuery(db2, "select * from db1.ua limit 3")
  
  ## redo ATTACH... this works because ResultSet res is complete
  ## and so it can be auto-closed.
  res2 <- dbSendQuery(db2, "ATTACH 'db1.sqlite' AS db1")
  
  ## Note that dbGetQuery _still_ won't work because now
  ## res2 is open (although complete) and the new transient connection
  ## won't have the ATTACH'ed DB.
  dbGetQuery(db2, "select * from db1.ua limit 3")
  
  ## But using dbSendQuery, we get the auto-close behavior
  res3 <- dbSendQuery(db2, "select * from db1.ua limit 3")
  fetch(res3)
  
  ## And if we close the ResultSet, dbGetQuery behaves as "expected".
  dbClearResult(res3)
  dbGetQuery(db2, "select * from db1.ua limit 3")
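The session above turns on ATTACH being per-connection state. That part is plain SQLite behavior, independent of RSQLite, and can be seen with any driver -- here is a sketch with Python's sqlite3 module, where a second, independent connection to the same file does not see the first connection's attachment:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
p1 = os.path.join(tmp, "db1.sqlite")
p2 = os.path.join(tmp, "db2.sqlite")

# db1 gets a table; db2 is just created empty
c = sqlite3.connect(p1)
c.execute("CREATE TABLE ua (a integer)")
c.commit()
c.close()

conn_a = sqlite3.connect(p2)
conn_a.execute("ATTACH '%s' AS db1" % p1)
conn_a.execute("SELECT * FROM db1.ua")    # fine: db1 is attached on conn_a

conn_b = sqlite3.connect(p2)              # like RSQLite's transient connection
err = None
try:
    conn_b.execute("SELECT * FROM db1.ua")
except sqlite3.OperationalError as e:
    err = e
print(err)    # no such table: db1.ua
```

Which is exactly why queries issued on the transient connection cannot see the ATTACH'ed database.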
I agree that consistency is valuable and is in large part the point of
DBI.  However, things will never be symmetric since SQLite doesn't
support the same features as other RDBMS's.  Code that happily creates
multiple ResultSets might work fine in RMySQL (does it?) but would
raise errors with RSQLite.

I guess I see your argument, but the auto-close feature doesn't
prevent you from making explicit dbClearResult calls and having
portable code.
I think there are two ways to go here:

1. Fix dbGetQuery to have the same auto-close semantics.

2. Remove the auto-close feature entirely.  Should dbGetQuery change
   too, such that it fails when a ResultSet is open instead of opening
   a new connection?  That is easier to explain, but not at all
   backwards compatible.

+ seth
7 days later
#
Hi all,

Seth Falcon <seth at userprimary.net> writes:
I've just uploaded a new RSQLite 0.6-7 with the following changes:

* dbGetQuery now has the same auto-close semantics as dbSendQuery

* dbGetQuery issues a warning message when it opens a temporary
  connection because there is an incomplete result set.

+ seth
2 days later
#
Hi Seth,
Seth Falcon wrote:
OK I get the warning now:

  library(RSQLite)
  db1 <- dbConnect(SQLite(), "db1.sqlite")
  dbGetQuery(db1, "CREATE TABLE t1 (a integer, aa text)")
  db2 <- dbConnect(SQLite(), "db2.sqlite")
  dbGetQuery(db2, "CREATE TABLE t2 (b integer, bb text)")
  dbGetQuery(db2, "ATTACH 'db1.sqlite' AS db1")
  dbSendQuery(db2, "SELECT * FROM db1.t1")

  > dbGetQuery(db2, "SELECT * FROM db1.t1")
  Error in sqliteExecStatement(new.con, statement, bind.data) :
    RS-DBI driver: (error in statement: no such table: db1.t1)
  In addition: Warning message:
  In sqliteQuickSQL(conn, statement, ...) :
    There is an open, incomplete result set; executing query on a temporary connection

  > dbGetQuery(db2, "DETACH db1")
  Error in sqliteExecStatement(new.con, statement, bind.data) :
    RS-DBI driver: (RS_SQLite_exec: could not execute1: no such database: db1)
  In addition: Warning message:
  In sqliteQuickSQL(conn, statement, ...) :
    There is an open, incomplete result set; executing query on a temporary connection

but dbGetQuery() doesn't seem to be reporting what one would expect.
Even if there is a warning now, I still find the "temporary connection"
feature confusing...

Thanks!
H.
R version 2.7.0 Under development (unstable) (2007-12-20 r43747)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RSQLite_0.6-7 DBI_0.2-4
#
Hi Herve,

Herve Pages <hpages at fhcrc.org> writes:
I'm not sure what you mean.  What do you expect dbGetQuery to report?

The above looks "right" to me.  You got a warning that told you that
your ATTACH occurred on a temp connection.  Since ATTACH is called for
its side-effect on the _connection_, this means your ATTACH was
useless.
Yes, the temporary connection "feature" _is_ confusing.  I would like
to remove this feature and have dbGetQuery behave like dbSendQuery:
error if there is an open and incomplete result set, close and process
a complete, but open result set, and encourage users to explicitly
close all result sets that they open.

I haven't done this yet, because making non-backwards compatible
changes should be done with some thought ... and some time for
interested parties to weigh in... anyone?

+ seth
#
Hi Seth,
Seth Falcon wrote:
Are you sure? My understanding is that the ATTACH didn't occur on a temp connection
but really occurred on the db2 connection itself (I mean the real one). It's my
dbGetQuery(db2, "SELECT * FROM db1.t1") that was sent later through a temporary
connection and thus was not aware that db1 was attached to db2. Or am I missing
something?
I agree. But the ATTACH could have been sent a long time before (many queries before
in the query history), and not necessarily by me, so I don't really know the who/how/when
of it. And then suddenly, db1 doesn't seem to be attached to db2 anymore, even if everything
so far seemed to indicate that it was attached (let's say I've sent
dbGetQuery(db2, "SELECT * FROM db1.t1") many times before with no problems because there
were no pending results in db2).

So we have a situation where depending on whether I use dbGetQuery() or dbSendQuery() and
whether there is a pending result set or not, db1 will sometimes appear as attached to db2,
and sometimes not. That's what I mean by dbGetQuery() not reporting what one would expect.

Maybe the motivation behind the temporary connection "feature" was to provide
some convenience to the user so s/he can do:

  dbGetQuery(db2, myquery)

even if there is a pending result set. But then it should not affect the semantics of
dbGetQuery(), i.e. one might expect the same result as with:

  ## clear all pending results first
  while (length(dbListResults(db2)) >= 1) dbClearResult(dbListResults(db2)[[1]])
  ## send the query
  dbGetQuery(db2, myquery)

which is not the case.

I understand that this is due to the temporary connection "feature" and that it might
look right to anybody who knows all the internal gears, but still...

BTW, according to the documentation it seems that the 'cache_size' and 'synchronous'
values that the user can specify when opening an SQLite connection with dbConnect()
are not used for the temporary connection. Any reason behind this?
Sounds good.

Thanks,
H.
#
Herve Pages <hpages at fhcrc.org> writes:
Sorry, I was not being careful.  You are right, the ATTACH should have
occurred on the db2 connection, but because you have an incomplete
result set hanging around, the subsequent calls to dbGetQuery are
executed on a connection without the attach -- so you are not missing
something and I was.  {though I'm still missing what you expected, the
output looks "right"}.
And that's why as a start, I added the warning message which
presumably would lead you to the problem rather quickly.
why not:

    lapply(dbListResults(db2), dbClearResult)
Only that I only learned of the temp connection behavior from your
detailed bug report ;-)

I'm becoming quite convinced that the temp connection "feature" is
far too expensive in terms of maintenance (the cache_size issue) and
user confusion (the ATTACH type of issues).  I would not be surprised if
the code originated before SQLite even supported ATTACH.

I will be testing a patch that removes the temp connection and gives a
clear error message about how to close open connections.

Thanks, Herve, for your detailed explanations and report.

+ seth
#
Hi Seth,

Just FYI, I tried with RMySQL and here too dbGetQuery() seems to use the
temp connection trick. So, not surprisingly, it leads to the same kind of
problems:

  > library(RMySQL)
  > db <- dbConnect("MySQL", ...)
  > dbGetQuery(db, "USE test") # selecting the 'test' database
  > dbSendQuery(db, "SELECT * FROM go_term")
  <MySQLResult:(24051,0,3)>
  > dbGetQuery(db, "SELECT * FROM go_term")
  Error in mysqlExecStatement(conn, statement, ...) :
    RS-DBI driver: (could not run statement: No Database Selected)

A workaround for this particular error would have been to specify the database
at connection time with the 'db' arg:

  > test_db <- dbConnect("MySQL", ..., db="test")

Then any temp connection created by dbGetQuery() will use the same db by default.

But again, I can see many different ways to get hit by the temp connection "feature",
especially when one needs to work with several databases and wants to switch between
them with the USE command.
Seth Falcon wrote:
[...]
Good. Let me know if you need more testing.
Thanks a lot!

H.