Because of the sql prefix, my understanding is that the generics below
(sqlCreateTable(), etc.) only generate strings containing valid SQL,
and that they don't actually talk to the database. I understand that
you need the 'conn' argument for dispatch, but it might be a little
confusing.
Right. Dispatching on conn would be a bit confusing for users, but I
think these would mostly be used by package authors.
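To make that distinction concrete, here is a minimal sketch (not DBI's actual implementation; the method body and the dummy connection class are invented) of a sql* generic that dispatches on the connection but only returns a string:

```r
# Hypothetical sketch of a sql* generic: `con` is used purely for S3
# dispatch, so each backend can pick its own dialect; no query is sent.
sqlCreateTable <- function(con, table, fields, ...) {
  UseMethod("sqlCreateTable")
}

# Fallback method using ANSI-style double-quoted identifiers.
sqlCreateTable.default <- function(con, table, fields, ...) {
  cols <- paste0('"', names(fields), '" ', fields, collapse = ", ")
  paste0('CREATE TABLE "', table, '" (', cols, ")")
}

# A dummy "connection" is enough: nothing here talks to a database.
con <- structure(list(), class = "dummy_connection")
sqlCreateTable(con, "mtcars", c(mpg = "REAL", cyl = "INTEGER"))
#> [1] "CREATE TABLE \"mtcars\" (\"mpg\" REAL, \"cyl\" INTEGER)"
```

A real backend would add a method such as sqlCreateTable.MySQLConnection that quotes with backticks instead, which is the point of dispatching on conn.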
Are you also considering providing dbCreateTable(), dbDropTable(),
dbInsertInto(), dbUpdate(), and dbDeleteFrom() generics?
No, I don't think so - the layer of commands built on top of the sql*
functions would be more like dbWriteTable - i.e. they'd map to
operations that make more sense from the R side of things, rather than
being a straightforward translation to things that make sense to
databases.
Do we really need the two sets? If not, which one is most useful?
Alternatively, the second set could have a 'return.SQL.only' argument
(FALSE by default) so the user/developer has a way to see what SQL
is sent to the db (which can be useful for learning/debugging, or for
"retouching" the query before sending it).
I'm not a big fan of functions that return different types based on
their input arguments, because they're harder to program against.
However, you could imagine these functions returning some sort of
reference class from which you could either extract the SQL or run
the query. But that seems a bit too complicated for DBI.
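For illustration only, that (rejected) query-object idea might look roughly like the sketch below; make_query and run_query are invented names, not DBI API:

```r
# Invented sketch of the "query object" idea discussed above: one object
# carries both the connection and the SQL, so callers can inspect the
# string or execute it later, without a return-type switch.
make_query <- function(con, sql) {
  structure(list(con = con, sql = sql), class = "pending_query")
}

# Extract the SQL without touching the database.
as.character.pending_query <- function(x, ...) x$sql

# Executing would forward to something like dbGetQuery(); stubbed here.
run_query <- function(q, send = function(con, sql) sql) {
  send(q$con, q$sql)
}

q <- make_query(NULL, "SELECT * FROM mtcars")
as.character(q)
#> [1] "SELECT * FROM mtcars"
```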
If it were possible to use only high-level, back-end-agnostic commands
like this in our R code when talking to a db, then it would be easy
to support multiple back ends. That sounds like a noble goal in theory,
but it might be hard to achieve. As you noticed, the grammar of the
SELECT statement is complicated, and complex statements can require a
lot of back-end-specific tweaking in order to perform efficiently.
Yes, I've spent quite a lot of time on this for the dplyr package,
which provides a syntax that allows you to work with data frames, data
tables, and databases (SQLite, MySQL, PostgreSQL, and Google BigQuery)
without having to worry too much about how the data is stored. But
the only way to do this is to carefully constrain the problem domain,
and I think there's too much variation in SQL grammars for SQL to
serve as the abstraction layer (not without a huge amount of work,
anyway).
Some of the methods might actually only be useful transiently while
you're developing a new DBI package (i.e. once you've implemented
sqlCreateTable and sqlInsertInto you get an inefficient dbWriteTable
for free), or for testing. But I don't think that's necessarily a
problem.
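That "free but inefficient" composition could be sketched roughly like this; the helper bodies and the send callback are stand-ins, not DBI's real code:

```r
# Stand-in sql* helpers, as a backend author might implement them.
sqlCreateTable <- function(con, name, fields) {
  cols <- paste0('"', names(fields), '" ', fields, collapse = ", ")
  paste0('CREATE TABLE "', name, '" (', cols, ")")
}
sqlInsertInto <- function(con, name, df) {
  vals <- apply(df, 1, function(r) paste0("'", r, "'", collapse = ", "))
  paste0('INSERT INTO "', name, '" VALUES (', vals, ")")
}

# The dbWriteTable you get "for free": generate the SQL, then send each
# statement one by one -- correct everywhere, efficient nowhere.
dbWriteTable_sketch <- function(con, name, df,
                                send = function(con, sql) sql) {
  fields <- vapply(df, function(x) if (is.numeric(x)) "REAL" else "TEXT",
                   character(1))
  stmts <- c(sqlCreateTable(con, name, fields),
             sqlInsertInto(con, name, df))
  vapply(stmts, function(s) send(con, s), character(1))
}

df <- data.frame(x = 1:2, y = c("a", "b"))
dbWriteTable_sketch(NULL, "t", df)  # one CREATE plus one INSERT per row
```

A real backend would override this with a bulk-load path (e.g. a multi-row INSERT or COPY), which is why the default is only a development/testing convenience.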
Hadley