SQL generics
Hi Hadley, The sqlQuoteString() and sqlQuoteIdentifer() generics are useful. More generally speaking I like the idea of providing utilities that make it easier to generate SQL, and even better if those utilities help generate SQL that will automagically adapt to the current back-end. Because of the sql prefix my understanding is that the generics below (sqlCreateTable(), etc...) only generate strings containing valid SQL but that they actually don't talk to the db. I understand that you need the 'conn' argument for dispatch but it might be a little bit confusing. Are you also considering to provide dbCreateTable(), dbDropTable(), dbInsertInto(), dbUpdate(), and dbDeleteFrom() generics? Do we really need the 2 sets? If not, which one is most useful? Alternatively the 2nd set could have a 'return.SQL.only' arg (FALSE by default) so the user/developer has a way to know what SQL is sent to the db (can be useful for learning/debugging, or for "retouching" the query before sending). If it was possible to use only high-level back-end agnostic commands like this in our R code when talking to a db, then it would be easy to support multiple back ends. Sounds like a noble goal in theory but it might be hard to achieve. As you noticed the grammar of the SELECT statement is complicated and complex statements can require a lot of back-end specific tweaking in order to perform efficiently. Thanks, H.
On 10/22/2013 01:56 PM, Hadley Wickham wrote:
To make more concrete, I've done a quick implementation at https://github.com/rstats-db/DBI/blob/master/R/sql.R This is just a rough first pass and feedback would be most appreciated! Hadley On Tue, Oct 22, 2013 at 10:59 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
Hi all, I wonder if it might be useful to start developing a new set of generics for SQL generation. A reasonable target would be the subset of SQL-92 grammar defined for minimal ODBC compliance: http://msdn.microsoft.com/en-us/library/ms711725(v=vs.85).aspx That would lead to the following functions which dispatch on the connection and and return a SQL vector as output (possibly of length > 1). Default methods would be SQL-92 compatible, and provide a reference implementation for package authors to see how to best deal with escaping. * `sqlCreateTable(conn, table, fields, ...)` * `sqlDeleteFrom(conn, table, where, ...)` * `sqlDropTable(conn, table, ...)` * `sqlInsertInto(conn, table, values, ...)` * `sqlUpdate(conn, table, values, ...)` Note that I have deliberately not included `sqlSelect()` in this list, because the grammar of the select statement is so much more complicated in general, and varies considerably more between databases. I'd suggest not including it in a first pass. With `dbSendQuery()`, these could also be used to provide default implementations of `dbWriteTable()`, and `dbDropTable()`. Implementators would almost always override `dbWriteTable()` to provide a more efficient implementation for bulk data loading. What do you think? Hadley -- Chief Scientist, RStudio http://had.co.nz/
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319