Parameterised queries
Hi Hadley,
On 11 Feb 2015, at 16:08, Hadley Wickham <h.wickham at gmail.com> wrote: On Wed, Feb 11, 2015 at 8:39 AM, Hannes M?hleisen <Hannes.Muehleisen at cwi.nl> wrote:
Hi Hadley and list,
On 11 Feb 2015, at 15:01, Hadley Wickham <h.wickham at gmail.com> wrote: As part of my work modernising R's database connnectors, I've been working on improving the support for parameterised queries. I'd love to get your feedback on the API I'm proposing. The goal is to encourage people to use parameterise queries instead of pasting together SQL strings because it is much safer (no worries about SQL injection attacks) and somewhat more performant (becuase the database only needs to parse the query once).
Makes a lot of sense, yes. MonetDB.R has had support for this from day one. Our syntax uses the list of parameters approach, e.g. dbSendUpdate(conn, "INSERT INTO sometable (a) VALUES (?)", ?foobar?) of course, the parameter can be a vector, in which case the query is executed multiple times.
I deliberately chose not to allow vector parameters because I think it's reasonable to say: "Each R function only ever generates a single query to the database", leaving the onus of looping on the user, and forcing them to think about how to ensure the vectors don't contain bad values. This the same principle behind disallowing multiple queries (separated by ";") in a single string.
However, there are optimisation opportunities that the db driver could exploit if multiple values are to be inserted at once. For example, a database connection in auto commit mode could switch of auto commit, and try to insert all values in a transaction to get all-or-nothing semantics. Another opportunity would be a bulk load if the vectors are large.
Generally, I would be in favour of the ?list of params? approach. Also, please note that the ?bind by name? is not supported by all databases. Sticking to position-only parameter binding using ? would be most compatible (also used in JDBC for example).
Yup, postgresql doesn't support names either. In that case, providing a named list would be an error. But where named binding is supported, I think it's better to use it because it eliminates a class of potential errors.
I think a script using DBI should not need to know which DBI implementation is running behind it. But if someone uses named parameters on a MySQL backend (possibly out of order), that script will not run with Postgres or others. Best, Hannes -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4154 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-sig-db/attachments/20150211/9c4f187d/attachment.p7s>