On 02/13/2015 06:36 PM, Paul Gilbert wrote:
Alex Thanks for explaining this. I realized my question was a bit peripheral to the discussion, which I did not really follow, so thanks for the pointers. ...
If you bulk load generally you want it to be an atomic commit (all records or none), and you want it to hold of on doing triggers and indexes until after the data has made it to the db. Different Dbs have different performance levels.
I was guessing something like this was the reason
INSERT INTO PRODUCT (name, price) VALUES (
(bread, 1.0),
...
(butter, 2.5))
I'm not sure all DBs allow this format (seem at least Postgres, MySQL and SQlite do). This looks like kinda like a DB native way of doing prepared/parametrized insert. I suspect this would be similar in speed. Note at least in python using prepared statements is also a security feature, as it sanitizes the inputs so you can't sneak in ; where it doesn't belong. Not sure if that's part of this DBI plan. Having to write a SQL string like the above in R would be a real pain. Being able to just pass a data.frame or list of lists to a single SQL with placeholders is much simpler.
is so much faster than
INSERT INTO PRODUCT (name, price) VALUES (bread, 1.0)
...
INSERT INTO PRODUCT (name, price) VALUES (butter, 2.5)
In postgres you can compare using the psql copy command vs sending a sql file with INSERT commands.
I think this is roughly one of the things I did, but it was a long time ago. As I recall, I found several inserts was slow, but an insert with many records was about the same speed as a copy. Are you saying you would not expect that? If you don't mind answering a second question, is it possible to make a parameterised query that handles a variable number of parameters, as one might want to do for an insert with multiple records?
No, I don't think so, the way you would do this would be to have blanks in your data.frame for the values you wanted to not fill in. Others wise you would have a ragged array which is odd to handle in general. But you can't do that in a regular insert statement anyways. Number of columns listed is the number of values you have to pass for it to work.