Parameterised queries
On 02/13/2015 02:53 PM, Paul Gilbert wrote:
(Possibly naive question.)
The reason inserts are slow...
I've now seen this stated a few times on this thread, and I don't fully understand the context everyone seems to be assuming. With a Perl client I can insert multiple records with a single INSERT command, over a network, and it seems to be nearly as fast as loading with the batch utility locally on the server. As I recall, I had to keep each block under 10,000 records or something choked, but multiple blocks loaded a fairly large database reasonably quickly. (I'm travelling and cannot verify this number at the moment.) It is important to load before specifying indexes; otherwise the index gets recalculated with each insert.

When people say inserts are slow:

- Is this an R-specific problem?
- Is it assumed indexes are already specified?
- Or am I missing something else once again?

Thanks,
Paul
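[For illustration, here is a minimal sketch of the multi-row INSERT pattern Paul describes, written in R with DBI and RSQLite; the table, columns, and values are invented for the example, and RSQLite stands in for whatever backend is actually in use:]

    library(DBI)

    con <- dbConnect(RSQLite::SQLite(), ":memory:")
    dbExecute(con, "CREATE TABLE measurements (id INTEGER, val REAL)")

    # One statement, one round trip, several rows:
    dbExecute(con,
      "INSERT INTO measurements (id, val) VALUES (1, 0.5), (2, 0.7), (3, 0.9)")

    dbDisconnect(con)

[The same idea scales to blocks of a few thousand rows per statement, which is what keeps the per-insert overhead low.]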
It's not exactly R-specific, though we are discussing the implementation in R. When you bulk load, you generally want it to be an atomic commit (all records or none), and you want to hold off on firing triggers and building indexes until after the data has made it to the database. Different databases have different performance characteristics; in Postgres you can compare using the psql COPY command versus sending a SQL file full of INSERT commands.

More background:

http://en.wikipedia.org/wiki/Prepared_statement
http://rosettacode.org/wiki/Parametrized_SQL_statement

Not knowing how you wrote your Perl code, you may have been using prepared statements, in which case both of your runs, remote or local, should have had similar performance. What's being discussed here is the introduction of prepared statements, which have not existed in most database connectors for R previously. Having done bulk loads in the 10 GB+ range, I can tell you it matters a lot.

Thanks,
Alex
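[For concreteness, a minimal sketch of a prepared, parameterised bulk insert wrapped in a single transaction. The table and data are invented, and the function names are those that later stabilised in DBI (dbSendStatement/dbBind), so treat this as illustrative rather than a record of the API as it existed at the time of this thread:]

    library(DBI)

    con <- dbConnect(RSQLite::SQLite(), ":memory:")
    dbExecute(con, "CREATE TABLE measurements (id INTEGER, val REAL)")

    df <- data.frame(id = 1:10000, val = runif(10000))

    dbBegin(con)                      # atomic: all records or none
    res <- dbSendStatement(con,
      "INSERT INTO measurements (id, val) VALUES (?, ?)")  # prepared once
    dbBind(res, unname(as.list(df)))  # bind all parameter rows and execute
    dbClearResult(res)
    dbCommit(con)

    # Build the index only after the bulk load, per the advice above
    dbExecute(con, "CREATE INDEX measurements_id_idx ON measurements (id)")

    dbDisconnect(con)

[The statement is parsed and planned once, the 10,000 parameter rows are executed against that single plan, and the surrounding transaction makes the whole load atomic.]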