Parameterised queries
On 15-02-13 08:49 PM, Tim Keitt wrote:
On Fri, Feb 13, 2015 at 5:05 PM, Alex Mandel <tech_dev at wildintellect.com
<mailto:tech_dev at wildintellect.com>> wrote:
On 02/13/2015 02:53 PM, Paul Gilbert wrote:
> (Possibly naive question.)
>
>> The reason inserts are slow...
>
> I've now seen this stated a few times on this thread and I don't fully
> understand the context everyone seems to be assuming. With a Perl client
> I can insert multiple records with a single insert command, over a
> network, and it seems to be nearly as fast as loading with the batch
> utility locally on the server. As I recall, I had to block less that
> 10,000 records or something choked, but multiple blocks loaded a fairly
> large database reasonably quickly. (I'm travelling and cannot verify
> this number at the moment.) It is import to load before specifying
> indexes, otherwise the index gets recalculated with each insert.
>
> When people say inserts are slow:
>
> -Is this an R specific problem?
> -Is it assumed indexes are already specified?
> -Or am I missing something else once again?
>
> Thanks,
> Paul
>
It's not exactly R specific, though we are discussing the implementation
in R.
If you bulk load generally you want it to be an atomic commit (all
records or none), and you want it to hold of on doing triggers and
indexes until after the data has made it to the db. Different Dbs have
different performance levels.
In postgres you can compare using the psql copy command vs sending a sql
file with INSERT commands.
The example for "copy_to" in rpg compares pushing the hflights db to
postgresql using inserts and copy. You'll notice the insert version
('write_table') is commented out because it takes 5 minutes or so and I
got tired of waiting. Using 'copy_to' is less than 5 seconds. No indices
or constraints in operation.
I don't have the tools to test anything at the moment, but the rpg note
under write_tables says "write_table uses SQL INSERT statements".
Just to be clear, I know
INSERT INTO PRODUCT (name, price) VALUES (bread, 1.0)
...
INSERT INTO PRODUCT (name, price) VALUES (butter, 2.5)
is slow, it is
INSERT INTO PRODUCT (name, price) VALUES (
(bread, 1.0),
...
(butter, 2.5))
that I think is fairly fast. (I might have the syntax wrong.)
Paul
More background
http://en.wikipedia.org/wiki/Prepared_statement
http://rosettacode.org/wiki/Parametrized_SQL_statement
Not knowing how you wrote your Perl code, you may have been using
prepared statements. In which case both of your runs remote or local
should of had similar performance. What's being discussed is the
introduction of prepared statements which has not existed for most Db
connectors in R previously.
Having done bulk loads in the 10 GB+ range before I can tell you it
matters a lot.
I've experimented with using parallel R to run many inserts in parallel.
Still have not gotten it worked out and I'm not sure it will help in any
event owing to write contention. Would be interesting to have a good
test case.
THK
Thanks,
Alex
_______________________________________________
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB at r-project.org <mailto:R-sig-DB at r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-db
--
http://www.keittlab.org/