
RMySQL - Bulk loading data and creating FK links

I think one would only be concerned about such internals if one were
primarily interested in performance; otherwise, one would be more
interested in ease of specification, and part of that ease is having
the specification independent of the implementation, separating the
two activities.  An example of this separation is that by simply
specifying a disk-based database rather than an in-memory one, SQL
can perform queries whose working set takes more space than available
memory.  The query itself need not be modified.
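A minimal sketch of this, assuming the sqldf package: the same query
runs against an in-memory SQLite database (the default) or a
disk-based one, and only the dbname argument changes, not the query.
(sqldf translates dots in column names to underscores, hence
Sepal_Length.)

```r
library(sqldf)

# in-memory database (the default)
r1 <- sqldf("select Species, avg(Sepal_Length) from iris group by Species")

# disk-based database: the query can now exceed available memory,
# yet the SQL itself is unchanged -- only dbname differs
r2 <- sqldf("select Species, avg(Sepal_Length) from iris group by Species",
            dbname = tempfile())
```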

I think the viewpoint you are discussing is primarily one of
performance whereas the viewpoint I was discussing is primarily ease
of use and that accounts for the difference.

I believe your performance comparison pits a sequence of operations --
building a database, transferring data to it, performing the
operation, reading the result back in and destroying the database --
against an internal manipulation.  I would expect the internal
manipulation, particularly one done primarily in C code as is the
case with data.table, to be faster, although some benchmarks of the
database approach found that it compared surprisingly well to
straight R code -- some users of sqldf found that for an 8000 row
data frame sqldf actually ran faster than aggregate and also faster
than tapply.  The News section on the sqldf home page provides links
to their benchmarks.  Thus if R is fast enough then it's likely that
the database approach is fast enough too, since in those benchmarks
it was even faster.
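A hedged sketch of the kind of comparison described above: the same
grouped mean computed with aggregate and with sqldf on an 8000 row
data frame.  The timings are illustrative only; they are not the
benchmarks linked from the sqldf home page, and results will vary by
machine and data.

```r
library(sqldf)

# an 8000 row data frame with a grouping column and a numeric column
set.seed(1)
DF <- data.frame(g = sample(letters, 8000, replace = TRUE),
                 x = rnorm(8000))

# grouped mean via base R aggregate
t_agg <- system.time(r_agg <- aggregate(x ~ g, data = DF, FUN = mean))

# the same grouped mean via sqldf (data is loaded into SQLite,
# the query runs, and the result is read back into R)
t_sql <- system.time(
  r_sql <- sqldf("select g, avg(x) as x from DF group by g")
)
```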
On Thu, Jan 28, 2010 at 8:52 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote: