
SQL speed vs R

The place where I have seen a large difference is when you have to "join" a
large number of tables to get results.  The problem with doing this in R is
the restriction of how much can fit into memory and the time it takes to
download the data across a connection if the data is remote from the
machine where you are running R.  As an aside, we have a number of stored
procedures on the SQL machine that call R locally to do some of the
processing.  You have to look at the tradeoff between the time to get the
data and the time needed to aggregate the results.  A SQL server is much
better at doing joins across ten different tables, and it can scale that
across multiple processors if they are available.  The resulting table is
typically much smaller when it comes time to download it to your local R
machine.  There is a big difference if these tables start to exceed the
memory that you have on your local R machine.
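As a rough sketch of that pattern, the joins and aggregation can be pushed
to the server and only the small summary pulled into R.  This assumes the
DBI and odbc packages; the DSN and the table/column names (sales, products,
regions) are hypothetical stand-ins:

```r
library(DBI)

# Hypothetical DSN; substitute your own connection details.
con <- dbConnect(odbc::odbc(), dsn = "my_sql_server")

# Let the server do the joins and the aggregation; only the small
# summary table crosses the network to the R machine.
summary_df <- dbGetQuery(con, "
  SELECT r.region_name, p.category, SUM(s.amount) AS total
  FROM sales s
  JOIN products p ON s.product_id = p.product_id
  JOIN regions  r ON s.region_id  = r.region_id
  GROUP BY r.region_name, p.category
")

dbDisconnect(con)
```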

I would agree that if I can fit the data into memory, there are packages
(data.table & dplyr) that make the access and aggregation of the data
pretty fast, but if I can do some of that work on the SQL machine, the
amount of data that I have to download and process locally may be a lot
smaller.
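Once the (already reduced) data does fit in memory, a data.table
aggregation is a one-liner.  A minimal sketch with made-up example data
standing in for a downloaded summary table:

```r
library(data.table)

# Hypothetical example data; in practice this might come from
# fread() on a file or from a DBI query result.
dt <- data.table(region = c("east", "east", "west"),
                 amount = c(10, 20, 30))

# Group-by aggregation, done entirely in local memory.
dt[, .(total = sum(amount)), by = region]
```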


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Wed, May 7, 2014 at 10:08 AM, Tim Keitt <tkeitt at utexas.edu> wrote: