MySql Versus R
On Fri, 1 Apr 2011, Henri Mone wrote:
Dear R Users, for my data crunching I use a combination of MySQL and GNU R. I have to handle large to medium-sized data which is stored in a MySQL database; R executes a SQL command to fetch the data and does the plotting with the built-in R plotting functions. The (low-level) calculations like summing, dividing, grouping, sorting etc. can be done either with the SQL command on the MySQL side or on the R side. My question is: which is faster for these low-level calculations and data rearrangements, MySQL or R? Is there a general rule of thumb for what to shift to the MySQL side and what to the R side?
The data transfer costs almost always dominate here: since such low-level computations would almost always be a trivial part of the total costs, you should do things which can reduce the size of the data to be transferred (e.g. summarizations) in the DBMS. I do wonder what you think the R-sig-db list is for if not questions such as this one. Please subscribe and use it next time.
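To illustrate the point about transfer size, here is a minimal sketch, not taken from the original thread. It assumes a hypothetical table `measurements(grp, value)` and uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a MySQL server, purely so the example runs without any setup. The same idea applies to a query sent from R. The sketch compares how many rows cross the database boundary when the grouping is done on the client versus in the DBMS.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented for illustration only.
conn.execute("CREATE TABLE measurements (grp TEXT, value REAL)")
rows = [("a" if i % 2 == 0 else "b", float(i)) for i in range(10000)]
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# Option 1: transfer every row, then sum per group on the client side.
raw = conn.execute("SELECT grp, value FROM measurements").fetchall()
client_sums = {}
for grp, value in raw:
    client_sums[grp] = client_sums.get(grp, 0.0) + value

# Option 2: let the DBMS summarize, so only one row per group is transferred.
summary = conn.execute(
    "SELECT grp, SUM(value) FROM measurements GROUP BY grp"
).fetchall()
db_sums = dict(summary)

print("rows transferred, client-side aggregation:", len(raw))
print("rows transferred, DBMS-side aggregation:", len(summary))
```

Both options give the same per-group sums, but the second moves one row per group instead of the whole table. Against a real server over a network, that difference in transferred volume is what the advice above is about.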
Thanks Henri
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595