
MySQL + R as a Replacement for SAS Proc SQL + Various Stat Procs

On Fri, 18 Jan 2008, JWilliamson at lecg.com wrote:

            
The way these work (using RODBC as an example) is:

- If necessary, send the data to MySQL via sqlSave().

- Use sqlQuery() to send SQL statements verbatim to the RDBMS (here MySQL)

- Retrieve a table via sqlFetch().

- Do the analysis on the fetched table.
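
As a rough illustration of those four steps, here is a minimal sketch. It assumes an ODBC data source has already been configured for the MySQL server; the DSN name "mysql_dsn", the table names "cars" and "heavy_cars", and the helper roundtrip() are all made up for the example, and mtcars merely stands in for real data.

```r
library(RODBC)

# Hypothetical helper: push a data frame to the RDBMS, run SQL there,
# pull the result back and analyse it in R.
roundtrip <- function(dsn, dat, tablename) {
  channel <- odbcConnect(dsn)           # dsn is an assumed, pre-configured ODBC source
  on.exit(odbcClose(channel))           # close the connection even on error

  # 1. Send the data to MySQL.
  sqlSave(channel, dat, tablename = tablename, rownames = FALSE)

  # 2. Send an SQL statement verbatim to the RDBMS.
  sqlQuery(channel, sprintf(
    "CREATE TABLE heavy_%s AS SELECT * FROM %s WHERE wt > 3",
    tablename, tablename))

  # 3. Retrieve the derived table as a data frame.
  fetched <- sqlFetch(channel, paste0("heavy_", tablename))

  # 4. Do the analysis on the fetched table.
  lm(mpg ~ hp, data = fetched)
}
```

With a working data source this would be called as roundtrip("mysql_dsn", mtcars, "cars").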

If the table is very large, you can fetch it in chunks and use the 
facilities in the 'biglm' package to fit a regression one block of data at 
a time.
However, I am not sure of the value of using more than 10,000 cases in a 
regression, since well before that point non-sampling errors will dominate 
the error distribution: e.g. the systematic error from model misfit may be 
larger than the nominal standard errors.
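
A sketch of the chunked approach, under some assumptions: fit_in_chunks() and rodbc_chunks() are hypothetical helpers written for this example, the block size of 10000 rows is arbitrary, and the RODBC-backed source needs a live connection. Note that with factor predictors every chunk must carry the same factor levels for the update to work.

```r
library(biglm)

# Fit a linear model incrementally. `next_chunk` is a function that returns
# the next data frame of rows, or NULL once the data are exhausted.
fit_in_chunks <- function(formula, next_chunk) {
  chunk <- next_chunk()
  if (is.null(chunk)) stop("no data to fit")
  fit <- biglm(formula, data = chunk)         # initialise on the first block
  while (!is.null(chunk <- next_chunk())) {
    fit <- update(fit, moredata = chunk)      # fold in each further block
  }
  fit
}

# A chunk source backed by RODBC: sqlFetch() starts the retrieval and
# sqlFetchMore() continues it, each returning at most `size` rows.
rodbc_chunks <- function(channel, table, size = 10000) {
  first <- TRUE
  function() {
    chunk <- if (first) {
      first <<- FALSE
      RODBC::sqlFetch(channel, table, max = size)
    } else {
      RODBC::sqlFetchMore(channel, max = size)
    }
    # Anything other than a non-empty data frame is treated as "no more rows".
    if (is.data.frame(chunk) && nrow(chunk) > 0) chunk else NULL
  }
}
```

With an open channel and a table "mytable" (both assumed), this would be used as fit_in_chunks(y ~ x1 + x2, rodbc_chunks(channel, "mytable")), followed by summary() on the result.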

I can see why experienced SAS users like to use it for data cleanup, but 
it seems generally true that the user is a more important variable than the 
tool: people work best with the tools they understand best (and personal 
preference comes into it).

The approach I sketched above uses R as the scripting language. It's a 
pretty powerful one, certainly powerful enough to do the text processing 
needed to prepare SQL queries.
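
For instance, a family of similar queries can be generated from a vector of column names with sprintf(). This is only a sketch: build_queries() is a hypothetical helper, the table "survey" and its columns are invented, and no quoting or escaping is done, so it is suitable only for identifiers you control.

```r
# Build one aggregate query per variable (all names here are hypothetical).
build_queries <- function(vars, table, by) {
  sprintf("SELECT %s, AVG(%s) AS mean_%s FROM %s GROUP BY %s",
          by, vars, vars, table, by)   # sprintf() is vectorised over vars
}

queries <- build_queries(c("income", "age", "tenure"),
                         table = "survey", by = "region")
```

Each element of queries could then be passed to sqlQuery(), e.g. lapply(queries, function(q) sqlQuery(channel, q)) with an open RODBC channel.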