Skip to content

my RMySQL connections are so SLOW

7 messages · Laurent Faisnel, Ernesto Jardim, Brian Ripley

#
Hi all,
I'm using the latest RMySQL to send requests to a MySQL database (which 
is quite small for the time - let's say 20 tables with a total of 1000 
rows), and this should be part of a whole decisionnal system. It 
produces results which are displayed on a website, and therefore I need 
fast results (the database being quite small, I think this must be 
possible !).
Unfortunately, a simple request to the database, and R needs more than 
10 seconds. Even dbListTables() takes a long while. Imagine the problem 
with more data ! Can I hope something better  with RMySQL ? Did anyone 
have the same problem ?
If RMySQL (or is it MySQL's fault ?) is so slow, I could replace MySQL 
with Postgres, but this would imply a lot of extra work. R is perhaps 
very slow anyway ? Any piece of advice ?
Thanks.
Laurent
#
What exactly are you timing?  For me running a query is instant.
Are you starting R for each query, for example?  What OS and what hardware 
are you using?

People have reported that RODBC is a lot faster than RMySQL for some tasks 
involving large datasets.
On Mon, 5 May 2003, Laurent Faisnel wrote:

            

  
    
#
Thank you for your fast answer. I work on a 450MHz Pentium II with 192 
MBytes RAM.
dbListTables() takes about 3 seconds. For direct use it's OK, but when 
requests become included in programs, it makes very slow programs !
 I had to calculate an average score for each value of a column (i=1:12) 
to fill an array, and so the operation (pretty simple) took more than 30 
seconds (almost 3 seconds per "select avg(score)..." request).

I don't think my dataset is huge enough to make it worth using ODBC, 
which would cause other problems. I do not start R for each query. I 
have the same slowness when sending the requests to R directly (without 
using source code).

What makes me hope for a solution is when you say "For me running a 
query is instant". What could be the difference between our systems ? 
The CPU speed does not explain all, in my opinion.

Laurent


/
 > version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    6.2
year     2003
month    01
day      10
language R/
Prof Brian Ripley wrote:

            
#
On Mon, 2003-05-05 at 09:56, Laurent Faisnel wrote:
Hi

Have you tried to run the sql statement in mysql to check if the problem
is really in RMySQL ? MySQL databases can be very slow if you don't
index the tables. 

Check
http://www.mysql.com/documentation/mysql/bychapter/manual_MySQL_Optimisation.html#MySQL_indexes

Regards

EJ
#
On Mon, 5 May 2003, Laurent Faisnel wrote:

            
And Linux.  That's a rather old spec --  4-5 years?
Well, I am using a dual Athlon 2600, 1Gb RAM and a fast local disc system,
so that is a considerable difference.  Maybe you need more RAM.
#
Prof Brian Ripley wrote:

            
OK. Clearly, I'm disadvantaged by my slow machine ! But there is another 
element I have to give you to help you understand my scepticism. R is 
called by PHP (which manages the web pages that  display amongst others 
statistical results). The first version of our system did not use R. 
Indeed PHP can access the database, and it did the work fine (and fast ! ).
Now I want to use R for a better analysis (in a in evolutive view), but 
I'm very disappointed to note how slow requests are treated. The problem 
cannot be at MySQL's level since it was so fast with PHP (about 100x 
faster I think). So I think there must be a problem with R/RMySQL. 
However, isn't there fast C code in RMySQL ?

Perhaps I do not use the best commands for the requests :

/# beginning of R program (once only)
library(DBI);
library(RMySQL);
con <- dbConnect("MySQL");

# typical request
dbGetQuery(con,"select avg(mark) from table1 where critere=1");
# this takes 2 sec. (instant with PHP & MySQL)
# table1 being a table without index, with 3 columns and only 500 rows/

By the way, how should I close ResultSets to clean the workspace (I 
tried dbSendQuery instead of dbGetQuery) ? Should I give an object name 
to dbSendQuery's result (res) and then delete it (how ? seems to be a 
problem with dbClearResult) ?

Thanks again,
Laurent
#
On Mon, 2003-05-05 at 13:22, Laurent Faisnel wrote:
Hi

Have you tried to run the commands inside R whithout php calling R ? Was
it slow ? Why are you making an average inside SQL if you're using R ?
Should it be better to select the dataset you want to work with and than
make the analysis inside R ?

Once again in my opinion you'll win a lot of speed if you index your
tables.

Regards

EJ