my RMySQL connections are so SLOW
7 messages · Laurent Faisnel, Ernesto Jardim, Brian Ripley

Hi all,
I'm using the latest RMySQL to send queries to a MySQL database (quite small for now -- say 20 tables with a total of 1000 rows), as part of a decision-support system. It produces results that are displayed on a website, so I need fast responses (with such a small database, that ought to be possible!). Unfortunately, a single query to the database takes R more than 10 seconds. Even dbListTables() takes a long while. Imagine the problem with more data! Can I hope for something better with RMySQL? Has anyone had the same problem? If RMySQL (or is it MySQL's fault?) is this slow, I could replace MySQL with Postgres, but that would mean a lot of extra work. Or is R itself simply slow? Any advice?
Thanks.
Laurent
What exactly are you timing? For me running a query is instant. Are you starting R for each query, for example? What OS and what hardware are you using? People have reported that RODBC is a lot faster than RMySQL for some tasks involving large datasets.
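One way to pin that down is to time each step separately; a minimal sketch (the dbname value and table name here are placeholders, not taken from the original posts):

library(DBI)
library(RMySQL)
# Time the connection setup, the catalogue lookup and a real query
# separately, to see which step is eating the 10 seconds.
system.time(con <- dbConnect(dbDriver("MySQL"), dbname = "test"))
system.time(dbListTables(con))
system.time(dbGetQuery(con, "select count(*) from some_table"))
dbDisconnect(con)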
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Thank you for your fast answer. I work on a 450 MHz Pentium II with 192 MB of RAM.
dbListTables() takes about 3 seconds. For interactive use that's OK, but once the queries are embedded in programs, the programs become very slow! I had to calculate an average score for each value of a column (i = 1:12) to fill an array, and this fairly simple operation took more than 30 seconds (almost 3 seconds per "select avg(score)..." query).
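A single round trip instead of twelve would remove most of that overhead; a sketch, assuming the table and column names (table1, critere, score) quoted elsewhere in this thread:

# Let MySQL compute all twelve averages in one query instead of
# one "select avg(score) ... where critere=i" per value of i.
avgs <- dbGetQuery(con,
  "select critere, avg(score) as avg_score from table1 group by critere")
# one row per value of critere; reorder into the array indexed by i = 1:12
score.array <- avgs$avg_score[order(avgs$critere)]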
I don't think my dataset is large enough to make switching to ODBC worthwhile, and that would cause other problems. I do not start R for each query, and I see the same slowness when I type the queries into R directly (not from a sourced script).
What gives me hope for a solution is your remark that "For me running a query is instant". What could the difference between our systems be? CPU speed alone doesn't explain it, in my opinion.
Laurent
> version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    6.2
year     2003
month    01
day      10
language R
Hi,
Have you tried running the SQL statement in mysql itself, to check whether the problem is really in RMySQL? MySQL databases can be very slow if you don't index the tables. Check http://www.mysql.com/documentation/mysql/bychapter/manual_MySQL_Optimisation.html#MySQL_indexes
Regards,
EJ
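For example, an index on the column used in the WHERE clause can be created from R too; a sketch using the table and column names quoted later in this thread (con is a connection as set up there):

# Index the column the WHERE clause filters on, then ask MySQL
# whether it actually uses the index for the query.
dbGetQuery(con, "create index idx_critere on table1 (critere)")
dbGetQuery(con, "explain select avg(mark) from table1 where critere = 1")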
On Mon, 5 May 2003, Laurent Faisnel wrote:
Thank you for your fast answer. I work on a 450 MHz Pentium II with 192 MB of RAM.
And Linux. That's a rather old spec -- 4-5 years?
[...] What could the difference between our systems be? CPU speed alone doesn't explain it, in my opinion.
Well, I am using a dual Athlon 2600, 1 GB of RAM and a fast local disc system, so that is a considerable difference. Maybe you need more RAM.
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
OK, clearly I'm handicapped by my slow machine! But there is another element I should mention to explain my scepticism. R is called from PHP (which builds the web pages that display, among other things, the statistical results). The first version of our system did not use R at all: PHP accessed the database directly, and it did the job fine (and fast!).
Now I want to use R for better analysis (with future growth in mind), but I'm very disappointed by how slowly the queries are handled. The problem can't be at MySQL's level, since it was so fast with PHP (about 100x faster, I think). So I think there must be a problem in R/RMySQL. But isn't RMySQL fast C code underneath?
Perhaps I'm not using the best commands for the queries:
# beginning of R program (once only)
library(DBI);
library(RMySQL);
con <- dbConnect("MySQL");
# typical request
dbGetQuery(con,"select avg(mark) from table1 where critere=1");
# this takes 2 sec. (instant with PHP & MySQL)
# table1 being a table without index, with 3 columns and only 500 rows
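One thing worth checking here is whether the 2 seconds are a per-query cost or mostly a fixed overhead (name resolution, connection setup) that PHP may be amortising through persistent connections; a sketch reusing the connection above:

# Run the same query ten times over the one open connection;
# ~20 s total means the cost really is per query, ~2 s means it
# is a one-off and the connection should simply be kept open.
system.time(
  for (i in 1:10)
    dbGetQuery(con, "select avg(mark) from table1 where critere=1")
)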
By the way, how should I close result sets to clean up the workspace (I tried dbSendQuery instead of dbGetQuery)? Should I assign dbSendQuery's result to an object (res) and then delete it (how? I seem to have a problem with dbClearResult)?
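For reference, the usual DBI sequence when using dbSendQuery() explicitly is fetch() followed by dbClearResult(); a minimal sketch:

# dbGetQuery() does send + fetch + clear in one call; with
# dbSendQuery() the clean-up is explicit:
res <- dbSendQuery(con, "select avg(mark) from table1 where critere=1")
out <- fetch(res, n = -1)  # n = -1 fetches all remaining rows
dbClearResult(res)         # frees the result set on the server side
rm(res)                    # drop the stale handle from the workspace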
Thanks again,
Laurent
Hi,
Have you tried running the commands inside R, without PHP calling R? Was it still slow?
Why compute an average inside SQL if you're using R? Wouldn't it be better to select the dataset you want to work with and then do the analysis inside R? Once again, in my opinion you'll gain a lot of speed if you index your tables.
Regards,
EJ
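A sketch of that approach, with column names borrowed from the queries earlier in the thread: fetch the raw rows once and compute all the averages in R:

# One query for the raw data, then aggregate in R with tapply()
# instead of issuing one avg() query per value of critere.
dat <- dbGetQuery(con, "select critere, mark from table1")
avg.by.critere <- tapply(dat$mark, dat$critere, mean)
avg.by.critere   # named vector, one average per value of critere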