R, PostgreSQL and poor performance
"BD" == Berry, David <dyb at noc.ac.uk> writes:
BD> All variables are reals other than id which is varchar(10) and date
BD> which is a timestamp, approximately 1.5 million rows are returned by
BD> the query and it takes on the order of 10 seconds to execute using
BD> psql (the command line client for Postgres) and a similar time using
BD> pgAdmin 3. In R it takes several minutes to run and I'm unsure where
BD> the bottleneck is occurring.

You may want to test progressively smaller chunks of the data to see
how quickly R slows down compared to psql on that query.

My first guess is that something is allocating and re-allocating RAM
in a quadratic (or worse) fashion.

I don't know whether OS X has anything equivalent, but you could test
on the Linux box using oprofile (http://oprofile.sourceforge.net; SuSE
should have an rpm for it and kernel support compiled in) to confirm
where the time is spent.

It is /possible/ that the (sql)NULL->(r)NA logic in RS-PostgreSQL.c
may be slow (relatively speaking), but it is necessary. Nothing else
jumps out as a possible choke point. Oprofile (or the equivalent)
would best answer the question.

-JimC
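
The chunked test suggested above could be sketched roughly as follows. This is only an illustration, not code from the original exchange: the helper names time_at_sizes and scaling_exponent are made up here, and the database name and table name in the commented usage are placeholders. The idea is to time the same fetch at several row counts and look at the slope on a log-log scale, where a slope near 1 suggests linear cost and a slope near 2 suggests the quadratic behaviour guessed at above.

```r
# Time fetch(n) for each n in sizes and return the elapsed wall-clock seconds.
# 'fetch' is any function taking a row count (hypothetical helper).
time_at_sizes <- function(fetch, sizes) {
  elapsed <- vapply(sizes, function(n) {
    system.time(fetch(n))[["elapsed"]]
  }, numeric(1))
  data.frame(rows = sizes, seconds = elapsed)
}

# Slope of log(seconds) against log(rows).
# Roughly 1 means linear growth, roughly 2 means quadratic growth.
scaling_exponent <- function(timings) {
  ok <- timings$seconds > 0   # log(0) is -Inf, so drop unmeasurably fast runs
  fit <- lm(log(seconds) ~ log(rows), data = timings[ok, ])
  unname(coef(fit)[2])
}

# Hypothetical usage against the real database. The dbname and the table
# name 'obs' are placeholders, not taken from the original post.
# library(RPostgreSQL)
# con <- dbConnect(PostgreSQL(), dbname = "mydb")
# fetch <- function(n) {
#   dbGetQuery(con, sprintf("SELECT * FROM obs LIMIT %d", as.integer(n)))
# }
# timings <- time_at_sizes(fetch, c(10000, 50000, 100000, 500000, 1000000))
# print(timings)
# scaling_exponent(timings)
```

Running the same row counts through psql gives the baseline to compare against. If the exponent in R comes out well above 1 while psql stays close to linear, that points at the R side rather than at the server.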
James Cloos <cloos at jhcloos.com> OpenPGP: 1024D/ED7DAEA6