OOP performance, was: V2.9.0 changes

4 messages · Thomas Petzoldt, Troy Robertson, Gabor Grothendieck

#
Hi Troy,

first of all a question, what kind of ecosystem models are you
developing in R? Differential equations or individual-based?

You write that you are a frustrated Java developer in R. I have a
similar experience; however, I still like Java, and I'm now happier
with R, as it is much more efficient (i.e. sum(programming + runtime))
for the things I usually do: ecological data analysis and modelling.

After using functional R for quite some time, with Java in parallel,
I had the same idea: to make R more Java-like and to model ecosystems in
an object-oriented manner. At that time I took a look at R.oo (thanks,
Henrik Bengtsson) and was one of the co-authors of proto. I still think
that R.oo is very good and that proto is a cool idea, but in the end I
switched to the recommended S4 for my ecological simulation package.

Note also that my solution was *not* to model the ecosystems as objects
(habitat - populations - individuals), but instead to model ecological
models (equations, inputs, parameters, time steps, outputs, ...).

This works quite well with S4. A speed test (see useR!2006 poster on
http://simecol.r-forge.r-project.org/) showed that all OOP flavours had
quite comparable performance.

I only have to keep a few rules in mind:

- avoid unnecessary copying of large objects. Sometimes it helps to
prefer matrices over data frames.

- use vectorization. For an individual-based model this means that one
has to re-think how to model an individual: not "many [S4] objects"
like in Java, but R structures (arrays, lists, data frames) that
vectorized functions (e.g. arithmetic or subsetting) can work on.

- avoid interpolation (e.g. approx) and, if it is unavoidable, keep the
lookup tables small.
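
A minimal sketch of the vectorization rule above, assuming a hypothetical
individual-based model in which each individual has only an age and a
weight (the column names, growth rate and survival rule are invented for
illustration). The population is kept in one data frame rather than as one
object per individual, so a time step becomes a few vectorized operations:

```r
# Hypothetical population: one row per individual, not one object per individual.
set.seed(1)
n <- 1000
pop <- data.frame(age = sample(0:10, n, replace = TRUE),
                  weight = runif(n, min = 1, max = 5))

# One time step, written with vectorized arithmetic and subsetting.
step_vectorized <- function(pop, growth = 0.1, max_age = 10) {
  pop$age <- pop$age + 1                      # all individuals age at once
  pop$weight <- pop$weight * (1 + growth)     # all individuals grow at once
  pop[pop$age <= max_age, , drop = FALSE]     # survivors selected by subsetting
}

# The same step as an explicit loop over individuals, for comparison only.
step_loop <- function(pop, growth = 0.1, max_age = 10) {
  keep <- logical(nrow(pop))
  for (i in seq_len(nrow(pop))) {
    pop$age[i] <- pop$age[i] + 1
    pop$weight[i] <- pop$weight[i] * (1 + growth)
    keep[i] <- pop$age[i] <= max_age
  }
  pop[keep, , drop = FALSE]
}

res_vec <- step_vectorized(pop)
res_loop <- step_loop(pop)
```

Both functions should give the same population; wrapping each call in
system.time() with a larger n is one way to compare their run time.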

If none of this helps, I write core functions in C (others use
Fortran). This can be done in a mixed style, and even full C-to-C
communication is possible (see the deSolve documentation for how to do
this with differential equation models).


Thomas P.
#
Hi Thomas,

It is a population-based model, but I didn't develop the work.  I am just the programmer who has been given the job of coding it.  The goal is to let users construct the model in a plug-and-play fashion (both its elements and its functionality).  Hence my focus on OO.

You are right about avoiding the copying of large objects.  That is what was killing things.  I am now working on vectorizing more of the number crunching and removing some of the nested for loops.  That should speed things up a little too.

I do also need to investigate how to move some of the more expensive code to C.

I had a quick look at simecol, which looks interesting.  I will point it out to my boss to check out too.

Thanks

Troy
___________________________________________________________________________

    Australian Antarctic Division - Commonwealth of Australia
IMPORTANT: This transmission is intended for the addressee only. If you are not the
intended recipient, you are notified that use or dissemination of this communication is
strictly prohibited by Commonwealth law. If you have received this transmission in error,
please notify the sender immediately by e-mail or by telephoning +61 3 6232 3209 and
DELETE the message.
        Visit our web site at http://www.antarctica.gov.au/
___________________________________________________________________________
#
In terms of performance, if you want the fastest
pure R code go with S3, and if you want
even faster performance rewrite your inner loops
in C.  All the other approaches will usually be slower.
Also, S3 is simple and elegant, will result in less code,
and will take you much less time to design, program and
debug.

For 100% R code, particularly for simulations,
proto can sometimes be even faster than pure R code based
S3 as proto supports hand optimizations that cannot readily
be done in other systems.  (For unoptimized code it would
be slower.)  The key trick is based on its ability
to separate dispatching from calling so that if method f and
object p are unchanged in a loop
   for(...) p$f(...)
then the loop can be rewritten
  f <- p$f; for(...) f(...)
Note that this still retains dynamic dispatch but
just factors it out of the loop.  With S3 the best you could
get would be for(...) f.p(...) where f.p is the method of
generic f for class p, but this is really tantamount to not
using OO at all since no dispatch is done.
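
The hoisting trick described above can be sketched in base R, using a
plain environment as a stand-in for a proto object (an assumption made so
that the example needs no extra package; the names p, f and the totals are
hypothetical). In proto itself, p$f also binds the object as the method's
first argument, which a plain environment does not do:

```r
# Stand-in "object": an environment holding a method f (not a real proto object).
p <- new.env()
p$f <- function(x) x * 2

n <- 1000

# Lookup of f in p is repeated on every iteration.
total_in_loop <- 0
for (i in seq_len(n)) total_in_loop <- total_in_loop + p$f(i)

# Lookup done once, before the loop; valid only while p and f stay unchanged.
f <- p$f
total_hoisted <- 0
for (i in seq_len(n)) total_hoisted <- total_hoisted + f(i)
```

Both loops compute the same total; the second simply avoids repeating the
lookup inside the loop.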

On Thu, Jul 2, 2009 at 11:31 AM, Thomas
Petzoldt<Thomas.Petzoldt at tu-dresden.de> wrote:
#
Hi Gabor,

Look, I agree with you about S3 and have at times wished I had chosen that path rather than S4.  It seems to do the things I struggle to find answers for with S4.  But, knowing little about R before engaging with this project, I decided to go with the latest OO framework, S4.  I now find that I am undoing some of it, such as the use of data member slots, in order to implement pass-by-reference via environments and improve performance.  But it's all a learning experience.
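
A minimal sketch of the pass-by-reference idea mentioned above, assuming two hypothetical S4 classes: PopValue keeps its state in an ordinary data slot, while PopRef keeps it in an environment slot (class, slot and function names are invented for illustration).  Ordinary slots follow R's copy-on-modify semantics, so a function that changes them must return the modified object; an environment is modified in place.

```r
library(methods)

# Value semantics: state held in an ordinary data slot.
setClass("PopValue", representation(size = "numeric"))

# Reference semantics: state held in an environment slot.
setClass("PopRef", representation(state = "environment"))

grow_value <- function(pop) {
  pop@size <- pop@size + 1   # changes the function's local copy only
  invisible(pop)
}

grow_ref <- function(pop) {
  st <- pop@state            # st refers to the same environment, no copy is made
  st$size <- st$size + 1     # change is visible to every holder of the object
  invisible(NULL)
}

a <- new("PopValue", size = 10)
grow_value(a)                # return value discarded, so a is unchanged

e <- new.env()
e$size <- 10
b <- new("PopRef", state = e)
grow_ref(b)                  # b's state is updated in place
```

After these calls a@size is still 10, whereas b@state$size is 11.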

Troy