
More naive questions: Speed comparisons? what is a "stack imbalance" in lmer? does lmer center variables?

Dear colleagues,

 	Questions of speed, especially comparative speed between R and some
proprietary program, come up with some frequency.  Perhaps we should add a
FAQ that covers this?  Here is my first attempt at such a contribution.

Section: R Speed. I have {heard,seen,shown} that R is {a bit,a lot,eons}
faster than [other program].  Is this true?

0) We assume that the "benchmark" data sets adequately represent the sorts
of tasks you might use the software for, and that the benchmark covers the
whole process (including data import and transformation to prepare the model
to run).

1) Proprietary software produces unverifiable statistical results: You
cannot deduce or detect when, whether, or how your results are wrong, unless
you compare the results to a trustworthy algorithm.  It is always faster to
use one method (e.g., the trustworthy one) than two.

2) Paying skilled programmers to write, optimize, and compile statistical
algorithms optimizes for speed.  Letting skilled statisticians who know how
to code, and who are personally motivated to implement an algorithm,
optimizes for the correctness and robustness of a model and its results.

3) The extensible nature of R algorithms may require more processing time,
but that time is likely less than the time required to export, import,
reformat, and re-tidy a data set for several programs (HLM6, SPSS, SAS,
etc.), much as it is faster to visit one large department store to buy a
frying pan, perfume, and a suit than to drive across town three times to
separate specialty stores.

4) The speed of R changes based on several factors, including the version of
R you use and in some cases the BLAS against which you link.  If you believe
R is much slower than it should be, be sure you have upgraded to the latest
version and are using the appropriate version for your hardware and software
(e.g., 64-bit R for Snow Leopard).
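As a minimal sketch of that check (assuming base R only; the exact
BLAS/LAPACK details shown by sessionInfo() vary by R version and platform):

```r
## Which R you are running
R.version.string

## 8 indicates a 64-bit build of R
.Machine$sizeof.pointer

## Platform details and, on newer R, the BLAS/LAPACK libraries in use
sessionInfo()

## A quick linear-algebra timing to compare across installations:
## invert a 1000 x 1000 random matrix
system.time(solve(matrix(rnorm(1e6), nrow = 1000)))
```

Re-running the same timing under a different BLAS or a newer R version is a
fairer comparison than quoting a single number from one installation.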

5) R's code is written to handle arbitrarily large data sets; many "very
fast" algorithms that other programs could use may well make assumptions
about data size that break down on large data sets, leading to crashes,
errors, or results that appear correct but are not.

Further:

6) R is more than a one-trick pony, and the ability to do nearly everything
comes at the expense of micro-optimization.  Human triathletes likewise do
not, and cannot, bike, run, or swim as quickly as single-focus cyclists,
runners, or swimmers.

7) It is probably unreasonable even to expect open-source interpreted code
to be as fast as code written by people whose full-time job is to write and
optimize code.  The fact that R's speed is of the same order of magnitude as
"some other program" is itself remarkable.

...in sum, it is possible that a proprietary software system may produce a
result of the form you expect faster than R does, but it is unlikely to be
MUCH faster when you consider the entire data analysis process, and there
are several strong arguments for waiting the few extra seconds.

--Adam
On Wed, 23 Sep 2009, Douglas Bates wrote: