
I'm sorry, and here is what I mean to ask about speed


Hi Paul,

Comments on speed in-line below
...
Here's Doug's comment:
<Doug Bates>
But you don't need to speculate about what lmer does.  It is Open
Source so you can check for yourself.

However, this does bring up another point which is the need to compare
apples with apples when you are benchmarking software.  If the data
import and model specification stages in HLM6 create the necessary
matrix structures for performing the iterative fit then does the time
to fit the model consist solely of the optimization and summary
stages?  Using an expression like

system.time(fm1 <- lmer(...))

is assessing the time to take the original data, which could be in a
very general form, create all those internal structures and perform
the optimization.

You should bear in mind that a lot of that construction of the model
structures is written in R exactly so that it is capable of fitting
very general model specifications.  The code in HLM6 is, I imagine,
compiled code, which is possible because it targets a very specific
task, and compiled code is always going to be much faster than
interpreted code.
</Doug Bates>

So part of the speed difference is that much of lmer()'s model-construction
code is written in R, an interpreted language, whereas HLM6 is compiled.

The other part is the construction and handling of the model matrix, 
which is a tough one to compare, as lmer() can handle more general models.  
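
Following Doug's advice to compare apples with apples, one way to pull these
two parts apart is to time model construction and optimization separately.
Here is a minimal sketch. It assumes a recent lme4 (1.0 or later) that exports
the modular functions lFormula(), mkLmerDevfun(), optimizeLmer() and
mkMerMod(), and it uses the sleepstudy data shipped with lme4 as a stand-in
for your colleague's data:

```r
library(lme4)  # assumes lme4 >= 1.0, which exports the modular fitting functions

form <- Reaction ~ Days + (Days | Subject)

## Construction: parse the formula, build the model frame, the fixed-effects
## matrix X, the random-effects structures, and the deviance function
t_build <- system.time({
  lmod   <- lFormula(form, data = sleepstudy)
  devfun <- do.call(mkLmerDevfun, lmod)
})

## Optimization: minimize the (REML) deviance over the covariance parameters
t_optim <- system.time(opt <- optimizeLmer(devfun))

## Package the result as a merMod object, as lmer() itself would
fit <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)

## For comparison: the single all-in-one timing Doug describes
t_total <- system.time(fm1 <- lmer(form, data = sleepstudy))

timings <- c(build    = t_build[["elapsed"]],
             optimize = t_optim[["elapsed"]],
             total    = t_total[["elapsed"]])
timings
```

On a dataset this small the times are tiny and noisy. With real data, repeat
each stage a few times. If HLM6's import stage builds its matrices up front,
as Doug suggests, then only the optimization time is the fair comparison with
HLM6's fitting time.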

Will your colleague only be fitting models that are within the
specifications of HLM6, or will your colleague have some datasets with
structure that HLM6 cannot handle, and so need to shoehorn the data
into HLM6 and make compromises that would not be needed in lmer()?

If the former, then some clever programming (potentially both in R and
in C) can yield a specialized version of lmer() that is comparable
in speed to HLM6. I've made such modifications to several functions
over the years, so I have stopped believing that compiled code is always
faster than R; after all, much of R is itself compiled C.

If the latter, then the flexibility of the interpreted-language version,
and the speed of implementation (i.e., versus recoding and recompiling a
specialized C program for each new scenario), generally beat the compiled
language version.

When I need to fit hundreds or thousands of models, I overcome the
speed deficit of the interpreted language by using a compute cluster.
That is far cheaper than the time that I or other programmers would
spend coding and compiling specialty software in an effort to handle
the great variety of problems that the interpreted language can handle.
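
The same fan-out idea works on a smaller scale on a single multi-core machine
with the parallel package that ships with R (2.14.0 and later). Here is a
minimal sketch. It assumes lme4 is installed for every worker. The workload,
refitting the sleepstudy model with each subject left out in turn, is a
hypothetical stand-in for your own batch of models. fit_without() is an
illustrative helper and is not part of any package:

```r
library(parallel)  # ships with R >= 2.14.0
library(lme4)

subjects <- levels(sleepstudy$Subject)

## Hypothetical helper: refit the model without subject s and
## return its fixed-effects estimates
fit_without <- function(s) {
  d <- droplevels(subset(sleepstudy, Subject != s))
  fixef(lmer(Reaction ~ Days + (Days | Subject), data = d))
}

cl <- makeCluster(2L)  # two local R worker processes
res <- tryCatch({
  invisible(clusterEvalQ(cl, library(lme4)))  # load lme4 on each worker
  parLapply(cl, subjects, fit_without)        # one model per subject
}, finally = stopCluster(cl))                 # always shut the workers down

est <- do.call(rbind, res)  # one row of fixed effects per fit
rownames(est) <- subjects
head(est)
```

On a real cluster, makeCluster() can instead be given a vector of host names.
Each fit should also be substantial enough to outweigh the cost of starting
the workers and shipping data to them.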


All that said, the learning curve for the S language is somewhat
steep and a bit long.  If you just have to stuff some data into
something right away and get some numbers out, the $500 or so to
purchase HLM6 may be cheaper than learning R.  But if you're in it
for the long haul, learning how to drive this Race caR is sweet.


Steven McKinney, Ph.D.

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney -at-  bccrc +dot+ ca
tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3

Canada