Improving computation time for a binary outcome in lme4
3 messages · Robin Jeffries, Douglas Bates, Adam D. I. Kramer
On Mon, May 24, 2010 at 7:46 PM, Robin Jeffries <rjeffries at ucla.edu> wrote:
I am running a mixed effects model with two random effects that have ~500 and ~1400 factor levels respectively.
For a continuous outcome, the computation time using lme4 is workable. However, for a binary outcome the computation time increases 4-80 fold compared to a similar model for a continuous outcome. (I tend to stop computations if they've been running more than 8 hours, so I don't have a maximum time estimate.)
There are at least two characteristics of the generalized linear mixed model that are causing the increase in computational time. The first is the fact that the algorithm is based on iteratively reweighted least squares (IRLS) and not ordinary least squares (OLS). It is inevitable that an iterative algorithm is slower than a direct calculation. The second cause is the fact that one can "profile out" the fixed-effects parameters in a linear mixed-effects model but not in a generalized linear mixed-effects model. It can be approximated to some extent, but the currently released version of the lme4 package doesn't do so. Thus, the greater the number of fixed-effects parameters, the greater the complexity of the problem. If you use the verbose option to lmer and to glmer on similar problems you will see that lmer is optimizing over fewer parameters than glmer is.
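To see the difference in the iteration output for yourself, a minimal sketch along these lines can be run (the data, formula, and grouping structure here are invented purely for illustration, not taken from the original post):

```r
library(lme4)

## Simulated toy data -- much smaller than the poster's, for illustration only
set.seed(1)
d <- data.frame(
  y  = rbinom(1000, 1, 0.5),
  x  = rnorm(1000),
  g1 = factor(sample(50,  1000, replace = TRUE)),
  g2 = factor(sample(140, 1000, replace = TRUE))
)

## verbose output prints one line per optimizer iteration; for glmer the
## fixed effects are part of the nonlinear optimization, so there are
## more parameters per line than in the corresponding lmer fit
m <- glmer(y ~ x + (1 | g1) + (1 | g2),
           data = d, family = binomial, verbose = TRUE)
```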
At least one of the fixed effects is also a 6-level factor. I attempted to treat this as a sparse matrix, but lmer() doesn't seem to allow for this type of matrix in the model.
As I mentioned in my reply on R-help, the development version of the lme4 package does have a sparseX option. For a factor with 6 levels it is unlikely that it will help. The sparsity index of the X matrix will be greater than 1/6 and that is close to the breakpoint where dense methods, which do more numerical computation but less structural analysis, are actually faster than sparse methods.
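For the record, the option mentioned above might be invoked along these lines; the exact spelling and placement of the argument is an assumption based on the development documentation of the time, and `f6` is a hypothetical 6-level factor:

```r
## Development-version lme4 only; sparseX = TRUE is an assumed spelling
## of the option described above, and f6 is a placeholder factor name
m <- lmer(y ~ f6 + (1 | g1) + (1 | g2), data = d, sparseX = TRUE)
```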
Are there any suggestions on what I can do (other than simplify the model) to improve the computation time for a binary outcome?
There are the usual suspects of getting access to a fast computer with lots of memory and a 64-bit operating system. You could see whether an accelerated BLAS will help. For example, Revolution R has the MKL BLAS built-in. Regrettably, that isn't always a speed boost. We have seen situations where multi-threaded BLAS actually slow down sparse matrix operations because the communications overhead is greater than the time savings of being able to perform more flops per second.
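A rough way to gauge whether an accelerated BLAS is in play is to time a dense matrix product; timings are entirely machine- and BLAS-dependent, so this is only a relative check between installations:

```r
## Dense BLAS throughput check: crossprod(A) is a level-3 BLAS operation,
## so an optimized (possibly multi-threaded) BLAS should finish it much
## faster than the reference BLAS on the same machine
n <- 2000
A <- matrix(rnorm(n * n), n, n)
system.time(crossprod(A))
```

As noted above, a faster dense BLAS does not guarantee a faster fit, since the sparse matrix operations inside lmer/glmer may not benefit and can even slow down under multi-threading.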
Also, could people comment on the speed of MCMCglmm vs lme4? Perhaps I could go this route if it will prove to be substantially quicker for a binary outcome.
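For comparison, a binary-outcome model of roughly this shape in MCMCglmm would look something like the following sketch; the variable names are placeholders, and the prior and chain-length settings would need tuning for a real analysis:

```r
library(MCMCglmm)

## "categorical" is MCMCglmm's family for a binary outcome;
## nitt/burnin/thin control the MCMC chain and are shown with
## illustrative values only
m2 <- MCMCglmm(y ~ x,
               random = ~ g1 + g2,
               family = "categorical",
               data   = d,
               nitt = 13000, burnin = 3000, thin = 10)
```

Being MCMC-based, its run time scales with the number of iterations rather than with optimizer convergence, so whether it is faster depends heavily on how long a chain is needed.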
Thank you to Douglas Bates for suggesting I post here. I think I'll be able to find more help using lme4 here than on the general R-help list.
~~~~~~~~~~~~~~~~~~~
-Robin Jeffries
Dr.P.H. Candidate in Biostatistics
UCLA School of Public Health
rjeffries at ucla.edu
530-624-0428
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
On Mon, 24 May 2010, Douglas Bates wrote:
If you use the verbose option to lmer and to glmer on similar problems you will see that lmer is optimizing over fewer parameters than glmer is.
Just one potentially useful observation: Turning on "verbose" makes the waiting period much MUCH more tolerable. It's kinda like a progress bar--you know glmer is doing something and that makes it easier to wait. For some huge models with bigger-than-I-needed data sets (back in the netflix prize days), I just let R run overnight and got what I wanted--but I had never let it go more than an hour before I worried that it was looping.
Are there any suggestions on what I can do (other than simplify the model) to improve the computation time for a binary outcome?
There are the usual suspects of getting access to a fast computer with lots of memory and a 64-bit operating system. You could see whether an accelerated BLAS will help. For example, Revolution R has the MKL BLAS built-in. Regrettably, that isn't always a speed boost. We have seen situations where multi-threaded BLAS actually slow down sparse matrix operations because the communications overhead is greater than the time savings of being able to perform more flops per second.
My kingdom for multi-threaded nlm()... --Adam