Multi Processor / lme4
On 14-04-27 02:34 PM, Doogan, Nathan wrote:
Thanks for the info, Ben. It sounds like the best way for me to speed things up at the moment, then, is to build R with a threaded linear algebra library. -Nate
Actually, I don't know if that will help; most of the difficult linear algebra in lme4 is sparse linear algebra, handled through the Matrix package (wrapping Tim Davis's SuiteSparse library) and the RcppEigen package. I'm not sure how much of it really uses the standard BLAS back-end. Someone with time on their hands could do some benchmarking and see what happens; http://cran.r-project.org/web/packages/gcbd/vignettes/gcbd.pdf is a good resource.

If you're really interested in performance, and feeling adventurous, I would definitely recommend Doug Bates's MixedModels package for Julia. It is also the case at the moment that lme4 is slower than lme4.0 for some problems.

If you have specific performance questions (rather than just "it would be nice for lme4 to be faster", which I don't disagree with), it would be good to give the parameters of your problem: how many observations, grouping variables, number of levels of grouping variables, LMM vs. GLMM, structure of grouping variables (nested, crossed, partially crossed) ... ?

Ben Bolker
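[Editor's note: a minimal sketch of the kind of benchmarking suggested above, using only base R. The matrix size is arbitrary; the point is to time a dense, BLAS-bound operation under whichever BLAS your R build is linked against, before and after swapping in a threaded library.]

```r
## Rough sketch: time a dense, BLAS-bound operation to see whether
## swapping in a threaded BLAS (e.g. OpenBLAS) makes any difference.
## The 2000 x 2000 size is arbitrary -- pick something big enough
## that the timing is dominated by the BLAS call itself.
set.seed(1)
m <- matrix(rnorm(2000 * 2000), nrow = 2000)

## crossprod(m) computes t(m) %*% m through a BLAS routine
print(system.time(crossprod(m)))

## sessionInfo() reports (in recent R versions) which BLAS/LAPACK
## libraries R is actually linked against
sessionInfo()
```

Repeating the timing after rebuilding or relinking R against a threaded BLAS would show whether the dense-BLAS path matters for your workload; it would not, of course, tell you how much of lme4's sparse work goes through that back-end.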
p.s. a duplicate of this message (sent from a non-member email address) is waiting for moderation. feel free to delete. On Sun, Apr 27, 2014 at 2:00 PM, Ben Bolker <bbolker at gmail.com> wrote:
[This is a perfectly reasonable question for r-sig-mixed-models, so I'm forwarding it there]

===================

This might be very naive. I assume a very costly part of estimating parameters is evaluating the likelihood, particularly when the data are large. It seems like it'd be fairly easy to distribute that evaluation across several processes (i.e., multithread it) to speed up the procedure (e.g., with mclapply or pvec in R's "parallel" package). That said, I'd guess likelihood evaluation in (g)lmer actually happens somewhere else, where I am not so comfortable. Any obvious reasons this map-reduce (I think it's called) sort of technique is not in use? Thanks for your time. -Nate

-- Nathan Doogan, Ph.D.
Post Doctoral Researcher
The Colleges of Social Work and Public Health
The Ohio State University

===================

It is indeed a little naive, but not silly at all. The problem is that it is *not* "fairly easy" to distribute the evaluation across processors via map-reduce/mclapply etc. Doug Bates has already done all kinds of wizardry to reduce the likelihood evaluation to linear algebra operations that can be done very efficiently, which speeds the process up enormously. He has been doing some careful profiling with his Julia code, and the largest single cost (if I am recalling things correctly) is computing a sparse Cholesky decomposition. He has been looking *very* recently at the PaStiX library <http://pastix.gforge.inria.fr/> as a possible way to parallelize this operation, but it is not completely trivial.

cheers
Ben Bolker
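[Editor's note: a minimal sketch of what the map-reduce idea above *can* do, namely farming out independent whole-likelihood evaluations (as in a grid search or profiling), even though a single evaluation is serial linear algebra internally. It uses lme4's real devFunOnly argument and the built-in sleepstudy data; the particular theta values are arbitrary, and mc.cores > 1 does not work on Windows.]

```r
library(lme4)       # for lmer() and the built-in sleepstudy data
library(parallel)   # for mclapply()

## devFunOnly = TRUE returns the deviance function itself rather than
## a fitted model; it takes the relative-covariance parameters theta
## (here length 3, for the 2x2 Cholesky factor of (Days | Subject)).
devf <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
             devFunOnly = TRUE)

## Independent evaluations at an arbitrary grid of theta values can be
## distributed across processes with mclapply() -- the "map" step.
## Each single evaluation remains serial.
thetas <- list(c(1, 0, 1), c(0.5, 0, 0.5), c(2, 0, 2))
devs <- mclapply(thetas, devf, mc.cores = 2)  # use mc.cores = 1 on Windows
print(unlist(devs))
```

This parallelizes *across* candidate parameter values, which is useful for profiling or sensitivity checks; it does nothing for the optimizer's inherently sequential path through parameter space, which is the hard problem described above.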
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models