Speed up code, profiling, optimization, lapply vs. loops
That's a good point. I've found that skipping a lot of the setup that 'glm' does and calling 'glm.fit' directly can save a lot of time.

-roger

On Tue, Jul 7, 2009 at 12:53 AM, Kasper Daniel Hansen <khansen at stat.berkeley.edu> wrote:
Aside from the advice from other people, you seem to be doing many glm calls. A big part of a call to a model function involves setting up the design matrix, checking for missing values, etc. If I understand your description correctly, you may only need to do this once. This will require some poking around in glm, but might save you a lot of time.

Kasper

On Jul 6, 2009, at 1:26, Thorn Thaler wrote:
Hi everybody,

currently I'm writing a package that, for a given family of variance functions depending on a parameter theta, computes the extended quasi-likelihood (eql) function for different values of theta. The computation involves a couple of calls to the 'glm' routine. What I'm doing now is calling 'lapply' with a list of theta values and a function that constructs a family object for the particular choice of theta, fits the glm and uses the results to get the eql.

Not surprisingly, the function is not very fast. Depending on the size of the parameter space under consideration, it takes a couple of minutes until the function finishes. Testing ~1000 parameters takes about 5 minutes on my machine.

I know that loops in R are slow more often than not. Thus, I thought using 'lapply' would be a better way. But it is just another way of writing a loop. Besides, it involves some overhead for the function call, and hence I'm not sure whether using 'lapply' is really the better choice.

What I'd like to figure out is where the bottleneck lies. Vectorization would help, but I don't think there is a vectorized 'glm' function that can handle a vector of family objects, so I'm not aware of any alternative to a loop.

So my questions:
- How can I figure out where the bottleneck lies?
- Is 'lapply' always superior to a loop in terms of execution time?
- Are there any 'evil' commands that should be avoided in a loop because they slow down the computation?
- Are there any good books or tutorials about how to profile R code efficiently?

Thanks in advance for your help,
Thorn
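The first two questions above can be explored with base R tools. What follows is only a rough sketch, not the poster's code: the data frame 'd', the helper 'fit_one' and the grid 'thetas' are hypothetical, and a power link indexed by theta stands in for the theta-dependent variance family, which is not shown in the thread. It profiles the 'lapply' call with 'Rprof' and then times 'lapply' against a preallocated 'for' loop with 'system.time'.

```r
# Hypothetical stand-in data: a count response with one covariate.
set.seed(1)
n <- 500
d <- data.frame(x = runif(n))
d$y <- rpois(n, lambda = exp(1 + d$x))

# Hypothetical per-theta work. ASSUMPTION: a power link indexed by theta
# replaces the real theta-dependent variance family from the package.
fit_one <- function(theta) {
  fit <- glm(y ~ x, family = quasi(link = power(theta), variance = "mu"),
             data = d)
  deviance(fit)
}

thetas <- seq(0.5, 1, length.out = 300)

# 1. Where is the time spent? Sample the call stack while the fits run.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
res_prof <- lapply(thetas, fit_one)
Rprof(NULL)
# summaryRprof() fails if no samples were recorded, so guard the call.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.total, 10))

# 2. lapply versus an explicit loop with a preallocated result list.
t_lapply <- system.time(res_lapply <- lapply(thetas, fit_one))
t_loop <- system.time({
  res_loop <- vector("list", length(thetas))  # preallocate, do not grow
  for (i in seq_along(thetas)) res_loop[[i]] <- fit_one(thetas[i])
})

# Elapsed seconds for both variants, side by side.
print(c(lapply = t_lapply[["elapsed"]], loop = t_loop[["elapsed"]]))
```

In a setup like this the profile would be expected to show most of the time inside 'glm' and the functions it calls, in which case the choice between 'lapply' and a loop matters little compared with the cost of each fit. Growing a result object inside a loop (for example with 'c' or 'rbind') is the usual thing to avoid, which is why the loop above preallocates its list.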
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
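
The suggestion made in this thread, doing the model setup once and calling 'glm.fit' directly, might look roughly like the sketch below. It is an illustration under assumptions rather than the package's actual code: the data frame 'd', the grid 'thetas' and the helpers 'fit_fast' and 'fit_slow' are hypothetical, and a power link indexed by theta stands in for the theta-dependent variance family.

```r
# Hypothetical stand-in data: a count response with one covariate.
set.seed(1)
n <- 500
d <- data.frame(x = runif(n))
d$y <- rpois(n, lambda = exp(1 + d$x))
thetas <- seq(0.5, 1, length.out = 50)

# Done once: model frame, design matrix and response.
mf <- model.frame(y ~ x, data = d)
X  <- model.matrix(attr(mf, "terms"), mf)
y  <- model.response(mf)

# ASSUMPTION: the family depends on theta only through a power link here.
make_family <- function(theta) quasi(link = power(theta), variance = "mu")

# Fast path: reuse X and y, call the fitting routine directly.
fit_fast <- function(theta) {
  glm.fit(x = X, y = y, family = make_family(theta))$deviance
}

# Reference path: full glm() call, which rebuilds the model frame each time.
fit_slow <- function(theta) {
  deviance(glm(y ~ x, family = make_family(theta), data = d))
}

t_fast <- system.time(dev_fast <- vapply(thetas, fit_fast, numeric(1)))
t_slow <- system.time(dev_slow <- vapply(thetas, fit_slow, numeric(1)))

# Both paths should give the same deviances.
stopifnot(isTRUE(all.equal(dev_fast, dev_slow)))
print(c(glm.fit = t_fast[["elapsed"]], glm = t_slow[["elapsed"]]))
```

How much this saves depends on how the setup cost compares with the iterative fit itself, so it is worth timing on the real data before restructuring a package around it. Note that 'glm.fit' returns a plain list rather than an object of class "glm", so methods such as 'summary' are not available on the result without extra work.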