Skip to content
Prev 72 / 5632 Next

[R-meta] cooks.distance.rma.mv is slow on complex models

Just a follow-up on this:

If you install the devel version of metafor (http://www.metafor-project.org/doku.php/installation#development_version), you will find a lot of improvements to cooks.distance.rma.mv(). In particular, it:

1) now can do parallel processing,
2) generally should run (quite a bit?) faster (starting values for the repeated model fits are set to the parameter estimates from the 'full' model using all data -- which are likely to be much better starting values than the default ones),
3) offers the possibility to compute approximate Cook's distance values where the variance/correlation components are not re-estimated for each model fit (reestimate=FALSE); doing so only yields an approximation to the Cook's distances that ignores the influence on the variance/correlation components, but is considerably faster (and often yields similar results), and
4) has a 'cluster' argument that allows computing Cook's distances not just for individual estimates, but for entire groups/clusters of estimates.

In Roger's analysis: Fit model 'tmp.casdiet' with 'random = ~1|laboratory/experiment/study' and then use:

cooks.distance(tmp.casdiet) ### default is Cook's distance for all estimates
cooks.distance(tmp.casdiet, cluster=tmp.dat.MTPY.new$laboratory)
cooks.distance(tmp.casdiet, cluster=tmp.dat.MTPY.new$experiment)

Not entirely sure about the last one -- it depends on how you have coded 'experiment'; if it is not unique across the levels of 'laboratory', you would want to use:

cooks.distance(tmp.casdiet, cluster=interaction(tmp.dat.MTPY.new$laboratory, tmp.dat.MTPY.new$experiment))

Best,
Wolfgang

-- 
Wolfgang Viechtbauer, Ph.D., Statistician | Department of Psychiatry and    
Neuropsychology | Maastricht University | P.O. Box 616 (VIJV1) | 6200 MD    
Maastricht, The Netherlands | +31 (43) 388-4170 | http://www.wvbauer.com    

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Viechtbauer Wolfgang (SP)
Sent: Tuesday, July 25, 2017 19:54
To: r-sig-meta-analysis at r-project.org
Cc: Martineau, Roger
Subject: Re: [R-meta] cooks.distance.rma.mv is slow on complex models

Hi Roger,

In order to compute Cook's distance for the ith estimate, we have to refit the model with the ith estimate removed. For simple models like those fit by lm(), there are shortcuts that we can use to avoid actually having to refit the model. See:

https://en.wikipedia.org/wiki/Cook%27s_distance

For more complex models, those shortcuts do not exist (at least, I am not aware of them). So, right now, cooks.distance() crunches through k model fits (k being the number of estimates), which can take a considerable amount of time for complex models and large datasets. Yes, that kind of sucks.

On my to-do list is to add parallel processing to cooks.distance(). profile() and gosh() are similar functions that do repeated model fits and those already can do parallel processing. My plan is to add this to cooks.distance(). At least then, you can cut down on the time if you have a computer with multiple cores.

In the meantime, you could give Microsoft R Open (MRO) a try. There is nothing fundamentally different about MRO, except that it directly comes with Intel's Math Kernel Library (MKL) and that is likely to speed up model fitting quite a bit. See:

http://www.metafor-project.org/doku.php/tips:speeding_up_model_fitting

Alternatively, you could consider computing a sort of 'pseudo Cook's distance' based on the same idea as how Cook's distance is computed for standard linear models. In essence, it's just some kind of squared standardized residual multiplied by h/(1-h)^2, where h is the hatvalue. A simple example:

dat <- get(data(dat.bcg))
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, data=dat)

sav1 <- cooks.distance(res)
sav2 <- rstandard(res)$z^2 * hatvalues(res) / (1-hatvalues(res))^2

plot(sav1, sav2)
cor(sav1, sav2)

Not exactly the same thing, but quite close. A more complex example:

dat <- get(data(dat.konstantopoulos2011))
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)

sav1 <- cooks.distance(res)
sav2 <- rstandard(res)$z^2 * hatvalues(res) / (1-hatvalues(res))^2

plot(sav1, sav2)
cor(sav1, sav2)

Okay, now things start to not align so nicely anymore, but maybe still okay as a rough check.

Best,
Wolfgang