[R-meta] Influential case diagnostics in a multivariate multilevel meta-analysis in metafor

Thu, Jan 17, 2019 2:16 PM

Hi Yogev,

Just to be safe, make sure you are using the latest 'devel' version of metafor. Run devtools::install_github("wviechtb/metafor") to be sure. Also, I would go with whatever detectCores(logical=FALSE) tells you for the number of cores. But even without that, things should finish in a few minutes. Beyond that, I really don't know what the issue could be. It certainly isn't an issue with metafor per se.

Best,
Wolfgang

-----Original Message-----
From: Yogev Kivity [mailto:yogev_k at yahoo.com] 
Sent: Thursday, 17 January, 2019 21:37
To: Viechtbauer, Wolfgang (SP)
Cc: Martineau, Roger (AAFC/AAC); R-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Influential case diagnostics in a multivariate multilevel meta-analysis in metafor

Hi Wolfgang,

Thanks for your detailed reply and suggestions. Unfortunately, even after implementing your suggestions, I could not get the computation to terminate after letting it run for the night (with 4 logical cores).

I was going to suggest that perhaps the unbalanced dataset I am working with compared to the konstantopoulos2011 data has something to do with it (cluster size in my dataset ranges between 1 and 234 effect sizes with a mean of 11 and a median of 5). However, when I tried to run the konstantopoulos2011 code, I got similar running times for fitting the models (using standard BLAS), but I could not get the Cook's distances computation to terminate even after 2050 seconds, even when I used parallel processing with 4 logical cores. I used this code:

system.time(sav2 <- cooks.distance(res2, cluster=dat$group, reestimate=FALSE, parallel="snow", ncpus=4))

Any thoughts?

Thanks,
Yogev
--
Yogev Kivity, Ph.D.
Postdoctoral Fellow
Department of Psychology
The Pennsylvania State University
Bruce V. Moore Building
University Park, PA 16802
Office Phone: (814) 867-2330

On Thu, Jan 17, 2019 at 4:24 AM Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

Please keep the mailing list in cc.

I don't know what model you are fitting, but with k=820, that running time seems excessive. Here is an artificial example with k=2800. I just use the data from 'dat.konstantopoulos2011' and replicate them 50 times to create a much larger dataset. I then fit a multilevel model with group (replication), district, and school as random effects. First, I use the defaults and then sparse=TRUE, since that should help quite a bit here. Also, I ran things once with the standard BLAS routines and once with OpenBLAS (switching those routines requires making system changes, not something that can be done within R).

###########################

library(metafor)

dat <- dat.konstantopoulos2011
group <- rep(1:nrow(dat), each=50)
dat <- dat[group,]
dat$group <- group
rm(group)
nrow(dat)

system.time(res1 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat))

system.time(res2 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat, sparse=TRUE))

system.time(sav1 <- cooks.distance(res2, cluster=dat$group, reestimate=FALSE))

###### results:

### with standard BLAS

   user  system elapsed 
683.587   8.712 692.312 

   user  system elapsed 
  8.292   0.600   8.894 

   user  system elapsed 
270.960   0.044 271.005 

### with OpenBLAS

   user  system elapsed 
 86.531   8.707  95.242 

   user  system elapsed 
  6.476   0.632   7.108 

   user  system elapsed 
148.071   0.060 148.133 

###########################

So, with the defaults and standard BLAS, fitting that model takes 11.5 minutes, which is a bit painful (esp. if you then would compute the Cook's distances). Using sparse=TRUE brings this down to 9 seconds. Computing the 'group' level Cook's distances (using reestimate=FALSE, so really they are approximations, but usually good enough for diagnostic purposes) takes 4.5 minutes, which does require you to grab a cup of coffee and have a quick chat with a colleague at the coffee machine, but that isn't such a bad thing.

Switching to OpenBLAS helps esp. when using the defaults (now about 1.5 minutes). Using sparse=TRUE brings the time down to 7 seconds and the Cook's distances are then computed in about 2.5 minutes. That only leaves time to grab coffee and say hi to your colleague.
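The minute figures quoted here follow directly from the elapsed times reported earlier. As a small sketch of that arithmetic (the data frame and its column names are made up for illustration; only the numbers come from the timings):

```r
# Elapsed times in seconds, copied from the system.time() output in this message.
elapsed <- data.frame(
  step      = c("fit (defaults)", "fit (sparse=TRUE)", "Cook's distances"),
  std_blas  = c(692.312, 8.894, 271.005),
  open_blas = c(95.242, 7.108, 148.133)
)

# Convert to minutes and compute the speed-up from switching BLAS libraries.
elapsed$std_min  <- round(elapsed$std_blas / 60, 1)
elapsed$open_min <- round(elapsed$open_blas / 60, 1)
elapsed$speedup  <- round(elapsed$std_blas / elapsed$open_blas, 1)
print(elapsed)

# Speed-up from sparse=TRUE alone under the standard BLAS.
sparse_gain <- elapsed$std_blas[1] / elapsed$std_blas[2]
```

Laid out this way, the gain from sparse=TRUE is much larger than the gain from switching BLAS libraries for the model fit, while the Cook's distances benefit more modestly from OpenBLAS.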

I did not use any multicore processing here, so if you use 2 cores, you can pretty much halve the time to compute the Cook's distances (there is a bit of overhead when using multicore processing, but that should be minor here).
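To try the serial versus multicore comparison on a small scale, something like the following could be used. This is only a sketch: it assumes metafor is installed, it uses the original (non-replicated) 'dat.konstantopoulos2011' data with district as the cluster, and the object names (res, cd_serial, cd_snow) are made up. On data this small, the overhead of starting the worker processes may well outweigh any gain, so the timings are not meant to reproduce the ones above.

```r
if (requireNamespace("metafor", quietly = TRUE)) {
  library(metafor)

  dat <- dat.konstantopoulos2011   # original data, not the replicated version
  res <- rma.mv(yi, vi, random = ~ 1 | district/school, data = dat, sparse = TRUE)

  # Serial computation of district-level Cook's distances.
  t_serial <- system.time(
    cd_serial <- cooks.distance(res, cluster = dat$district, reestimate = FALSE)
  )

  # Same computation spread over 2 worker processes.
  t_snow <- system.time(
    cd_snow <- cooks.distance(res, cluster = dat$district, reestimate = FALSE,
                              parallel = "snow", ncpus = 2)
  )

  # Compare timings and check that both runs give the same values.
  print(rbind(serial = t_serial[1:3], snow = t_snow[1:3]))
  print(all.equal(cd_serial, cd_snow))
}
```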

So, while rma.mv() isn't super fast, I am wondering why your (and Yogev's) running times are so long.

Best,
Wolfgang

-----Original Message-----
From: Martineau, Roger (AAFC/AAC) [mailto:roger.martineau at canada.ca] 
Sent: Wednesday, 16 January, 2019 19:21
To: Viechtbauer, Wolfgang (SP)
Subject: [R-meta] Influential case diagnostics in a multivariate multilevel meta-analysis in metafor

Dear Wolfgang,

I have exactly the same problem as Dr. Kivity and have not been able to solve it yet due to the size of the data set I presume (n = 820). I have to let Cook's distance run overnight and it is a real pain.

I checked the number of cores available (see below). Are they sufficient?

[1] 4

[1] 2

This is one very frustrating issue with rma.mv, because I can fit a multilevel model using the lmer function (I know using rma.mv is more appropriate in a meta-analytic context) and will get Cook's distance values a lot faster with the following:

[1] 642

Indeed, the Cook's distance values from rma.mv and lmer are not exactly the same, but large values should be detected by both functions.

Best regards,

Roger

S.V.P. notez ma nouvelle adresse courriel ci-bas
Please note my new email address below

Roger Martineau, mv Ph.D.
Nutrition et Métabolisme des ruminants
Centre de recherche et de développement
sur le bovin laitier et le porc
Agriculture et agroalimentaire Canada/Agriculture and Agri-Food Canada
Téléphone/Telephone: 819-780-7319
Télécopieur/Facsimile: 819-564-5507
2000, Rue Collège / 2000, College Street
Sherbrooke (Québec) J1M 0C8
Canada
roger.martineau at canada.ca

Dear Yogev,

Since you use 'cluster=StudyID', cooks.distance() is doing 311 model fits. But you use 'reestimate=FALSE', which should speed things up a lot. Also, 'sparse=TRUE' probably makes a lot of sense here, since the marginal var-cov structure is probably quite sparse. So, for the most part, you are already using features that should help to speed things up.
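The number of model fits is simply the number of distinct values of the cluster variable. A minimal sketch of how to check this before starting a long run, using a made-up toy data frame (the names 'Data' and 'StudyID' follow this thread; the values are invented):

```r
# Toy data frame standing in for the real one; the values are made up.
Data <- data.frame(
  StudyID = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 4),
  yi      = c(0.2, 0.1, 0.3, 0.5, -0.1, 0.0, 0.4, 0.2, 0.3, 0.1)
)

# One leave-one-cluster-out fit per distinct cluster value.
n_fits <- length(unique(Data$StudyID))
n_fits

# Cluster sizes give a quick picture of how unbalanced the data are.
summary(as.vector(table(Data$StudyID)))
```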

But a few things:

1) You used 'cluster = StudyID', but unless you used attach(Data) or have 'StudyID' as a separate object in your workspace, this should not work. It should be 'cluster = Data$StudyID'.

2) If you use 'parallel="snow"', then no progress bar will be shown, so I wonder how you got the '6%' then. Or did you run this once without 'parallel="snow"'?

3) If you use 'parallel="snow"', then this won't give you any speed increase unless you actually make use of multiple cores. You can do this with the 'ncpus' argument. But first check how many cores you actually have available with parallel::detectCores(). Note that this also counts 'logical' cores. If you are on macOS or Windows, then detectCores(logical=FALSE) is a better indicator of how many cores to specify under 'ncpus'.
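Points 1) and 3) can be combined along the following lines. This is a sketch only: the fallback to a single core when detectCores() returns NA is an added assumption, and 'res' stands for a hypothetical fitted rma.mv object, so the cooks.distance() call is left commented out.

```r
library(parallel)

# Physical cores are usually the better guide for 'ncpus'. detectCores() can
# return NA on some platforms, so fall back to 1 in that case (an assumption
# made for this sketch).
n_logical  <- detectCores(logical = TRUE)
n_physical <- detectCores(logical = FALSE)
ncpus <- if (is.na(n_physical)) 1L else max(1L, as.integer(n_physical))

cat("logical cores: ", n_logical, "\n")
cat("physical cores:", n_physical, "\n")
cat("ncpus to use:  ", ncpus, "\n")

# Hypothetical call, assuming a fitted rma.mv object 'res' and a data frame
# 'Data' with a 'StudyID' column:
# sav <- cooks.distance(res, cluster = Data$StudyID, reestimate = FALSE,
#                       parallel = "snow", ncpus = ncpus)
```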

Best,
Wolfgang
