[R-meta] Influential case diagnostics in a multivariate multilevel meta-analysis in metafor

Thu, Jan 17, 2019 1:24 AM

Please keep the mailing list in cc.

I don't know what model you are fitting, but with k=820, that running time seems excessive. Here is an artificial example with k=2800. I just use the data from 'dat.konstantopoulos2011' and replicate them 50 times to create a much larger dataset. I then fit a multilevel model with group (replication), district, and school as random effects. First, I use the defaults and then sparse=TRUE, since that should help quite a bit here. Also, I once run things with the standard BLAS routines and once with OpenBLAS (switching those routines requires making system changes, not something that can be done within R).

###########################

library(metafor)

dat <- dat.konstantopoulos2011
group <- rep(1:nrow(dat), each=50)
dat <- dat[group,]
dat$group <- group
rm(group)
nrow(dat)

system.time(res1 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat))

system.time(res2 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat, sparse=TRUE))

system.time(sav1 <- cooks.distance(res2, cluster=dat$group, reestimate=FALSE))

###### results:

### with standard BLAS

user  system elapsed 
683.587   8.712 692.312

user  system elapsed 
  8.292   0.600   8.894

user  system elapsed 
270.960   0.044 271.005 

### with OpenBLAS

user  system elapsed 
 86.531   8.707  95.242

user  system elapsed 
  6.476   0.632   7.108

user  system elapsed 
148.071   0.060 148.133

###########################

So, with the defaults and standard BLAS, fitting that model takes 11.5 minutes, which is a bit painful (esp. if you then would compute the Cook's distances). Using sparse=TRUE brings this down to 9 seconds. Computing the 'group' level Cook's distances (using reestimate=FALSE, so really they are approximations, but usually good enough for diagnostic purposes) takes 4.5 minutes, which does require you to grab a cup of coffee and have a quick chat with a colleague at the coffee machine, but that isn't such a bad thing.

Switching to OpenBLAS helps esp. when using the defaults (now about 1.5 minutes). Using sparse=TRUE brings the time down to 7 seconds and the Cook's distances are then computed in about 2.5 minutes. That only leaves time to grab coffee and say hi to your colleague.

I did not use any multicore processing here, so if you use 2 cores, you can pretty much half the time to compute the Cook's distances (there is a bit of overhead when using multicore processing, but that should be minor here).

So, while rma.mv() isn't super fast, I am wondering why your (and Yogev's) running times are so long.

Best,
Wolfgang

-----Original Message-----
From: Martineau, Roger (AAFC/AAC) [mailto:roger.martineau at canada.ca] 
Sent: Wednesday, 16 January, 2019 19:21
To: Viechtbauer, Wolfgang (SP)
Subject: [R-meta] Influential case diagnostics in a multivariate multilevel meta-analysis in metafor

Dear Wolfgang,

I have exactly the same problem as Dr. Kivity and have not been able to solve it yet due to the size of the data set I presume (n = 820). I have to let Cook?s distance run overnight and it is a real pain. 

I checked the number of cores available (see below). Are they sufficient ?

[1] 4

[1] 2

This is one very frustrating issue with rma.mv, because I can fit a multilevel model using the lmer function (I know using rma.mv is more appropriate in a meta-analytic context) and will get Cook?s distance values a lot faster with the following:

[1] 642

Indeed, Cook?s distance values are not exactly the same using the rma.mv and the lmer function but large values should be detected using both functions.

Best regards,

Roger ?

S.V.P. notez ma nouvelle adresse courriel ci-bas
Please note my new email address below

Roger Martineau, mv Ph.D.
Nutrition et M?tabolisme des ruminants
Centre de recherche et de d?veloppement
sur le bovin laitier et le porc
Agriculture et agroalimentaire Canada/Agriculture and Agri-Food Canada
T?l?phone/Telephone: 819-780-7319
T?l?copieur/Facsimile: 819-564-5507
2000, Rue Coll?ge / 2000, College Street
Sherbrooke?(Qu?bec) ?J1M 0C8
Canada
roger.martineau at canada.ca
?
Dear Yogev,

Since you use 'cluster=StudyID', cooks.distance() is doing 311 model fits. But you use 'reestimate=FALSE', which should speed things up a lot. Also, 'sparse=TRUE' probably makes a lot of sense here, since the marginal var-cov structure is probably quite sparse. So, for the most part, you are already using features that should help to speed things up.

But a few things:

1) You used 'cluster = StudyID', but unless you used attach(Data) or have 'StudyID' as a separate object in your workspace, this should not work. It should be 'cluster = Data$StudyID'.

2) If you use 'parallel="snow"', then no progress bar will be shown, so I wonder how you got the '6%' then. Or did you run this once without 'parallel="snow"'?

3) If you use 'parallel="snow"', then this won't give you any speed increase unless you actually make use of multiple cores. You can do this with the 'ncpus' argument. But first check how many cores you actually have available with parallel::detectCores() Note that this also counts 'logical' cores. If you are on MacOS or Windows, then detectCores(logical=FALSE) is a better indicator of how many cores to specify under 'ncpus'.

Best,
Wolfgang

[R-meta] Influential case diagnostics in a multivariate multilevel meta-analysis in metafor

Thread (7 messages)