[R-meta] Influential case diagnostics in a multivariate multilevel meta-analysis in metafor - R-SIG-meta-analysis

Wolfgang Viechtbauer

Thu, Jan 17, 2019 1:24 AM

Please keep the mailing list in cc.

I don't know what model you are fitting, but with k=820, that running time seems excessive. Here is an artificial example with k=2800. I just take the data from 'dat.konstantopoulos2011' and replicate them 50 times to create a much larger dataset. I then fit a multilevel model with group (replication), district, and school as random effects. First I use the defaults, then sparse=TRUE, since that should help quite a bit here. I also ran everything once with the standard BLAS routines and once with OpenBLAS (switching those routines requires making system changes, not something that can be done within R).

###########################

library(metafor)

dat <- dat.konstantopoulos2011
group <- rep(1:nrow(dat), each=50)
dat <- dat[group,]
dat$group <- group
rm(group)
nrow(dat)

system.time(res1 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat))

system.time(res2 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat, sparse=TRUE))

system.time(sav1 <- cooks.distance(res2, cluster=dat$group, reestimate=FALSE))

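###### results:

### with standard BLAS

# res1: rma.mv() with the defaults
   user  system elapsed
683.587   8.712 692.312

# res2: rma.mv() with sparse=TRUE
   user  system elapsed
  8.292   0.600   8.894

# sav1: cooks.distance() with reestimate=FALSE
   user  system elapsed
270.960   0.044 271.005

### with OpenBLAS

# res1: rma.mv() with the defaults
   user  system elapsed
 86.531   8.707  95.242

# res2: rma.mv() with sparse=TRUE
   user  system elapsed
  6.476   0.632   7.108

# sav1: cooks.distance() with reestimate=FALSE
   user  system elapsed
148.071   0.060 148.133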

###########################

So, with the defaults and standard BLAS, fitting that model takes 11.5 minutes, which is a bit painful (esp. if you then want to compute the Cook's distances). Using sparse=TRUE brings this down to 9 seconds. Computing the 'group' level Cook's distances (using reestimate=FALSE, so really they are approximations, but usually good enough for diagnostic purposes) takes 4.5 minutes, which does require you to grab a cup of coffee and have a quick chat with a colleague at the coffee machine, but that isn't such a bad thing.

Switching to OpenBLAS helps esp. when using the defaults (now about 1.5 minutes). Using sparse=TRUE brings the time down to 7 seconds and the Cook's distances are then computed in about 2.5 minutes. That only leaves time to grab coffee and say hi to your colleague.
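The conversions quoted above can be reproduced from the 'elapsed' values in the timing output. The following is only a sketch: the vector names are made up for illustration, and the numbers are copied from the results block.

```r
# Elapsed times in seconds, copied from the results block above.
# The names are illustrative labels, not metafor objects.
elapsed_blas     <- c(fit_default = 692.312, fit_sparse = 8.894, cooks = 271.005)
elapsed_openblas <- c(fit_default = 95.242,  fit_sparse = 7.108, cooks = 148.133)

# Convert seconds to minutes
minutes_blas     <- round(elapsed_blas / 60, 1)
minutes_openblas <- round(elapsed_openblas / 60, 1)
print(minutes_blas)
print(minutes_openblas)

# Speed-up from sparse=TRUE (standard BLAS) and from switching to OpenBLAS
speedup_sparse   <- elapsed_blas[["fit_default"]] / elapsed_blas[["fit_sparse"]]
speedup_openblas <- elapsed_blas / elapsed_openblas
print(round(speedup_sparse))
print(round(speedup_openblas, 1))
```

Read this way, sparse=TRUE accounts for most of the gain (roughly a 78-fold reduction in fitting time with standard BLAS), while OpenBLAS mainly helps the dense default fit.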

I did not use any multicore processing here, so if you use 2 cores, you can pretty much halve the time to compute the Cook's distances (there is a bit of overhead when using multicore processing, but that should be minor here).
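A minimal sketch of the two-core run, assuming metafor is installed and that a local snow cluster can be started on your machine. It uses the original (unreplicated) 'dat.konstantopoulos2011' data with 'district' as the cluster variable so that it finishes quickly; those choices, and the object names, are for illustration only.

```r
library(metafor)

# Small illustrative model (not the replicated k=2800 example)
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data = dat, sparse = TRUE)

# Serial computation of the district-level Cook's distances
t_serial <- system.time(
  sav_serial <- cooks.distance(res, cluster = dat$district, reestimate = FALSE)
)

# Same computation spread over 2 workers
t_par <- system.time(
  sav_par <- cooks.distance(res, cluster = dat$district, reestimate = FALSE,
                            parallel = "snow", ncpus = 2)
)

print(rbind(serial = t_serial, parallel = t_par)[, "elapsed"])
print(all.equal(sav_serial, sav_par))
```

On a dataset this small the overhead of starting the workers will usually outweigh the gain, so the timing comparison only becomes informative on something the size of the replicated example.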

So, while rma.mv() isn't super fast, I am wondering why your (and Yogev's) running times are so long.

Best,
Wolfgang

Yogev Kivity

Thu, Jan 17, 2019 12:19 PM

Hi Wolfgang,

Thanks for your detailed reply and suggestions. Unfortunately, even after
implementing your suggestions, I could not get the computation to terminate
after letting it run for the night (with 4 logical cores).

I was going to suggest that perhaps the unbalanced dataset I am working
with compared to the konstantopoulos2011 data has something to do with it
(cluster size in my dataset ranges between 1 and 234 effect sizes with a
mean of 11 and a median of 5). However, when I tried to run the
konstantopoulos2011 code, I got similar running times for fitting the
models (using standard BLAS), but I could not get the Cook's distances
computation to terminate even after 2050 seconds, even when I used
parallel processing with 4 logical cores. I used this code:

system.time(sav2 <- cooks.distance(res2, cluster=dat$group,
reestimate=FALSE, parallel="snow", ncpus=4))

Any thoughts?

Thanks,
Yogev

--

Yogev Kivity, Ph.D.
Postdoctoral Fellow
Department of Psychology
The Pennsylvania State University
Bruce V. Moore Building
University Park, PA 16802
Office Phone: (814) 867-2330


-----Original Message-----
From: Martineau, Roger (AAFC/AAC) [mailto:roger.martineau at canada.ca]
Sent: Wednesday, 16 January, 2019 19:21
To: Viechtbauer, Wolfgang (SP)
Subject: [R-meta] Influential case diagnostics in a multivariate
multilevel meta-analysis in metafor

Dear Wolfgang,

I have exactly the same problem as Dr. Kivity and have not been able to
solve it yet, presumably due to the size of the data set (n = 820). I have to
let the Cook's distance computation run overnight and it is a real pain.

I checked the number of cores available (see below). Are they sufficient?

library(nat.utils)
ncpus()

[1] 4

library(parallel)
detectCores(logical=FALSE)

[1] 2

This is one very frustrating issue with rma.mv, because I can fit a
multilevel model using the lmer function (I know using rma.mv is more
appropriate in a meta-analytic context) and will get Cook's distance values
a lot faster with the following:

library(influence.ME)
infl <- influence(NoMods, obs = TRUE)
tmp.cook <- cooks.distance(infl)
plot(infl, which = "cook")
which(tmp.cook > 0.5)

[1] 642

Admittedly, the Cook's distance values are not exactly the same using rma.mv
and lmer, but large values should be detected using both
functions.

Best regards,

Roger

S.V.P. notez ma nouvelle adresse courriel ci-bas
Please note my new email address below

Roger Martineau, mv Ph.D.
Nutrition et Métabolisme des ruminants
Centre de recherche et de développement
sur le bovin laitier et le porc
Agriculture et agroalimentaire Canada/Agriculture and Agri-Food Canada
Téléphone/Telephone: 819-780-7319
Télécopieur/Facsimile: 819-564-5507
2000, Rue Collège / 2000, College Street
Sherbrooke (Québec)  J1M 0C8
Canada
roger.martineau at canada.ca

Dear Yogev,

Since you use 'cluster=StudyID', cooks.distance() is doing 311 model fits.
But you use 'reestimate=FALSE', which should speed things up a lot. Also,
'sparse=TRUE' probably makes a lot of sense here, since the marginal
var-cov structure is probably quite sparse. So, for the most part, you are
already using features that should help to speed things up.

But a few things:

1) You used 'cluster = StudyID', but unless you used attach(Data) or have
'StudyID' as a separate object in your workspace, this should not work. It
should be 'cluster = Data$StudyID'.

2) If you use 'parallel="snow"', then no progress bar will be shown, so I
wonder how you got the '6%' then. Or did you run this once without
'parallel="snow"'?

3) If you use 'parallel="snow"', then this won't give you any speed
increase unless you actually make use of multiple cores. You can do this
with the 'ncpus' argument. But first check how many cores you actually have
available with parallel::detectCores(). Note that this also counts 'logical'
cores. If you are on macOS or Windows, then detectCores(logical=FALSE) is a
better indicator of how many cores to specify under 'ncpus'.
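
Points 1 and 3 can be illustrated with base R alone. This is only a sketch: 'Data' below is a small made-up stand-in for the poster's data frame, and the fallback to a single core when the physical count is unavailable is an assumption, not something metafor prescribes.

```r
library(parallel)

# Made-up stand-in for the poster's data frame (illustration only)
Data <- data.frame(StudyID = c(1, 1, 2, 3, 3, 3),
                   yi = c(0.10, 0.25, 0.05, 0.40, 0.35, 0.20),
                   vi = c(0.02, 0.03, 0.01, 0.04, 0.02, 0.03))

# Point 1: a column of 'Data' is not a free-standing object, so refer to it
# explicitly when passing it as the 'cluster' argument
cluster <- Data$StudyID
n_fits <- length(unique(cluster))  # one leave-one-out fit per cluster
print(n_fits)

# Point 3: logical versus physical cores
n_logical  <- detectCores()
n_physical <- detectCores(logical = FALSE)
print(c(logical = n_logical, physical = n_physical))

# Assumed fallback: use one core if the physical count cannot be determined
ncpus <- if (is.na(n_physical)) 1L else as.integer(n_physical)
print(ncpus)
```

The resulting 'ncpus' value is what would then be passed along with parallel="snow", together with cluster = Data$StudyID.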

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Yogev Kivity
Sent: Tuesday, 15 January, 2019 21:20
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Influential case diagnostics in a multivariate
multilevel meta-analysis in metafor

Hi all,

I am fitting a multivariate multilevel meta-analysis in metafor and having
trouble computing outlier and influential case diagnostics (i.e., Cook's
distances per
https://wviechtb.github.io/metafor/reference/influence.rma.mv.html).

This is a large dataset of 3360 Pearson's correlations (converted to Fisher's
z) nested within 600 subsamples that are nested within 311 studies. Below
is the code I used for the model and for computing Cook's distances, and
the problem is that it takes a long time to run (I ran it overnight
and it only reached 6%). I am assuming it is related to the size of the
dataset and to the complex model structure, but I am not sure how to
speed up the processing. I should note that I am computing the
distances based on the simplest possible model (i.e., no moderators and
without considering dependencies among effect sizes within clusters).

I was hoping someone could help with some suggestions of how best to move
forward.

Thanks,
Yogev

NoMods <- rma.mv(yi, vi, random = ~ 1 | StudyID/GroupID/EffectSizeID,
                 data=Data, sparse=TRUE)
summary(NoMods)
NoModsCooksDistance <- cooks.distance(NoMods, progbar=TRUE, cluster=StudyID,
                                      reestimate=FALSE, parallel="snow")
NoModsCooksDistance
plot(NoModsCooksDistance, type="o", pch=19)


_______________________________________________
R-sig-meta-analysis mailing list
R-sig-meta-analysis at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis

Wolfgang Viechtbauer

Thu, Jan 17, 2019 2:16 PM

Hi Yogev,

Just to be safe, make sure you are using the latest 'devel' version of metafor. Run devtools::install_github("wviechtb/metafor") to be sure. Also, I would go with whatever detectCores(logical=FALSE) tells you for the number of cores. But even without that, things should finish in a few minutes. Beyond that, I really don't know what the issue could be. It certainly isn't an issue with metafor per se.
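
The version check and update can be sketched as below, assuming devtools is installed and the machine has the tools needed to build packages from source. The 'install_devel' switch is a made-up safeguard so that the snippet does not reinstall anything unless asked to.

```r
# Report the installed metafor version, if any
installed <- if (requireNamespace("metafor", quietly = TRUE)) {
  as.character(utils::packageVersion("metafor"))
} else {
  NA_character_
}
print(installed)

# Hypothetical switch: set to TRUE to install the development version from GitHub
install_devel <- FALSE
if (install_devel) {
  devtools::install_github("wviechtb/metafor")
}
```

Restart R after installing so that the new version is the one that gets loaded.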

Best,
Wolfgang

Yogev Kivity

Fri, Jan 18, 2019 7:15 AM

Hi Wolfgang,

Using the latest 'devel' version of metafor worked! It took the computation
about 10 minutes to run with 4 parallel cores (number of cores was indeed
determined using the 'parallel' package).

Thanks for all your help!
Yogev



On Thu, Jan 17, 2019 at 5:16 PM Viechtbauer, Wolfgang (SP) <

wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

Hi Yogev,

Just to be safe, make sure you are using the latest 'devel' version of
metafor. Run devtools::install_github("wviechtb/metafor") to be sure. Also,
I would go with whatever detectCores(logical=FALSE) tells you for the
number of cores. But even without that, things should finish in a few
minutes. Beyond that, I really don't know what the issue could be. It
certainly isn't an issue with metafor per se.

Best,
Wolfgang

-----Original Message-----
From: Yogev Kivity [mailto:yogev_k at yahoo.com]
Sent: Thursday, 17 January, 2019 21:37
To: Viechtbauer, Wolfgang (SP)
Cc: Martineau, Roger (AAFC/AAC); R-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Influential case diagnostics in a multivariate
multilevel meta-analysis in metafor

Hi Wolfgang,

Thanks for your detailed reply and suggestions. Unfortunately, even after
implementing your suggestions, I could not get the computation to terminate
after letting it run for the night (with 4 logical cores).

I was going to suggest that perhaps the unbalanced dataset I am working
with compared to the konstantopoulos2011 data has something to do with it
(cluster size in my dataset ranges between 1 and 234 effect sizes with a
mean of 11 and a median of 5). However, when I tried to run the
konstantopoulos2011 code, I got similar running times for fitting the
models (using standard BLAS), but I could not get the Cook's distances
computation to terminate even after 2050 seconds, even when I used
parallel processing with 4 logical cores. I used this code:

system.time(sav2 <- cooks.distance(res2, cluster=dat$group,
reestimate=FALSE, parallel="snow", ncpus=4))

Any thoughts?

Thanks,
Yogev
--
Yogev Kivity, Ph.D.
Postdoctoral Fellow
Department of Psychology
The Pennsylvania State University
Bruce V. Moore Building
University Park, PA 16802
Office Phone: (814) 867-2330

On Thu, Jan 17, 2019 at 4:24 AM Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
Please keep the mailing list in cc.

I don't know what model you are fitting, but with k=820, that running time
seems excessive. Here is an artificial example with k=2800. I just use the
data from 'dat.konstantopoulos2011' and replicate them 50 times to create a
much larger dataset. I then fit a multilevel model with group
(replication), district, and school as random effects. First, I use the
defaults and then sparse=TRUE, since that should help quite a bit here.
Also, I once run things with the standard BLAS routines and once with
OpenBLAS (switching those routines requires making system changes, not
something that can be done within R).

###########################

library(metafor)

dat <- dat.konstantopoulos2011
group <- rep(1:nrow(dat), each=50)
dat <- dat[group,]
dat$group <- group
rm(group)
nrow(dat)

system.time(res1 <- rma.mv(yi, vi, random = ~ 1 | group/district/school,
data=dat))

system.time(res2 <- rma.mv(yi, vi, random = ~ 1 | group/district/school,
data=dat, sparse=TRUE))

system.time(sav1 <- cooks.distance(res2, cluster=dat$group,
reestimate=FALSE))

###### results:

### with standard BLAS

system.time(res1 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat))
   user  system elapsed
683.587   8.712 692.312

system.time(res2 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat, sparse=TRUE))
   user  system elapsed
  8.292   0.600   8.894

system.time(sav <- cooks.distance(res2, cluster=dat$group, reestimate=FALSE))
   user  system elapsed
270.960   0.044 271.005

### with OpenBLAS

system.time(res1 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat))
   user  system elapsed
 86.531   8.707  95.242

system.time(res2 <- rma.mv(yi, vi, random = ~ 1 | group/district/school, data=dat, sparse=TRUE))
   user  system elapsed
  6.476   0.632   7.108

system.time(sav1 <- cooks.distance(res2, cluster=dat$group, reestimate=FALSE))
   user  system elapsed
148.071   0.060 148.133

###########################

So, with the defaults and standard BLAS, fitting that model takes 11.5
minutes, which is a bit painful (esp. if you then would compute the Cook's
distances). Using sparse=TRUE brings this down to 9 seconds. Computing the
'group' level Cook's distances (using reestimate=FALSE, so really they are
approximations, but usually good enough for diagnostic purposes) takes 4.5
minutes, which does require you to grab a cup of coffee and have a quick
chat with a colleague at the coffee machine, but that isn't such a bad
thing.

Switching to OpenBLAS helps esp. when using the defaults (now about 1.5
minutes). Using sparse=TRUE brings the time down to 7 seconds and the
Cook's distances are then computed in about 2.5 minutes. That only leaves
time to grab coffee and say hi to your colleague.

I did not use any multicore processing here, so if you use 2 cores, you
can pretty much halve the time to compute the Cook's distances (there is a
bit of overhead when using multicore processing, but that should be minor
here).
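
To put the timings above side by side, the elapsed times can be turned into speed-up ratios. This is a small sketch in base R; the vector and variable names are made up for the illustration, the numbers are copied from the output above, and the 2-core figure simply ignores the parallel overhead.

```r
# elapsed times (in seconds) copied from the output above
elapsed <- c(std_default  = 692.312, std_sparse  = 8.894, std_cook  = 271.005,
             open_default =  95.242, open_sparse = 7.108, open_cook = 148.133)

# gain from sparse=TRUE for the model fit, under each BLAS
sparse_gain_std  <- elapsed[["std_default"]]  / elapsed[["std_sparse"]]
sparse_gain_open <- elapsed[["open_default"]] / elapsed[["open_sparse"]]

# gain from OpenBLAS for the default fit and for the Cook's distances
blas_gain_fit  <- elapsed[["std_default"]] / elapsed[["open_default"]]
blas_gain_cook <- elapsed[["std_cook"]]    / elapsed[["open_cook"]]

print(round(c(sparse_gain_std, sparse_gain_open, blas_gain_fit, blas_gain_cook), 1))

# rough time (in minutes) for the Cook's distances with 2 cores and standard BLAS
cook_2cores_min <- elapsed[["std_cook"]] / 2 / 60
print(round(cook_2cores_min, 1))
```

For the model fit, sparse=TRUE gives by far the largest gain (roughly 78-fold with standard BLAS), while OpenBLAS speeds up the Cook's distances by a bit less than a factor of 2.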

So, while rma.mv() isn't super fast, I am wondering why your (and
Yogev's) running times are so long.

Best,
Wolfgang

-----Original Message-----
From: Martineau, Roger (AAFC/AAC) [mailto:roger.martineau at canada.ca]
Sent: Wednesday, 16 January, 2019 19:21
To: Viechtbauer, Wolfgang (SP)
Subject: [R-meta] Influential case diagnostics in a multivariate
multilevel meta-analysis in metafor

Dear Wolfgang,

I have exactly the same problem as Dr. Kivity and have not been able to
solve it yet, presumably due to the size of the data set (n = 820). I have to
let the Cook's distance computation run overnight and it is a real pain.

I checked the number of cores available (see below). Are they sufficient?

library(nat.utils)
ncpus()

[1] 4

library(parallel)
detectCores(logical=FALSE)

[1] 2

This is one very frustrating issue with rma.mv, because I can fit a
multilevel model using the lmer function (I know using rma.mv is more
appropriate in a meta-analytic context) and will get Cook's distance values
a lot faster with the following:

library(influence.ME)
infl <- influence(NoMods, obs = TRUE)
plot(infl, which = "cook")
tmp.cook <- cooks.distance(infl)
plot(infl, which = "cook")
which(tmp.cook > 0.5)

[1] 642

Indeed, Cook's distance values are not exactly the same using the rma.mv
and the lmer function but large values should be detected using both
functions.

Best regards,

Roger

S.V.P. notez ma nouvelle adresse courriel ci-bas
Please note my new email address below

Roger Martineau, mv Ph.D.
Nutrition et Métabolisme des ruminants
Centre de recherche et de développement
sur le bovin laitier et le porc
Agriculture et agroalimentaire Canada/Agriculture and Agri-Food Canada
Téléphone/Telephone: 819-780-7319
Télécopieur/Facsimile: 819-564-5507
2000, Rue Collège / 2000, College Street
Sherbrooke (Québec)  J1M 0C8
Canada
roger.martineau at canada.ca

Dear Yogev,

Since you use 'cluster=StudyID', cooks.distance() is doing 311 model fits.
But you use 'reestimate=FALSE', which should speed things up a lot. Also,
'sparse=TRUE' probably makes a lot of sense here, since the marginal
var-cov structure is probably quite sparse. So, for the most part, you are
already using features that should help to speed things up.

But a few things:

1) You used 'cluster = StudyID', but unless you used attach(Data) or have
'StudyID' as a separate object in your workspace, this should not work. It
should be 'cluster = Data$StudyID'.

2) If you use 'parallel="snow"', then no progress bar will be shown, so I
wonder how you got the '6%' then. Or did you run this once without
'parallel="snow"'?

3) If you use 'parallel="snow"', then this won't give you any speed
increase unless you actually make use of multiple cores. You can do this
with the 'ncpus' argument. But first check how many cores you actually have
available with parallel::detectCores(). Note that this also counts 'logical'
cores. If you are on macOS or Windows, then detectCores(logical=FALSE) is a
better indicator of how many cores to specify under 'ncpus'.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Yogev Kivity
Sent: Tuesday, 15 January, 2019 21:20
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Influential case diagnostics in a multivariate
multilevel meta-analysis in metafor

Hi all,

I am fitting a multivariate multilevel meta-analysis in metafor and having
trouble computing outlier and influential case diagnostics (i.e., Cook's
distances per
https://wviechtb.github.io/metafor/reference/influence.rma.mv.html).

This is a large dataset of 3360 Pearson's correlations (converted to Fisher's
z) nested within 600 subsamples that are nested within 311 studies. Below
is the code I used for the model and for computing Cook's distances, and
the problem is that it takes a long time to run (I ran it overnight
and it only reached 6%). I am assuming it is related to the size of the
dataset and to the complex model structure, but I am not sure how to go
about speeding up the processing. I should note that I am computing the
distances based on the simplest possible model (i.e., no moderators and
without considering dependencies among effect sizes within clusters).

I was hoping someone could help with some suggestions of how best to move
forward.

Thanks,
Yogev

NoMods <- rma.mv(yi, vi, random = ~ 1 | StudyID/GroupID/EffectSizeID, data=Data, sparse=TRUE)
summary(NoMods)
NoModsCooksDistance <- cooks.distance(NoMods, progbar=TRUE, cluster=StudyID, reestimate=FALSE, parallel="snow")
NoModsCooksDistance
plot(NoModsCooksDistance, type="o", pch=19)

--

Yogev Kivity, Ph.D.
Postdoctoral Fellow
Department of Psychology
The Pennsylvania State University
Bruce V. Moore Building
University Park, PA 16802
Office Phone: (814) 867-2330

Yogev Kivity

Fri, Jan 18, 2019 7:18 AM #

Hi Wolfgang,

Using the latest 'devel' version of metafor worked! It took the computation
about 10 minutes to run with 4 parallel cores (number of cores was indeed
determined using the 'parallel' package).

Thanks for all your help!

Yogev

--

Yogev Kivity, Ph.D.
Postdoctoral Fellow
Department of Psychology
The Pennsylvania State University
Bruce V. Moore Building
University Park, PA 16802
Office Phone: (814) 867-2330



Wolfgang Viechtbauer

Fri, Jan 18, 2019 9:15 AM #

Happy to hear that!

Best,
Wolfgang
