Hi Filippo,
To add to Wolfgang's notes, here is a blog post where I walk through
simulating a meta-analysis of standardized mean differences:
https://www.jepusto.com/simulating-correlated-smds/
It covers a more general case than the 3-level meta-analysis, in which
effect size estimates from a given study may be correlated at the sampling
level, so it might be overkill for what you're doing.
With Wolfgang's notes, your code is a reasonable way to simulate a
three-level meta-analysis, especially if the purpose of your simulation is
exploratory or pedagogical. There are a few ways that I think it could be
improved in terms of its realism. These might be important if the
simulation is for methodological research--as David Hoaglin (2015;
https://doi.org/10.1002/jrsm.1146) argues, for research simulations, it's
important to consider the ways in which the meta-analysis model is a
simplification or idealization of plausible data-generating models. So, in
that spirit, here are a few suggestions:
* Drawing the standard errors from a uniform distribution on (0, 0.3)
implies a certain distribution of study sizes, which might or might not be
plausible for the context that you're thinking about. For a meta-analysis
of group differences with equally sized groups, a standard error of 0.3
implies a total sample size of about 44. Using a uniform distribution
implies that you will have many quite large studies:
study_sizes <- 4 / runif(1000, 0, 0.3)^2
summary(study_sizes)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
      45       77      170    22732      830 11996245
An alternative approach would be to draw the standard errors from the
empirical distribution of standard errors in a previously conducted synthesis
that resembles the context you're interested in. Or, if there are studies
on the distribution of sample sizes or typical power levels of studies in
your area, you could use that to find a more realistic distribution of
sample sizes/standard errors from which to simulate.
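As a rough sketch of the resampling idea (the observed_se values below are
invented placeholders standing in for standard errors extracted from a real
synthesis with open data):
```r
# Sketch: draw sampling variances by resampling with replacement from the
# empirical distribution of standard errors in an earlier synthesis.
# 'observed_se' is a made-up placeholder vector, just for illustration.
observed_se <- c(0.09, 0.11, 0.12, 0.15, 0.18, 0.21, 0.25, 0.30)
k <- 500                                 # number of simulated effect sizes
sei <- sample(observed_se, size = k, replace = TRUE)
vi <- sei^2                              # sampling variances
summary(4 / vi)                          # implied total sample sizes
```
This keeps the simulated standard errors tied to a plausible empirical range
instead of an arbitrary uniform interval.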
* Your simulation involves generating a structure where every study has
the same number of effect sizes. This "balanced" structure is a best-case
scenario for the performance of the 3-level meta-analysis model. Simulating
studies with varying numbers of effect sizes would be more realistic and
might be important for performance. A simple way to do that is just
studies_per_paper <- sample(1:10, size = ni, replace = TRUE)
dat <- data.frame(paper = rep(1:ni, studies_per_paper),
                  study = 1:sum(studies_per_paper))
table(dat$paper)
One step better would be to use a distribution that is motivated by
empirical data. You could do this by looking at the distribution of effect
sizes per study from one or more previously conducted syntheses that have
open data.
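For instance, one could resample the observed counts of effect sizes per
study from an existing synthesis (the k_obs vector below is invented, just
to illustrate the mechanics):
```r
# Sketch: draw the number of effect sizes per paper from an empirical
# distribution of counts rather than a uniform one.
# 'k_obs' is a made-up stand-in for the per-study effect size counts
# observed in a previously conducted synthesis with open data.
k_obs <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 6, 10)
ni <- 100                                  # number of papers
studies_per_paper <- sample(k_obs, size = ni, replace = TRUE)
dat <- data.frame(paper = rep(1:ni, studies_per_paper),
                  study = 1:sum(studies_per_paper))
table(table(dat$paper))   # distribution of effect sizes per paper
```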
* Your simulation generates effect size estimates that are exactly
normally distributed. For most effect size metrics, this is a
simplification that works reasonably well when sample sizes are large but
less well when sample sizes are small, and which sometimes also depends on
other features of the studies (e.g., for odds ratios, the adequacy of the
normality approximation depends on the number of positive outcomes, not
just the sample size). If you're in a context with small studies or effect
sizes that aren't truly normally distributed, it would be more realistic to
generate effect size estimates following a three-step process:
1) simulate the true effect size parameters by adding up the random
effects at the paper level and study level;
2) simulate any other relevant study features (sample sizes, event rates
in the control condition); and then
3) simulate the effect size estimates based on a realistic data-generating
process, such as by sampling from a t distribution for standardized mean
differences or sampling from binomial distributions for log-odds ratios.
The blog post linked above demonstrates this process for correlated
standardized mean difference effect sizes.
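As a minimal sketch of step 3 for a single standardized mean difference,
one can use the fact that the two-sample t statistic follows a non-central
t distribution (the function name and the large-sample variance formula
below are my own choices, just for illustration):
```r
# Sketch: simulate a Cohen's d estimate for a two-group study with n
# participants per group and true SMD delta, by drawing the t statistic
# from a non-central t distribution instead of a normal approximation.
simulate_smd <- function(delta, n) {
  df <- 2 * n - 2                  # degrees of freedom
  ncp <- delta * sqrt(n / 2)       # non-centrality parameter
  t_stat <- rt(1, df = df, ncp = ncp)
  d <- t_stat * sqrt(2 / n)        # back-transform t to Cohen's d
  vi <- 2 / n + d^2 / (4 * n)      # usual large-sample sampling variance
  c(d = d, vi = vi)
}
set.seed(123)
simulate_smd(delta = 0.4, n = 20)
```
With small n, the resulting d estimates are noticeably heavier-tailed (and
slightly biased) relative to the exact normal draws in the original code.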
Of course, this might be way more technical detail and way more nuance
than you need. I offer it just in case it's useful to you or others working
on simulations.
James
On Thu, Jun 16, 2022 at 3:11 AM Filippo Gambarota <
filippo.gambarota at gmail.com> wrote:
Thank you Wolfgang, this is very helpful.
With that sentence about sampling variances, I meant that (if I
remember correctly) for tau estimation I have to use effect size weights
(thus sampling variances), so I was afraid that I needed to sample values
according to some criteria (and not with runif(...) as in my simulation) in
order to recover my parameters.
On Wed, 15 Jun 2022 at 19:36, Viechtbauer, Wolfgang (NP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
Well, it doesn't want to go through, at least so far both of my replies
are not showing up on the mailing list. No idea why. In case you reply,
please still include the mailing list.
Best,
Wolfgang
-----Original Message-----
From: Viechtbauer, Wolfgang (NP)
Sent: Wednesday, 15 June, 2022 17:33
To: R meta
Cc: 'Filippo Gambarota'
Subject: RE: [R-meta] simulating three-level meta-analysis
(I don't know why, but occasionally my posts to the mailing list do not go
through. This seems to be happening again, so let me try to resend this.)
Hi Filippo,
Not sure what you mean by "sampling variances that will be associated with a
certain value of tau2 (level 2) and tau3 (level 3)". But in any case, this
isn't quite simulating data according to the three-level model. This should do it:
dat <- data.frame(paper = rep(1:ni, each=nj), study = 1:nj)
dat$b0_i <- rep(rnorm(ni, 0, taui), each=nj)
dat$b0_j <- rnorm(nrow(dat), 0, tauj)
dat$vi <- runif(nrow(dat), 0, 0.3)
dat$yi <- b0 + dat$b0_i + dat$b0_j + rnorm(nrow(dat), 0, sqrt(dat$vi))
fit <- rma.mv(yi, vi, random = ~ study | paper, data = dat)
fit
Best,
Wolfgang
-----Original Message-----
From: R-sig-meta-analysis [mailto:
r-sig-meta-analysis-bounces at r-project.org] On
Behalf Of Filippo Gambarota
Sent: Wednesday, 15 June, 2022 13:44
To: R meta
Subject: [R-meta] simulating three-level meta-analysis
Hello!
I'm trying to simulate a three-level meta-analysis model but I'm stuck
on understanding how to generate sampling variances that will be
associated with a certain value of tau2 (level 2) and tau3 (level 3).
My approach so far is setting values for population parameters and
generating sampling variances for the effect ij:
```
library(metafor)
b0 <- 0.3 # effect
ni <- 100 # number of papers
nj <- 5 # number of effects within each paper
taui <- 0.3 # level 3 tau
tauj <- 0.1 # level 2 tau
icc <- taui^2 / sum(taui^2 + tauj^2) # real ICC
# i-level data
dati <- data.frame(paper = 1:ni,
b0_i = rnorm(ni, 0, taui))
# j-level data
datj <- data.frame(paper = rep(1:ni, nj),
study = 1:sum(nj),
b0_j = rnorm(sum(nj), 0, tauj))
dat <- merge(dati, datj, by = "paper") # combine
dat$vi <- runif(nrow(dat), 0, 0.3) # sampling variances at level 1
dat$yi <- b0 + dat$b0_i + dat$b0_j + rnorm(nrow(dat), 0, dat$vi)
fit <- rma.mv(yi, vi, random = ~study|paper, data = dat)
```
However, I'm sure that I'm doing something wrong, because the taus are
estimated using the vi values, and the vis are not sampled according to any
specific criteria.
Does someone have some hints? Or, in general, suggestions about how to set
up a simulation like this?
Thanks!
--
Filippo Gambarota
PhD Student - University of Padova
Department of Developmental and Social Psychology
Website: filippogambarota
Research Group: Colab Psicostat