[R-meta] Clarification on an answer
I compared the sum of the number of studies (or number of publications) across levels of the moderator to the total number of studies (or publications). The difference is the number of studies (or publications) that include both levels.
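For illustration, here is a minimal sketch of both counting approaches in R (the column names id, study, and tc_fac follow the thread; the nlvls figures come from the outputs below):

# count how many moderator levels appear within each publication
lvls_per_pub <- tapply(cc_d$tc_fac, cc_d$id, function(x) length(unique(x)))
sum(lvls_per_pub > 1)    # publications that include both levels

# the same check at the study level (id/study combinations)
study_id <- interaction(cc_d$id, cc_d$study, drop = TRUE)
lvls_per_study <- tapply(cc_d$tc_fac, study_id, function(x) length(unique(x)))
sum(lvls_per_study > 1)  # studies that include both levels

# or read the counts off the reported nlvls values:
(67 + 33) - 96     # = 4 publications appear under both levels
(93 + 54) - 138    # = 9 studies appear under both levels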
On Mon, Feb 13, 2023 at 8:16 PM Yuhang Hu <yh342 at nau.edu> wrote:
Thank you, James, for the clarification. Regarding my second question, I was particularly interested in understanding how to determine the number of publications, and their nested number of studies, that have both levels from the output (likely from the "nlvls" column below) rather than from the raw data.

Thank you,
Yuhang

Full Model:
           estim    sqrt  nlvls  fixed         factor
sigma^2.1  0.0346  0.1861     96     no             id
sigma^2.2  0.0000  0.0000    138     no       id/study
sigma^2.3  0.0356  0.1887    302     no  id/study/esid
Subgroup1:
           estim    sqrt  nlvls  fixed         factor
sigma^2.1  0.0401  0.2002     67     no             id
sigma^2.2  0.0000  0.0000     93     no       id/study
sigma^2.3  0.0459  0.2142    215     no  id/study/esid
Subgroup2:
           estim    sqrt  nlvls  fixed         factor
sigma^2.1  0.0294  0.1713     33     no             id
sigma^2.2  0.0000  0.0000     54     no       id/study
sigma^2.3  0.0116  0.1077     87     no  id/study/esid
On Mon, Feb 13, 2023 at 3:43 PM James Pustejovsky <jepusto at gmail.com> wrote:
Hi Yuhang,

Thanks for your questions. On your first question, yes, you are correct: it should be struct = c("DIAG","DIAG") (diagonal vcov matrices) rather than c("ID","ID") (identity matrices).

On your second question, I think the easiest approach might be to just count. For a given level of units (studies or samples), summarize the number of effects included in each unit for each value of the moderator. Then count the number of units that have both values of the moderator (or, more generally, that have multiple values of the moderator). Section 4.1 of the supplementary materials for Pustejovsky & Tipton (2022) has some examples of summary tables that get at this question.

Supplementary materials pdf: https://osf.io/nyv4u
Code for supplementary materials: https://osf.io/mahc2

James

On Mon, Feb 13, 2023 at 3:49 PM Yuhang Hu via R-sig-meta-analysis <r-sig-meta-analysis at r-project.org> wrote:
Dear Meta-analysis Experts,
I'm hoping to use the forwarded answer below in my model but need two clarifications about the answer.

First, bullet point #3 says that we fit the below model and then "compare this model against the model that assumes homogeneous variance components across subgroups."
rma.mv(yi, V = Vsub,
       mods = ~ tc_fac - 1,
       random = list(~ tc_fac | id, ~ tc_fac | id:esid),
       struct = c("ID", "ID"),
       data = cc_d, method = "REML")
But the above model already assumes homogeneous variance components across subgroups, so I wonder if the intention was actually to set struct = c("DIAG","DIAG")?
Second, the answer mentions: "Based on the reported counts from your output, it looks like there might be four publications (reporting a total of nine studies) that have both levels."

I was wondering how to obtain the number of publications and their nested number of studies that have both levels (so I can apply this to my model)?

Thank you for your time,
Yuhang
---------- Forwarded message ---------
From: James Pustejovsky via R-sig-meta-analysis <r-sig-meta-analysis at r-project.org>
Date: Thu, Feb 9, 2023 at 8:14 PM
Subject: Re: [R-meta] Different estimates for multilevel meta-analysis when using rma.mv with mods or subset
To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis at r-project.org>
Hi Janis,
There are potentially two things going on here. First, when estimating separate models within each subgroup, you allow the heterogeneity (variance components) to differ by subgroup. This in turn leads to a different weighting of the individual effect size estimates in the calculation of the overall average effect sizes than the weighting used in the moderator analysis. That difference in weighting might be enough to account for the swings in the average ES for each category. To determine which model is more appropriate, you could use fit statistics or a likelihood ratio test.

To get such information, you'll need to find a way to express the subgroup analyses in terms of a single model. You can do this as follows:
1. Make a factor variable from targetcommitment so that you don't have to repeatedly calculate it:
cc_d$tc_fac <- factor(cc_d$targetcommitment)
2. If you created the V matrix using metafor::vcalc(), set the subgroup argument to the moderator variable:
Vsub <- vcalc(..., subgroup = cc_d$tc_fac, ...)
3. Fit a model using this new V matrix, specifying a random effects structure that allows independent effects in each subgroup:
rma.mv(yi, V = Vsub,
       mods = ~ tc_fac - 1,
       random = list(~ tc_fac | id, ~ tc_fac | id:esid),
       struct = c("ID", "ID"),
       data = cc_d, method = "REML")
Note that I've omitted the middle level of random effects because there doesn't seem to be any variance there after accounting for the id and esid levels. The results of (3) should be identical (or nearly so) to the results from the subgroup analysis. But now they're embedded within one model, so you can get fit statistics or conduct a likelihood ratio test to compare this model against the model that assumes homogeneous variance components across subgroups.
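To make that comparison concrete, here is a sketch of the likelihood-ratio test (using struct = c("DIAG","DIAG") for the heterogeneous model, per the correction at the top of the thread; the homogeneous model is the c("ID","ID") version above):

# heterogeneous variance components (a separate set per subgroup)
fit_het <- rma.mv(yi, V = Vsub,
                  mods = ~ tc_fac - 1,
                  random = list(~ tc_fac | id, ~ tc_fac | id:esid),
                  struct = c("DIAG", "DIAG"),
                  data = cc_d, method = "REML")

# homogeneous variance components across subgroups
fit_hom <- rma.mv(yi, V = Vsub,
                  mods = ~ tc_fac - 1,
                  random = list(~ tc_fac | id, ~ tc_fac | id:esid),
                  struct = c("ID", "ID"),
                  data = cc_d, method = "REML")

# likelihood ratio test; REML is fine here because the fixed effects match
anova(fit_het, fit_hom)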
The second thing that might be going on is that your data seem to include a few publications that have both levels of the targetcommitment variable. Based on the reported counts from your output, it looks like there might be four publications (reporting a total of nine studies) that have both levels. Because of this structure, the first model you use for moderator analysis will calculate the average effect size estimate for each level of targetcommitment based in part on the effect size estimates *for the other category* from the four publications / 9 studies that include both levels. This has the effect of moving the averages towards each other; that is, -.175 and .241 get pulled inward to -.155 and .188.
If you're particularly interested in figuring out the difference between levels of targetcommitment, the question is then whether this "pulling in" is a reasonable thing to do. You could consider this on a conceptual level: is it reasonable to put special emphasis on the results from these four publications in estimating the difference between levels of targetcommitment? If there's not a clear conceptual rationale, then you might consider fitting a slightly different model, which includes the study-mean variable and the study-mean-centered variable as separate terms.

For targetcommitment, this would mean calculating the average of the dummy variable (targetcommitment == 5, given the levels in your output) for each study (this gives you the first predictor, call it tc_study) and then subtracting the average from the original dummy variable (this gives you the second predictor, call it tc_within). The model should then use the moderators:

~ 1 + tc_study + tc_within
Tanner-Smith and Tipton (2014; https://doi.org/10.1002/jrsm.1091) recommend this as a generic strategy for meta-regression of dependent effect sizes.
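A sketch of that setup (column names follow the thread; the dummy coding is an assumption based on the levels shown in the output below):

# dummy variable for one level of target_commitment (coding assumed)
cc_d$tc_dummy <- as.numeric(cc_d$target_commitment == 5)

# study-level mean of the dummy (the between-study predictor)
cc_d$tc_study <- ave(cc_d$tc_dummy, cc_d$id, cc_d$study, FUN = mean)

# study-mean-centered dummy (the within-study predictor)
cc_d$tc_within <- cc_d$tc_dummy - cc_d$tc_study

fit_bw <- rma.mv(yi, V,
                 mods = ~ 1 + tc_study + tc_within,
                 random = ~ 1 | id/study/esid,
                 data = cc_d, method = "REML")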
James
On Wed, Feb 8, 2023 at 2:40 PM Janis Zickfeld via R-sig-meta-analysis <r-sig-meta-analysis at r-project.org> wrote:
Hi all,

my question has already been raised before, but the answers did not help me understand the main problem or solve the differences in the output, so that is why I'm posting it again.
We are conducting a meta-analysis and have effect sizes ("esid") nested within studies or samples ("study") nested within publications ("id"). There are 344 effect sizes in total across 110 publications and 160 study-publication combinations. 48.75% of the studies include more than one effect size, and of these, 26.82% are dependent effects, because they either use the same sample or the same control treatment used to calculate the effect. Therefore, we have constructed an approximate variance-covariance matrix to account for this. Some of the moderators only use a subset of this data, as they have missing values (as in the example below).
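Such a matrix can be built with metafor::vcalc(); the following is only a sketch, with rho = 0.6 as an assumed within-study correlation (the thread does not state the value used):

library(metafor)

# identify the sample each effect comes from (id/study combinations)
cc_d$sample_id <- interaction(cc_d$id, cc_d$study, drop = TRUE)

# approximate covariances among dependent effects from the same sample
V <- vcalc(vi, cluster = sample_id, obs = esid, rho = 0.6, data = cc_d)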
My main problem is now that I get different estimates when running the rma.mv model with factorial moderators using 'mods' versus when subsetting them via the 'subset' command. I have seen this discussed here before, and it seems that the main difference is that 'mods' uses the same residual heterogeneity, whereas 'subset' allows for different levels of tau^2. However, that example does not discuss a multilevel meta-analysis.
When running 'mods' using the following syntax:
rma.mv(yi, V, random = ~ 1 | id/study/esid, data = cc_d, method = "REML",
       mods = ~ factor(target_commitment) - 1)
I get:
Multivariate Meta-Analysis Model (k = 302; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0346 0.1861 96 no id
sigma^2.2 0.0000 0.0000 138 no id/study
sigma^2.3 0.0356 0.1887 302 no id/study/esid
Test for Residual Heterogeneity:
QE(df = 300) = 2213.7041, p-val < .0001
Test of Moderators (coefficients 1:2):
QM(df = 2) = 54.1962, p-val < .0001
Model Results:

                            estimate      se     zval    pval    ci.lb    ci.ub
factor(target_commitment)1   -0.1551  0.0315  -4.9258  <.0001  -0.2169  -0.0934  ***
factor(target_commitment)5    0.1878  0.0420   4.4770  <.0001   0.1056   0.2701  ***
However, running:
rma.mv(yi, V, random = ~ 1 | id/study/esid, data = cc_d, method = "REML",
       subset = target_commitment == 1)
I obtain:
Multivariate Meta-Analysis Model (k = 215; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0401 0.2002 67 no id
sigma^2.2 0.0000 0.0000 93 no id/study
sigma^2.3 0.0459 0.2142 215 no id/study/esid
Test for Heterogeneity:
Q(df = 214) = 1876.2588, p-val < .0001
Model Results:

                   estimate      se     tval   df    pval    ci.lb    ci.ub
targetcommitment1   -0.1747  0.0352  -4.9607  214  <.0001  -0.2441  -0.1053  ***
And running the same model with subset = target_commitment == 5, I obtain:

Multivariate Meta-Analysis Model (k = 87; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0294 0.1713 33 no id
sigma^2.2 0.0000 0.0000 54 no id/study
sigma^2.3 0.0116 0.1077 87 no id/study/esid
Test for Heterogeneity:
Q(df = 86) = 302.2812, p-val < .0001
Model Results:

                   estimate      se    tval  df    pval   ci.lb   ci.ub
targetcommitment5    0.2407  0.0397  6.0606  86  <.0001  0.1618  0.3197  ***
So there is a difference between -.155 and -.175, as well as between .188 and .241. These are not the biggest differences, but I have other moderators for which the differences are stronger. I tried to follow the suggestions from one of the previous questions, but this didn't reduce the difference for me.

My main question would be which of the two approaches is the preferable one (and whether there is any empirical basis for this)? Is it more a choice of preference here, or is one approach superior in the present case? I could also post a link to the data if that would be helpful for reproducing the findings.

I apologize for double posting this issue and hope that maybe someone can clarify which option I should choose.

Best wishes,
Janis
--
Yuhang Hu (She/Her/Hers)
Ph.D. Student in Applied Linguistics
Department of English
Northern Arizona University