
[R-meta] Covariance-variance matrix when studies share multiple treatment x control comparison

16 messages · Ju Lee, James Pustejovsky, Wolfgang Viechtbauer

#
Dear Wolfgang and all,

I am running multivariate mixed-effects models using rma.mv and trying to build a variance-covariance matrix to account for interdependence arising from shared experimental units.

The issue I have is: what if my analysis includes studies where both control and treatment groups are used repeatedly to calculate effect sizes? (Each comparison produces a different, meaningful categorical contrast, but I am also trying to pool all studies to calculate the grand mean effect size.)

If I simplify my dataframe, it looks like below. Here m1i and m2i are the means for the treatment and control groups, and n1i and n2i are the sample sizes needed for constructing the variance-covariance matrix.

study  treatment   m1i    m2i  n1i  n2i
1      1          7.87  -1.36   25   25
1      2          4.35  -1.36   22   25
1      2          4.35   7.87   22   25
2      1          9.32   0.98   38   40
3      1          8.08   1.17   50   50
4      1          7.44   0.45   30   30
4      2          5.34   0.45   30   30

If you look at study 1, all three effect sizes share different subsets of the experimental groups. Based on Wolfgang's code, I am trying to construct the variance-covariance matrix as follows:

calc.v <- function(x) {
  # covariances of SMDs sharing a common control group (Gleser & Olkin, 2009):
  # 1/n_C + d_i * d_j / (2 * N), with N the total sample size of the study
  v <- matrix(1/x$n2i[1] + outer(x$yi, x$yi, "*")/(2*x$Ni[1]), nrow=nrow(x), ncol=nrow(x))
  diag(v) <- x$vi  # sampling variances on the diagonal
  v
}
V <- bldiag(lapply(split(dat, dat$study), calc.v))
V
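In case it helps to see what this construction computes, here is a hypothetical, self-contained toy version (the yi/vi/n2i/Ni values are made up, not data from the thread; metafor::bldiag is replaced by a base-R block-diagonal loop so the snippet runs without metafor):

```r
# Toy illustration with made-up values; columns follow the
# metafor/Gleser-Olkin example (yi = SMD, vi = sampling variance,
# n2i = control-group size, Ni = total study sample size)
dat <- data.frame(study = c(1, 1, 2),
                  yi  = c(0.90, 0.50, 0.40),
                  vi  = c(0.08, 0.08, 0.05),
                  n2i = c(25, 25, 40),
                  Ni  = c(72, 72, 78))

calc.v <- function(x) {
  # off-diagonal: 1/n_C + d_i*d_j/(2*N) for SMDs sharing a control group
  v <- matrix(1/x$n2i[1] + outer(x$yi, x$yi, "*")/(2*x$Ni[1]),
              nrow = nrow(x), ncol = nrow(x))
  diag(v) <- x$vi
  v
}

blocks <- lapply(split(dat, dat$study), calc.v)

# assemble the block-diagonal V (base-R stand-in for metafor::bldiag)
k   <- vapply(blocks, nrow, integer(1))
V   <- matrix(0, sum(k), sum(k))
off <- c(0, cumsum(k))
for (b in seq_along(blocks)) {
  idx <- (off[b] + 1):off[b + 1]
  V[idx, idx] <- blocks[[b]]
}
```

For study 1 here the off-diagonal entry works out to 1/25 + 0.9*0.5/(2*72) = 0.043125, and the entries linking study 1 to study 2 stay zero.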

But I am not sure how to proceed: all three effect sizes should be interdependent because they share some experimental groups, but how can we specify this in the matrix? In particular, between the first and third rows of study 1, the mean of 7.87 is the treatment in the first row but the control in the third. How can we reasonably account for interdependence in such a case?

Thank you very much,
Best,
JU
#
Dear Ju,

There are two contrasts in study 1, 7.87 vs -1.36 and 4.35 vs -1.36, so the -1.36 comes from a common reference group. So, there will be two effect sizes for that study (and due to reuse of the reference group data, there is indeed dependence between the two effect sizes). Including the 7.87 vs 4.35 contrast is redundant. All the information about this contrast is already contained in the two contrasts above.

You haven't said what kind of effect size measure you are using for your analysis, but the code you posted comes from an example that uses the standardized mean difference as the measure (http://www.metafor-project.org/doku.php/analyses:gleser2009) -- and it only applies if this is the measure you are also using. You did not post any SDs (which would be needed if you are working with SMDs), maybe just for brevity, but I just wanted to mention this.

Best,
Wolfgang 

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Ju Lee
Sent: Wednesday, 18 September, 2019 19:44
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Covariance-variance matrix when studies share multiple treatment x control comparison

#
Dear Wolfgang,

Thank you very much for your response. I am using Hedges' d, and I assumed this code applies to all SMD-type effect sizes, including Hedges' d?

I am still a little baffled about how to apply this to more complex combinations. The table below looks busy, but I would appreciate it if you could answer the following two questions. FYI, C_ID1 is the shared-group ID.

1) In study 1, the treatment mean of 0.84 is shared by the 3rd and 4th responses. To account for this interdependence, should I use "nt" instead of "nc" for calculating the cov.d matrix?

2) Based on your feedback, is my C_ID1 (shared-group ID) appropriately specified for constructing the covariance matrix? I initially thought all 6 responses should have the same control ID, as they are all interrelated through either a shared treatment or control group (except for the 5th response of study 2, whose treatment mean of 0.7 is not shared by the others).

Study   mt      mc      nt      nc      C_ID1   hedges.d        var
1       0.92    0.68    6       6       c1      0.937   0.370
1       0.69    0.68    9       6       c1      0.038   0.278
1       0.84    0.69    9       9       c2      0.659   0.234
1       0.84    0.92    9       6       c2      -0.623  0.291
2       0.610   0.322   50      31      c3      0.760   0.056
2       0.667   0.322   32      31      c3      0.885   0.070
2       0.610   0.528   50      32      c4      0.209   0.052
2       0.667   0.528   32      32      c4      0.345   0.063
2       0.700   0.536   26      31      c5      0.417   0.072
2       0.667   0.536   32      31      c5      0.316   0.064

I hope I am not asking too much of your time, but if you can answer this specific question, it will clarify all the doubts I have at this stage of the analysis.

Thank you very much, and I hope to hear from you.

Best wishes,
JU
#
Dear Wolfgang and all,

Please ignore my previous e-mail and refer to this one instead:

Thank you very much for your response. I am using Hedges' d, and I assumed this code applies to all SMD-type effect sizes, including Hedges' d?

I am still a little baffled about how to apply this to more complex combinations. The table below looks busy, but I would appreciate it if you could answer the following two questions. FYI, C_ID1 is the shared-group ID.

1) In study 1, the treatment mean of 0.84 is shared by the 3rd and 4th responses. To account for this interdependence, should I use "nt" instead of "nc" for calculating the cov.d matrix?

2) I am not sure how to proceed when treatments and controls are shared as in study 2. Clearly, the 1st and 2nd responses share a treatment (1.00) and the 2nd and 3rd share a control (2.00), but the 1st and 3rd do not share any experimental unit. I've given the 2nd and 3rd responses the same C_ID1 because they share a control, but I am unclear whether I should assign the 1st response the same ID as the 2nd and 3rd (c3), since it is still technically not independent from the 2nd response (shared treatment group), OR assign it a completely different control ID?

Study    mt    mc  nt  nc  C_ID1  hedges.d    var
1      0.92  0.68   6   6  c1        0.937  0.370
1      0.69  0.68   9   6  c1        0.038  0.278
1      0.84  0.69   9   9  c2        0.659  0.234
1      0.84  0.92   9   6  c2       -0.623  0.291
2      1.00  0.32  10  20  c?        0.760  0.056
2      1.00  2.00  10   5  c3        0.885  0.070
2      0.61  2.00  15   5  c3        0.209  0.052

I hope I am not asking too much of your time, but if you can answer this specific question, it will clarify all the doubts I have at this stage of the analysis.

Thank you very much, and I hope to hear from you.

Best wishes,
JU
5 days later
#
Hi Ju,

The equation for the covariances applies to the standardized mean difference for two independent groups (so, what one could call 'classical' Cohen's d, or Hedges' g if one uses the bias correction). It does not apply to other types of standardized mean differences, such as those for dependent/paired groups. Also, the equation assumes that the pooled standard deviation 'sdp' (that is used for computing pairwise SMDs, such as (m1 - m3) / sdp and (m2 - m3) / sdp for three groups, with group 3 being the reference group) has been computed by pooling the SDs of all three groups. Doing so isn't standard practice and takes some extra effort to implement.

Just a sidenote: When pasting tables like this, please use a fixed-width font (and don't use tabs). Without that, the formatting is all off and makes the table hard to read.

Even study 1 is much more complicated. It seems that rows 1 and 2 share a common group (mc=0.68 with nc=6). Then rows 3 and 4 share a common group (mt=0.84 with nt=9). But rows 1 and 4, and rows 2 and 3, each also share a common group. So, the var-cov matrix for this study would look like this:

[ 0.370 cov12     0 cov14 ]
[       0.278 cov23     0 ]
[             0.234 cov34 ]
[                   0.291 ]

In study 2, it would be:

[ 0.056 cov12     0 ]
[       0.070 cov23 ]
[             0.052 ]

Leaving aside the issue of whether SDs are getting pooled across all groups within a study or not, the covariance terms should be computable with equation (19.19) from Gleser and Olkin (2009). I am not 100% sure though if the fact that the reference group is sometimes T and sometimes C matters. In any case, you should treat the shared group (whether it is actually a 'treatment' or a 'control' group) as the 'control' group in equation (19.19).

As for your ID variable and how to code it -- a single ID variable cannot capture these complexities. You will need 2 ID variables, indicating which group is the T and which is the C group. So, for study 1, it would be:

Study    mt    mc  nt  nc  TID  CID  hedges.d    var
1      0.92  0.68   6   6    1    3     0.937  0.370
1      0.69  0.68   9   6    2    3     0.038  0.278
1      0.84  0.69   9   9    4    2     0.659  0.234
1      0.84  0.92   9   6    4    1    -0.623  0.291
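To make the TID/CID coding concrete, here is a hypothetical base-R sketch (my own illustration, not code from the thread) that builds the V matrix for study 1 from those two ID columns. The sign convention below (the 1/n_g term enters negatively when the shared group g is the treatment in one contrast but the control in the other) follows from Cov(xbar_a - xbar_g, xbar_g - xbar_b) = -Var(xbar_g); like equation (19.19) itself, it assumes the SD is pooled across all groups, and N is taken here as the total over the four groups (6 + 9 + 6 + 9 = 30):

```r
# Hypothetical sketch: V matrix for one study from treatment/control group IDs.
# Off-diagonal for contrasts i, j sharing a single group g:
#   sign * 1/n_g + d_i*d_j/(2*N),
# with sign = -1 when g plays opposite roles (T in one, C in the other).
build_V <- function(d, vi, tid, cid, n_by_group, N) {
  k <- length(d)
  V <- diag(vi)
  for (i in 1:(k - 1)) {
    for (j in (i + 1):k) {
      shared <- intersect(c(tid[i], cid[i]), c(tid[j], cid[j]))
      if (length(shared) == 1) {
        g <- shared
        sgn <- if ((tid[i] == g) == (tid[j] == g)) 1 else -1
        V[i, j] <- V[j, i] <- sgn / n_by_group[g] + d[i] * d[j] / (2 * N)
      }
    }
  }
  V
}

# study 1 from the table above (group sizes indexed by group ID 1..4)
V1 <- build_V(d   = c(0.937, 0.038, 0.659, -0.623),
              vi  = c(0.370, 0.278, 0.234, 0.291),
              tid = c(1, 2, 4, 4), cid = c(3, 3, 2, 1),
              n_by_group = c(6, 9, 6, 9), N = 30)
```

The resulting pattern matches the schematic earlier in this reply: nonzero cov12, cov34, cov14, and cov23, with zeros in positions (1,3) and (2,4).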

Best,
Wolfgang

-----Original Message-----
From: Ju Lee [mailto:juhyung2 at stanford.edu] 
Sent: Thursday, 19 September, 2019 1:24
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
Subject: Re: Covariance-variance matrix when studies share multiple treatment x control comparison

1 day later
#
Dear Wolfgang,

I deeply appreciate your time looking into this issue, and this has been immensely helpful.
I was able to incorporate all possible inter-dependence among effect sizes by adding different layers of non-independence to our dataframe.

I manually calculated Hedges' d based on Hedges and Olkin (1985), and it generates exactly the same values as Hedges' g from escalc() with measure="SMD". So hopefully I am doing everything right using the equation we discussed earlier.

I have also been wondering whether it is possible to account for this variance-covariance structure when running publication bias analyses, for example when using the fsn() function or a modified Egger's regression test (looking at the intercept term of a residual ~ precision meta-regression using rma.mv). I have had no luck so far finding information on this, and I would appreciate any suggestions.

Thank you for all of your valuable help!
Best regards,
JU
#
Hi Ju,

Glad to hear that you are making progress. Construction of the V matrix can be a rather tedious process and often requires quite a bit of manual work.

I have little interest in generalizing fsn() for cases where V is not diagonal, because fsn() is mostly of historical interest, not something I would generally use in applied work.

However, the 'Egger regression' test can be easily generalized to rma.mv() models. Simply include a measure of the precision (e.g., the standard error) of the estimates in your model as a predictor/moderator and then you have essentially a multilevel/multivariate version thereof (you would then look at the test of the coefficient for the measure of precision, not the intercept).

I also recently heard a talk by Melissa Rodgers and James Pustejovsky (who is a frequent contributor to this mailing list) on some work in this area. Maybe he can chime in here.

Best,
Wolfgang

-----Original Message-----
From: Ju Lee [mailto:juhyung2 at stanford.edu] 
Sent: Thursday, 26 September, 2019 8:13
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
Subject: Re: Covariance-variance matrix when studies share multiple treatment x control comparison

#
Dear Wolfgang,

Thank you for your response and sorry I forgot to CC the mailing list!
I am currently running my egger's regression test as shown below. My previous understanding was that I should look at the p-value of intercept term (following a previously published R code) if I run a "mixed" model using precision as moderator variable against residuals, but according to your comments I should be looking at the precision coefficients instead? So based on my outputs below, significance testing of plot asymmetry is at p=0.09 and not p=0.3823?

Also, if I find a significant violation of plot symmetry in such a case, what additional options do I have to address these issues? I am currently calculating FSN values, which are far higher than the proposed thresholds, and removing influential outliers and re-fitting the model. But because rma.mv does not allow me to use other methods like trim-and-fill, I wonder if these two methods would be enough in case we detect plot asymmetry.

Thank you for your time to answer these many questions.
Best regards,
JU
Multivariate Meta-Analysis Model (k = 857; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed  factor
sigma^2    0.9929  0.9964    182     no   Study

Test for Residual Heterogeneity:
  QE(df = 855) = 4106.3487, p-val < .0001

Test of Moderators (coefficient(s) 2):
  QM(df = 1) = 2.7267, p-val = 0.0987

Model Results:

           estimate      se     zval    pval    ci.lb   ci.ub
intrcpt      0.0817  0.0936   0.8727  0.3828  -0.1017  0.2651
precision   -0.0392  0.0238  -1.6513  0.0987  -0.0858  0.0073  .

---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
Hi Ju,

1) As has been discussed on this mailing list on a few occasions: In addition to adding 'Study' as random effects, you should also add random effects for the estimates within studies. So, your 'base' model should be:

MHF$Id <- 1:nrow(MHF)
rma.mv(hedged, var, method="REML", random = ~ 1 | Study/Id, data=MHF)

(otherwise, you are assuming no heterogeneity within studies, which is a very strong assumption)

2) Your 'Egger-like multilevel regression test' model would simply be:

rma.mv(hedged, var, mods = ~ sqrt(var), method="REML", random = ~ 1 | Study/Id, data=MHF)

You then test if sqrt(var) is a significant predictor or not.

3) But 'var' should be a matrix -- after all, that was the point of the earlier discussion. So, if it is, then it would be:

rma.mv(hedged, var, mods = ~ sqrt(diag(var)), method="REML", random = ~ 1 | Study/Id, data=MHF)

Best,
Wolfgang

-----Original Message-----
From: Ju Lee [mailto:juhyung2 at stanford.edu] 
Sent: Thursday, 26 September, 2019 14:58
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org; James Pustejovsky (jepusto at gmail.com)
Subject: Re: Covariance-variance matrix when studies share multiple treatment x control comparison

#
Ju,

Following up on Wolfgang's comment: yes, adding a measure of precision as a
predictor in the multi-level/multi-variate meta-regression model should
work. Dr. Belen Fernandez-Castilla has a recent paper that reports a
simulation study evaluating this approach. See

Fernández-Castilla, B., Declercq, L., Jamshidi, L., Beretvas, S. N.,
Onghena, P., & Van den Noortgate, W. (2019). Detecting selection bias in
meta-analyses with multiple outcomes: A simulation study. The Journal of
Experimental Education, 1-20.

However, for standardized mean differences based on simple between-group
comparisons, it is better to use sqrt(1 / n1 + 1 / n2) as the measure of
precision, rather than using the usual SE of d. The reason is that the SE
of d is naturally correlated with d even in the absence of selective
reporting, and so the type I error rate of Egger's regression test is
artificially inflated if the SE is used as the predictor. Using the
modified predictor as given above fixes this issue and yields a correctly
calibrated test. For all the gory details, see Pustejovsky & Rodgers (2019;
https://doi.org/10.1002/jrsm.1332).
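As a minimal sketch of the modified predictor (the sample sizes below are taken from study 2 in the table earlier in the thread for illustration; the variable names and the commented rma.mv call are assumptions, not code from the thread):

```r
# Modified precision predictor of Pustejovsky & Rodgers (2019):
# sqrt(1/n1 + 1/n2) instead of the usual SE of d
nt <- c(50, 32, 50, 32, 26, 32)   # treatment-group sizes (study 2 above)
nc <- c(31, 31, 32, 32, 31, 31)   # control-group sizes
prec <- sqrt(1/nt + 1/nc)         # use this, not SE(d), as the moderator

# with metafor it would then enter the model as, e.g.:
# rma.mv(hedges.d, V, mods = ~ prec, random = ~ 1 | Study/Id, data = dat)
```

The point of the substitution is that sqrt(1/n1 + 1/n2) is fixed by design, whereas SE(d) depends on d itself, which is what inflates the type I error of the usual Egger test.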

It's also possible to combine all of the above with robust variance
estimation, or to use a simplified model plus robust variance estimation to
account for dependency between effect sizes from the same study. Melissa
Rodgers and I have a working paper showing that this approach works well
for meta-analyses that include studies with multiple correlated outcomes.
We will be posting a pre-print of the paper soon, and I can share it on the
listserv when it's available.

James

On Thu, Sep 26, 2019 at 3:12 AM Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

#
Dear Wolfgang, James

Thank you both for your considerate suggestions.

First of all, I would like to clarify that I will be sending out another thread related to Wolfgang's comment about adding the effect-size ID to the random factors, as it has caused some major issues with my current analysis and I would really like further feedback on this matter (in my very next e-mail).

Related to James's suggestion, I will follow up on your newly published paper and apply this to my code. Since I am using a variance-covariance matrix instead of a plain variance vector (to account for shared control/treatment groups) and trying to incorporate this into the modified Egger's test, I am wondering whether this means I should be creating a diagonal matrix consisting of sqrt(1/n1 + 1/n2) for all interdependent effect sizes?

Best regards,
JU
#
Dear Wolfgang,

As I mentioned in my previous e-mail, I am having some doubts about the way I have specified the random effects in my models, and I realize I should seek your advice on this topic as well.

To explain my data structure and workflow first: I am analyzing the effect of 4 different types of habitat change on marine food-web interactions. I first run separate random-effects models for each of the 4 habitat-change datasets and also for all data combined. I then run mixed-effects models with the same moderators for each of the 4 datasets as well as for all studies pooled. I am running the analysis with the pooled data because we want to examine overall patterns encompassing the different types of change.

1. My first question: if I am pooling these 4 datasets (which do show variable effect patterns) and running random- or mixed-effects models on the pooled data (full.es), should I account for the merging of the four habitat-change groups in the full models, for example by adding a random effect for the four habitat-change categories (say, "Habitat.change.type")? Or do you think this is unnecessary, and it is still logical to run the full models without accounting for merging these datasets?

So my current code for the full model combining all 4 datasets is:
full.es <- rma.mv(hedged, VCV, method="REML", random = ~ 1 | Study/ID, data=MHF)
Can I then change this to:
full.es <- rma.mv(hedged, VCV, method="REML", random = list(~ 1 | Study/ID, ~ 1 | Habitat.change.type), data=MHF)?


2. My second question directly results from your previous comments. I have been following https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2018-April/000778.html, where you advised using the inner/outer logic with a variance structure if there is a big gap between the moderator model and the subgroup model (removing the intercept from the moderator model). I had the same issue as in that post, and I was able to fix it by adding struct="DIAG" and specifying the moderator as the inner random effect.

So my initial model output looked like:

Multivariate Meta-Analysis Model (k = 778; method: REML)

    logLik    Deviance         AIC         BIC        AICc
-1533.1822   3066.3645   3074.3645   3092.9811   3074.4163

Variance Components:

outer factor: Study                           (nlvls = 166)
inner factor: Consumer.habitat.specialization (nlvls = 2)

          estim    sqrt  k.lvl  fixed       level
tau^2.1  0.8936  0.9453    615     no  Generalist
tau^2.2  0.9615  0.9806    163     no  Specialist

Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001

Test of Moderators (coefficient(s) 2):
QM(df = 1) = 10.7143, p-val = 0.0011

Model Results:

                                                   estimate      se     zval    pval    ci.lb    ci.ub
intrcpt                                              0.7276  0.0871   8.3549  <.0001   0.5569   0.8982  ***
factor(Consumer.habitat.specialization)Specialist   -0.5991  0.1830  -3.2733  0.0011  -0.9578  -0.2404   **

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



But when I then add the effect-size ID to the model as you suggested, the moderator effects change completely. This has happened for every single mixed-effects model, all of which generated highly significant QM tests when only the inner (moderator) and outer (Study) random effects were specified.
Multivariate Meta-Analysis Model (k = 778; method: REML)

    logLik    Deviance         AIC         BIC        AICc
-1280.5447   2561.0894   2571.0894   2594.3601   2571.1673

Variance Components:

              estim    sqrt  nlvls  fixed                                     factor
sigma^2.1    0.5299  0.7279      2     no            Consumer.habitat.specialization
sigma^2.2    0.8052  0.8973    183     no      Consumer.habitat.specialization/Study
sigma^2.3    0.4899  0.6999    778     no   Consumer.habitat.specialization/Study/Id

Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001

Test of Moderators (coefficient(s) 2):
QM(df = 1) = 0.5276, p-val = 0.4676

Model Results:

                                                   estimate      se     zval    pval    ci.lb   ci.ub
intrcpt                                              0.8221  0.7336   1.1206  0.2625  -0.6158  2.2599
factor(Consumer.habitat.specialization)Specialist   -0.7598  1.0460  -0.7263  0.4676  -2.8099  1.2904

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


But if I remove the inner logic (moderator) from the random effects, the results go back to being highly significant. So adding the inner logic seems to really change our results.
Multivariate Meta-Analysis Model (k = 778; method: REML)
    logLik    Deviance         AIC         BIC        AICc
-1283.6031   2567.2063   2575.2063   2593.8229   2575.2582
Variance Components:

            estim    sqrt  nlvls  fixed    factor
sigma^2.1  0.7648  0.8745    166     no     Study
sigma^2.2  0.5117  0.7153    778     no  Study/Id
Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001
Test of Moderators (coefficient(s) 2):
QM(df = 1) = 39.0995, p-val < .0001
Model Results:
                                                   estimate      se     zval    pval    ci.lb    ci.ub
intrcpt                                              0.8429  0.0868   9.7165  <.0001   0.6729   1.0130  ***
factor(Consumer.habitat.specialization)Specialist   -0.8873  0.1419  -6.2530  <.0001  -1.1654  -0.6092  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



The same thing happens for my subgroup models. Adding the ID to the current inner/outer model completely nullifies what were previously very strong effects. However, if I remove the inner logic and just use "random = ~ 1 | Study/Id", the results become somewhat more similar, although the magnitude of the effect sizes actually increases.


# Original: random = ~ Consumer.habitat.specialization | Study
Multivariate Meta-Analysis Model (k = 778; method: REML)
    logLik    Deviance         AIC         BIC        AICc
-1533.1822   3066.3645   3074.3645   3092.9811   3074.4163
Variance Components:
outer factor: Study                           (nlvls = 166)
inner factor: Consumer.habitat.specialization (nlvls = 2)

            estim    sqrt  k.lvl  fixed       level
tau^2.1    0.8936  0.9453    615     no  Generalist
tau^2.2    0.9615  0.9806    163     no  Specialist
Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001
Test of Moderators (coefficient(s) 1:2):
QM(df = 2) = 70.4413, p-val < .0001
Model Results:
                                                   estimate      se    zval    pval    ci.lb   ci.ub
factor(Consumer.habitat.specialization)Generalist    0.7276  0.0871  8.3549  <.0001   0.5569  0.8982  ***
factor(Consumer.habitat.specialization)Specialist    0.1285  0.1610  0.7981  0.4248  -0.1870  0.4440
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


# Add ID to original: random = ~ Consumer.habitat.specialization | Study/Id
Multivariate Meta-Analysis Model (k = 778; method: REML)
    logLik    Deviance         AIC         BIC        AICc
-1280.5447   2561.0894   2571.0894   2594.3601   2571.1673
Variance Components:
            estim    sqrt  nlvls  fixed                                    factor
sigma^2.1  0.5299  0.7279      2     no           Consumer.habitat.specialization
sigma^2.2  0.8052  0.8973    183     no     Consumer.habitat.specialization/Study
sigma^2.3  0.4899  0.6999    778     no  Consumer.habitat.specialization/Study/Id
Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001
Test of Moderators (coefficient(s) 1:2):
QM(df = 2) = 1.2627, p-val = 0.5319
Model Results:
                                                   estimate      se    zval    pval    ci.lb   ci.ub
factor(Consumer.habitat.specialization)Generalist    0.8221  0.7336  1.1206  0.2625  -0.6158  2.2599
factor(Consumer.habitat.specialization)Specialist    0.0623  0.7456  0.0835  0.9334  -1.3991  1.5237
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


# Add ID but remove inner logic and variance structure specification: random = ~ 1 | Study/ID
Multivariate Meta-Analysis Model (k = 778; method: REML)
    logLik    Deviance         AIC         BIC        AICc
-1283.6031   2567.2063   2575.2063   2593.8229   2575.2582
Variance Components:
            estim    sqrt  nlvls  fixed    factor
sigma^2.1  0.7648  0.8745    166     no     Study
sigma^2.2  0.5117  0.7153    778     no  Study/Id
Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001
Test of Moderators (coefficient(s) 1:2):
QM(df = 2) = 103.0703, p-val < .0001
Model Results:
                                                   estimate      se     zval    pval    ci.lb   ci.ub
factor(Consumer.habitat.specialization)Generalist    0.8429  0.0868   9.7165  <.0001   0.6729  1.0130  ***
factor(Consumer.habitat.specialization)Specialist   -0.0444  0.1370  -0.3237  0.7461  -0.3129  0.2242
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Adding ID to my random-effects models without the inner/outer structure does not cause any problems, and it actually makes the independently calculated effect sizes even more significant than before.
I am now wondering what causes these conflicting results and how I can address them.

Third, do you have an alternative suggestion for calculating the mean effect size and 95% CI for each subgroup, other than the intercept-removal approach I used?
My issue has been that unless I specify the variance components, some categorical effects (especially those with smaller sample sizes) change so drastically (in both sign and magnitude) that
they contradict the mean effects generated by the independent random-effects models. For comparison purposes in figures, should I just calculate effect sizes independently, or do you think there is a way
to avoid this gap between the subgroup and moderator models, especially since adding the inner/outer variance components, Study, and ID all together seems to cause issues here...
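
For concreteness, this is a rough sketch of the two approaches I am comparing (hypothetical names only: 'chs' stands in for my categorical moderator and "Generalist" is just an example level; MHF and VCV are as in my earlier posts):

```r
library(metafor)

# Approach 1: moderator model, subgroup means via intercept removal
res.mod <- rma.mv(hedged ~ factor(chs) - 1, VCV, data = MHF,
                  random = list(~ 1 | Study, ~ 1 | Id))

# Approach 2: fit each subgroup independently via subset
res.sub <- rma.mv(hedged, VCV, data = MHF,
                  random = list(~ 1 | Study, ~ 1 | Id),
                  subset = (chs == "Generalist"))
```

It is the discrepancy between the estimates from these two approaches that I am asking about.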


I apologize for this extensive letter, but I was quite surprised by these recent developments and had to inquire further.

Thank you very much Wolfgang and I sincerely hope to hear back from you.

Best wishes,
JU
#
JU,

To your question about how to calculate the measure of precision: no,
there's no need to create a matrix. Just a vector with the measure of
precision, because it's the vector that will be used as a predictor in the
meta-regression model.

James
On Thu, Sep 26, 2019 at 11:05 AM Ju Lee <juhyung2 at stanford.edu> wrote:

#
Dear James, Wolfgang,

Thank you very much for this information.
I have one question extending from this: while I always run my main mixed-effects models using the variance-covariance matrix (to account for shared study groups within each study),
is it acceptable that my Egger-like regression does not incorporate this structure, but instead just uses sqrt(1 / n1 + 1 / n2) as the precision (rather than sqrt(diag(v.c.matrix)), which Wolfgang suggested as one possibility), and uses the p-value of the precision term (precision.2, p = 0.2382) to determine asymmetry?

# Precision as a function of the two group sample sizes (symmetric in CN and TN)
prec. <- function(CN, TN) {
  sqrt((1 / CN) + (1 / TN))
}
precision.2 <- prec.(MHF$n.t, MHF$n.c)

Multivariate Meta-Analysis Model (k = 285; method: REML)
Variance Components:
  estim    sqrt  nlvls  fixed  factor
sigma^2.1  0.4255  0.6523     73     no   Study
sigma^2.2  0.3130  0.5595    285     no      Id
Test for Residual Heterogeneity:
  QE(df = 283) = 1041.1007, p-val < .0001
Test of Moderators (coefficient(s) 2):
  QM(df = 1) = 1.3909, p-val = 0.2382

Model Results:
  estimate      se     zval    pval    ci.lb   ci.ub
intrcpt        0.0529  0.1617   0.3274  0.7433  -0.2640  0.3699
precision.2   -0.0668  0.0567  -1.1794  0.2382  -0.1779  0.0442
---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
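
For reference, I believe the sqrt(diag(V)) alternative would look something like this (a sketch only, using my objects MHF and VCV; 'precision.3' is just a name I am making up here):

```r
library(metafor)

# SE-based precision taken from the diagonal of the variance-covariance
# matrix, instead of the sample-size formula sqrt(1/n1 + 1/n2):
precision.3 <- sqrt(diag(VCV))
res.egger <- rma.mv(hedged ~ precision.3, VCV, data = MHF,
                    random = list(~ 1 | Study, ~ 1 | Id))
```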


Thank you very much, both of you.
Best,
JU

p.s. Wolfgang, I think I figured out what went wrong with how I specified my random effects in my previous e-mail. Specifying it as random = list(~ factor(x) | Study, ~ factor(x) | Id) instead of random = ~ factor(x) | Study/Id generates results that make sense to me now. Please let me know if this is the correct way to code it.
#
You are still using 'residuals' as the outcome. Don't do that. Just use the Hedges' g values as the outcome. Also, you should specify the correct V matrix (I think you called it VCV in an earlier post). So, for example:

rma.mv(hedged ~ precision.2, VCV, data = MHF, random = list(~ 1 | Study, ~ 1 | Id), subset = (Spatial.scale.2 == "Fragmentation scale"))

I haven't looked at your other post in detail, but 'random = ~ factor(x) | Study/Id' doesn't actually work (at least not in the way you think it does). Please update your metafor installation to get an error that will inform you of this. Instead, random = list(~factor(x)|Study, ~factor(x)|Id) is indeed the correct way to specify two '~ inner | outer' terms.
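
Side by side, a minimal sketch (with 'x' as a placeholder moderator and your objects MHF and VCV):

```r
library(metafor)

# Does NOT work as intended (and throws an error in recent metafor versions):
# res <- rma.mv(hedged ~ factor(x) - 1, VCV, data = MHF,
#               random = ~ factor(x) | Study/Id)

# Correct: two '~ inner | outer' terms supplied as a list
res <- rma.mv(hedged ~ factor(x) - 1, VCV, data = MHF,
              random = list(~ factor(x) | Study, ~ factor(x) | Id))
```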

Best,
Wolfgang

-----Original Message-----
From: Ju Lee [mailto:juhyung2 at stanford.edu] 
Sent: Friday, 27 September, 2019 16:46
To: James Pustejovsky
Cc: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
Subject: Re: Covariance-variance matrix when studies share multiple treatment x control comparison

#
Dear Wolfgang,

Thank you for your patience. That definitely slipped past me; I have now rerun the model with hedged (not the residuals) as the outcome, modified the precision following James' new work, and specified the correct variance-covariance matrix. Also, thank you for clarifying how to specify the random effects correctly. I will update my metafor installation as you suggested.

Please don't feel you need to respond to my other post as this was the major issue.
The other issue I brought up in that e-mail was a more general question, which, if I may, I will explain here:

My data consist of 4 major groups, each examining the effect of a different habitat variable on the same response variable. I am running all random- or mixed-effects models separately for each of these four subsets, as I want to understand how each type of change influences the response. However, I am also running an analysis pooling all studies to understand the overall pattern and moderator effects. For this purpose, I wonder if I should parameterize these 4 groups in the pooled model, say in the form of a random effect? I am not sure if this makes sense, or if I am just overparameterizing...

I wonder if I need to change from

full.chs.es <- rma.mv(hedged ~ chs, VCV, method = "REML", random = list(~ chs | Study, ~ chs | ID), data = MHF)

to something like:

full.chs.es <- rma.mv(hedged ~ chs, VCV, method = "REML", random = list(~ chs | Study, ~ chs | ID, ~ chs | Habitat.change.type), data = MHF)?

Thank you very much, Wolfgang.
Hopefully this will take care of most of the issues I have been having so far!

Best wishes,
JU