[R-meta] Questions about model averaging with complex multilevel meta-analytic model

6 messages · Margaret Slein, Wolfgang Viechtbauer

#
Hi Wolfgang, 

I am a PhD student at UBC in Vancouver, Canada, currently working on a meta-analysis. I have been trying to do model selection and model averaging with an rma.mv model object, using the model.sel() and model.avg() functions from the MuMIn package and following your metafor help page on using MuMIn and glmulti. However, I have been unable to perform model selection or model averaging because I get an error that the models are not all fitted to the same data, even though I have ensured there are no missing values or NAs anywhere in the data frame.

Is it possible to do model averaging with rma.mv() models that include three-level random effects, a phylogenetic correlation matrix, and several moderators? I could not find any examples of model selection or averaging with this model structure, and I have had no luck on Stack Overflow.

Here is the full model I am trying to run:

full_mod <- rma.mv(yi = yi, V = V,
                   mods = ~ I(flux_range - mean(flux_range)) * I(mean_temp_constant - mean(mean_temp_constant))
                          + I(flux_range - mean(flux_range)) * experiment_type
                          + I(mean_temp_constant - mean(mean_temp_constant)) * experiment_type
                          + I(secondary_temp - mean(secondary_temp)) * experiment_type
                          + duration_standard * experiment_type
                          + experiment_type * experiment_type
                          + exp_age * experiment_type
                          + size * experiment_type
                          + ecosystem * experiment_type
                          + trait_directionality * experiment_type
                          + sqrt(vi) * experiment_type,
                   random = list(~ 1 | phylo, ~ 1 | study_id, ~ 1 | response_id),
                   R = list(phylo = cor), test = "t", dfs = "contain",
                   method = "REML", data = dat_ES_final_2)


I have also tried using dredge() from MuMIn, in addition to fitting my own subset of candidate models, with no luck: dredge() takes days to iterate through the models, and model averaging still fails with the error that the models are not fitted to the same data. Any suggestions or assistance you can provide on this would be greatly appreciated.

Cheers, 
Maggie 

<*)))><>  <*)))><>   <*)))><>  <*)))><>  

Maggie Slein (she/her/hers)
PhD Student, O'Connor Lab
Department of Zoology
Unceded xʷməθkʷəy̓əm (Musqueam) territory
University of British Columbia
#
Dear Maggie,

Some notes on this:

1) dredge() (and I assume the other functions from MuMIn as well) examines the 'nobs' attribute that logLik() returns to determine the sample size of the various models. However, when using REML estimation, nobs = k - p, where p is the number of model coefficients (for some technical reasons that are not important right now). This leads dredge() to think that the sample size differs across models whenever p differs.

In general, you should use method="ML" when comparing models that differ in terms of their fixed effects.[1] In that case, nobs = k and this issue won't arise.
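
A minimal illustration of the difference (using the BCG dataset that ships with metafor; the particular moderators are arbitrary and only serve to show how the 'nobs' attribute changes):

library(metafor)

# classic BCG vaccine dataset included with metafor (k = 13 studies)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

fit_reml <- rma(yi, vi, mods = ~ ablat + year, data=dat, method="REML")
fit_ml   <- rma(yi, vi, mods = ~ ablat + year, data=dat, method="ML")

attr(logLik(fit_reml), "nobs")  # k - p = 13 - 3 = 10
attr(logLik(fit_ml),   "nobs")  # k = 13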

2) I would recommend doing all transformations (like the mean centering or sqrt(vi)) outside of the model call (so, beforehand).
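
For example, something along these lines (the column names are taken from your model call; the new variable names are just suggestions):

dat_ES_final_2$flux_range_c <- dat_ES_final_2$flux_range - mean(dat_ES_final_2$flux_range)
dat_ES_final_2$mean_temp_c  <- dat_ES_final_2$mean_temp_constant - mean(dat_ES_final_2$mean_temp_constant)
dat_ES_final_2$sec_temp_c   <- dat_ES_final_2$secondary_temp - mean(dat_ES_final_2$secondary_temp)
dat_ES_final_2$sei          <- sqrt(dat_ES_final_2$vi)

and then use these new variables directly in the 'mods' formula.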

3) You have *a lot* of fixed effects and interactions. This will lead to many models that dredge() needs to fit, which could take a looooong time. dredge() has a 'cluster' argument for doing parallel processing, which you might consider using if you have powerful enough hardware (a rough sketch is given below). Still, even then this could be a rather daunting task.
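
Just a sketch of what that could look like (the helper setup is the one from the metafor help page you mentioned, the exported objects are the ones from your model call, and the number of workers is something you would adjust to your hardware):

library(parallel)
library(MuMIn)
eval(metafor:::.MuMIn)   # helper methods for using MuMIn with rma/rma.mv objects

cl <- makeCluster(4)                                # number of workers
clusterEvalQ(cl, library(metafor))                  # load metafor on the workers
clusterExport(cl, c("dat_ES_final_2", "V", "cor"))  # objects needed for refitting

full_mod_ml <- update(full_mod, method="ML")        # ML fit, as per point 1) above
candidates  <- dredge(full_mod_ml, cluster=cl)

stopCluster(cl)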

4) I can confirm that dredge() works just fine with rma.mv() models. An example with a model similar to the one you are fitting can be found here:

https://gist.github.com/wviechtb/891483eea79da21d057e60fd1e28856b

Best,
Wolfgang

[1] Actually, based on some research we did, REML might work after all:

Cinar, O., Umbanhowar, J., Hoeksema, J. D., & Viechtbauer, W. (2021). Using information-theoretic approaches for model selection in meta-analysis. Research Synthesis Methods, 12(4), 537-556. https://doi.org/10.1002/jrsm.1489

But we didn't examine complex models like you are using and I would still be very cautious with using REML when doing so.
#
Hi Wolfgang, 
Thank you, this has been most helpful. Another lingering question I had is regarding quantifying publication bias. I included the square root of the sampling variance as a moderator to assess this, and in the best model it has a very large, statistically significant effect. Is this approach (including the square root of the sampling variance in the global model alongside all other moderators) appropriate, and if so, how would I account for this publication bias in the rest of my results?

I cannot conduct a trim-and-fill analysis with the function in metafor because of the model's complex structure, but if there are other ways to address potentially strong bias, I would be interested in trying them.

Cheers, 
Maggie




<*)))><>  <*)))><>   <*)))><>  <*)))><>  

Maggie Slein (she/her/hers)
PhD Student, O'Connor Lab
Department of Zoology
Unceded xʷməθkʷəy̓əm (Musqueam) territory
University of British Columbia

#
One thing I meant to add to my previous mail: I hope you have a large dataset. If I didn't miscount, you have around 22+ model terms (counting main effects and interactions), which is quite a sizable model.

As for your new question:

Using some measure of precision (such as sqrt(vi)) as a predictor doesn't really quantify publication bias. It only provides an indication of whether there is a relationship between the measure of precision and the effect sizes (possibly after accounting for the influence of the other moderators included in the model). That's all. If there is such a relationship, it *may* suggest the presence of publication bias, but such a relationship can exist for all kinds of other reasons. Also, this method doesn't really account for publication bias.
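
In case it helps to see what this amounts to: including sqrt(vi) as a moderator is essentially an Egger-type regression test embedded in your multilevel model, i.e., something along these lines (object names taken from your original model call):

egger_mod <- rma.mv(yi, V,
                    mods   = ~ sqrt(vi),
                    random = list(~ 1 | phylo, ~ 1 | study_id, ~ 1 | response_id),
                    R      = list(phylo = cor),
                    data   = dat_ES_final_2)
summary(egger_mod)

A significant coefficient for sqrt(vi) indicates a relationship between precision and the effect sizes (funnel plot asymmetry), not publication bias per se, and fitting such a model does not correct the other estimates for any such bias.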

The issue of publication bias (especially in the context of more complex models) has come up several times on this mailing list. See for example this thread:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2022-September/004191.html

You might want to search the archives for some further discussions. You can search the archives here:

https://cse.google.com/cse?cx=ee4b2e6c93b6a9667

However, note that recent posts may not have been indexed by the search engine yet and therefore will not show up.

Best,
Wolfgang
3 days later
#
Hi Wolfgang, 

Yes, it is a very large dataset (1700+). Thank you for the advice on publication bias; I will certainly take it into account.

Related to my original question, I have now fit all my candidate models with ML rather than REML, which allows me to use model.sel() from the MuMIn package. However, I am still unable to use the model.avg() function; it fails with the error message "Error in x$coefficients[, 1L] : subscript out of bounds". What would cause this error, and how can I model average rma.mv objects with model.avg() from the MuMIn package?

Many thanks,
Maggie



<*)))><>  <*)))><>   <*)))><>  <*)))><>  

Maggie Slein (she/her/hers)
PhD Student, O'Connor Lab
Department of Zoology
Unceded xʷməθkʷəy̓əm (Musqueam) territory
University of British Columbia

#
Troubleshooting this remotely is difficult.

Does this example run?

https://gist.github.com/wviechtb/891483eea79da21d057e60fd1e28856b

If so, then this is a bit of a head-scratcher, given that there is nothing fundamentally different about your model; it just has a lot more fixed effects (and a slightly simpler random effects structure).

Maybe start by simplifying the 'mods' part. First try with just 2 or 3 of the variables and see if that works (a sketch is given below). If the problem only arises when adding more predictors, then this could be a problem with MuMIn. But first, you have to narrow down where the problem is coming from.
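
For example, something like this (the particular moderators chosen here are arbitrary; flux_range_c is the centered variable suggested in my earlier mail):

small_mod <- rma.mv(yi, V,
                    mods   = ~ flux_range_c + experiment_type,
                    random = list(~ 1 | phylo, ~ 1 | study_id, ~ 1 | response_id),
                    R      = list(phylo = cor),
                    method = "ML",
                    data   = dat_ES_final_2)

candidates <- dredge(small_mod)
summary(model.avg(candidates))

If this works, add moderators back in one at a time until the error reappears.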

Best,
Wolfgang