[R-meta] Questions about model averaging with complex multilevel meta-analytic model

6 messages · Margaret Slein, Wolfgang Viechtbauer

#
Hi Wolfgang, 

I am a PhD student at UBC in Vancouver, Canada, currently working on a meta-analysis. I have been trying to do model selection and model averaging with an rma.mv model object, using the model.sel() and model.avg() functions from the MuMIn package and following your metafor help page on using MuMIn and glmulti. However, I have been unable to perform model selection or model averaging because I get an error that the models are not all fitted to the same data, even though I have ensured there are no missing values or NAs anywhere in the data frame.

Is it possible to do model averaging with rma.mv() models that include three-level random effects, a phylogenetic correlation matrix, and several moderators? I could not find any examples of model selection or averaging with this model structure, and I have had no luck on Stack Overflow.

Here is the full model I am trying to run:

full_mod <- rma.mv(yi = yi, V = V,
                   mods = ~ I(flux_range - mean(flux_range)) * I(mean_temp_constant - mean(mean_temp_constant))
                          + I(flux_range - mean(flux_range)) * experiment_type
                          + I(mean_temp_constant - mean(mean_temp_constant)) * experiment_type
                          + I(secondary_temp - mean(secondary_temp)) * experiment_type
                          + duration_standard * experiment_type
                          + experiment_type * experiment_type
                          + exp_age * experiment_type
                          + size * experiment_type
                          + ecosystem * experiment_type
                          + trait_directionality * experiment_type
                          + sqrt(vi) * experiment_type,
                   random = list(~ 1 | phylo, ~ 1 | study_id, ~ 1 | response_id),
                   R = list(phylo = cor), test = "t", dfs = "contain",
                   method = "REML", data = dat_ES_final_2)


I have also tried using dredge() from MuMIn, in addition to fitting my own subset of candidate models, with no luck: dredge() takes days to iterate through the models, and model averaging still fails with the error that the models are not fitted to the same data. Any suggestions or assistance you can provide on this would be greatly appreciated.

Cheers, 
Maggie 

<*)))><>  <*)))><>   <*)))><>  <*)))><>  

Maggie Slein (she/her/hers)
PhD Student, O'Connor Lab
Department of Zoology
Unceded xʷməθkʷəy̓əm (Musqueam) territory
University of British Columbia
#
Dear Maggie,

Some notes on this:

1) dredge() (and I assume the other functions from MuMIn as well) examines the 'nobs' attribute that logLik() returns to determine the sample size of the various models. However, when using REML estimation, nobs = k - p, where p is the number of model coefficients (for some technical reasons that are not important right now). This leads dredge() to think that the sample size differs across models whenever p differs.

In general, you should use method="ML" when comparing models that differ in terms of their fixed effects.[1] In that case, nobs = k and this issue won't arise.
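
A minimal illustration of the difference (using the BCG dataset that ships with metafor; the particular moderators are arbitrary and only serve to show how the 'nobs' attribute changes):

library(metafor)

# classic BCG vaccine dataset included with metafor (k = 13 studies)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

fit_reml <- rma(yi, vi, mods = ~ ablat + year, data=dat, method="REML")
fit_ml   <- rma(yi, vi, mods = ~ ablat + year, data=dat, method="ML")

attr(logLik(fit_reml), "nobs")  # k - p = 13 - 3 = 10
attr(logLik(fit_ml),   "nobs")  # k = 13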

2) I would recommend doing all transformations (like the mean centering or sqrt(vi)) outside of the model call (so, beforehand).
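
For example, something along these lines (the column names are taken from your model call; the new variable names are just suggestions):

dat_ES_final_2$flux_range_c <- dat_ES_final_2$flux_range - mean(dat_ES_final_2$flux_range)
dat_ES_final_2$mean_temp_c  <- dat_ES_final_2$mean_temp_constant - mean(dat_ES_final_2$mean_temp_constant)
dat_ES_final_2$sec_temp_c   <- dat_ES_final_2$secondary_temp - mean(dat_ES_final_2$secondary_temp)
dat_ES_final_2$sei          <- sqrt(dat_ES_final_2$vi)

and then use these new variables directly in the 'mods' formula.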

3) You have *a lot* of fixed effects and interactions. This will lead to many models that dredge() needs to fit, which could take a looooong time. dredge() has a 'cluster' argument for doing parallel processing, which you might consider using if you have powerful enough hardware (a rough sketch is given below). Still, even then this could be a rather daunting task.
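
Just a sketch of what that could look like (the helper setup is the one from the metafor help page you mentioned, the exported objects are the ones from your model call, and the number of workers is something you would adjust to your hardware):

library(parallel)
library(MuMIn)
eval(metafor:::.MuMIn)   # helper methods for using MuMIn with rma/rma.mv objects

cl <- makeCluster(4)                                # number of workers
clusterEvalQ(cl, library(metafor))                  # load metafor on the workers
clusterExport(cl, c("dat_ES_final_2", "V", "cor"))  # objects needed for refitting

full_mod_ml <- update(full_mod, method="ML")        # ML fit, as per point 1) above
candidates  <- dredge(full_mod_ml, cluster=cl)

stopCluster(cl)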

4) I can confirm that dredge() works just fine with rma.mv() models. An example with a model similar to the one you are fitting can be found here:

https://gist.github.com/wviechtb/891483eea79da21d057e60fd1e28856b

Best,
Wolfgang

[1] Actually, based on some research we did, REML might work after all:

Cinar, O., Umbanhowar, J., Hoeksema, J. D., & Viechtbauer, W. (2021). Using information-theoretic approaches for model selection in meta-analysis. Research Synthesis Methods, 12(4), 537-556. https://doi.org/10.1002/jrsm.1489

But we didn't examine complex models like you are using and I would still be very cautious with using REML when doing so.
#
Hi Wolfgang, 
Thank you, this has been most helpful. Another lingering question I had is regarding quantifying publication bias. I included the square root of the sampling variance as a moderator to assess this, and in the best model it has a very large, statistically significant effect. Is this approach (including the square root of the sampling variance in the global model alongside all other moderators) appropriate, and if so, how would I account for this publication bias in the rest of my results?

I cannot conduct a trim-and-fill analysis with the function in metafor because of the model's complex structure, but if there are other ways to address potentially strong bias, I would be interested in trying them.

Cheers, 
Maggie




<*)))><>  <*)))><>   <*)))><>  <*)))><>  

Maggie Slein (she/her/hers)
PhD Student, O'Connor Lab
Department of Zoology
Unceded xʷməθkʷəy̓əm (Musqueam) territory
University of British Columbia

#
One thing I meant to add to my previous mail: I hope you have a large dataset. If I didn't miscount, you have around 22+ model terms (counting main effects and interactions), which is quite a sizable model.

As for your new question:

Using some measure of precision (such as sqrt(vi)) as a predictor doesn't really quantify publication bias. It only provides an indication of whether there is a relationship between the measure of precision and the effect sizes (possibly after accounting for the influence of the other moderators included in the model). That's all. If there is such a relationship, it *may* suggest the presence of publication bias, but such a relationship can exist for all kinds of other reasons. Also, this method doesn't really account for publication bias.
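
In case it helps to see what this amounts to: including sqrt(vi) as a moderator is essentially an Egger-type regression test embedded in your multilevel model, i.e., something along these lines (object names taken from your original model call):

egger_mod <- rma.mv(yi, V,
                    mods   = ~ sqrt(vi),
                    random = list(~ 1 | phylo, ~ 1 | study_id, ~ 1 | response_id),
                    R      = list(phylo = cor),
                    data   = dat_ES_final_2)
summary(egger_mod)

A significant coefficient for sqrt(vi) indicates a relationship between precision and the effect sizes (funnel plot asymmetry), not publication bias per se, and fitting such a model does not correct the other estimates for any such bias.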

The issue of publication bias (especially in the context of more complex models) has come up several times on this mailing list. See for example this thread:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2022-September/004191.html

You might want to search the archives for some further discussions. You can search the archives here:

https://cse.google.com/cse?cx=ee4b2e6c93b6a9667

However, note that recent posts may not have been indexed by the search engine yet and therefore will not show up.

Best,
Wolfgang
3 days later
#
Hi Wolfgang, 

Yes, it is a very large dataset (1700+). Thank you for the advice on publication bias; I will certainly take it into account.

Related to my original question, I have now fit all my candidate models with ML rather than REML, which allows me to use model.sel() from the MuMIn package. However, I am still unable to use the model.avg() function; it fails with the error message "Error in x$coefficients[, 1L] : subscript out of bounds". What would cause this error, and how can I model average rma.mv objects with model.avg() from the MuMIn package?

Many thanks,
Maggie



<*)))><>  <*)))><>   <*)))><>  <*)))><>  

Maggie Slein (she/her/hers)
PhD Student, O'Connor Lab
Department of Zoology
Unceded xʷməθkʷəy̓əm (Musqueam) territory
University of British Columbia

#
Troubleshooting this remotely is difficult.

Does this example run?

https://gist.github.com/wviechtb/891483eea79da21d057e60fd1e28856b

If so, then this is a bit of a head-scratcher, given that there is nothing fundamentally different about your model; it just has a lot more fixed effects (and a slightly simpler random effects structure).

Maybe start by simplifying the 'mods' part. First try with just 2 or 3 of the variables and see if that works (a sketch is given below). If the problem only arises when adding more predictors, then this could be a problem with MuMIn. But first, you have to narrow down where the problem is coming from.
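
For example, something like this (the particular moderators chosen here are arbitrary; flux_range_c is the centered variable suggested in my earlier mail):

small_mod <- rma.mv(yi, V,
                    mods   = ~ flux_range_c + experiment_type,
                    random = list(~ 1 | phylo, ~ 1 | study_id, ~ 1 | response_id),
                    R      = list(phylo = cor),
                    method = "ML",
                    data   = dat_ES_final_2)

candidates <- dredge(small_mod)
summary(model.avg(candidates))

If this works, add moderators back in one at a time until the error reappears.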

Best,
Wolfgang