Skip to content

[R-meta] phylogenetic information in both moderator and random part of rma.mv?

5 messages · Wolfgang Viechtbauer, Sigurd Einum

#
Dear list,
I want to test for evolutionary divergence (among ectothermic animals) in a single trait Y. My first approach to this was to test for divergence among taxonomic classes (specifically amphibians, reptiles, insects, fish, and crustaceans). I compiled data on several species per class, and multiple estimates of Y per species (from different experiments), and analysed the data using a traditional mixed effects model (lme), with species as a random effect and taxonomic class as fixed effect.

However, one reviewer suggested I should control for phylogeny (without being more specific). So I built a tree for these species (using package rotl), made it ultrametric (using compute.brlen in package ape), and computed the variance-covariance matrix A from this (using vcv from package ape). I then created a variable phylo to distinguish the phylogenetic component from the non-phlylogenetic species random effect, and used metafor to fit the model:

rma.mv(Y, sampvar,mods = ~ Class, random = list(~ 1 | species, ~ 1 | phylo),
               R = list(phylo = A), data = Final.data, sparse = TRUE, method = "ML")

(sampvar is the sampling variance associated with each estimate of Y)

However, now I have started doubting whether this model makes sense, i.e. to estimate an effect of taxonomic class (which in essence is a phylogenetic effect) while simultaneously modelling a random effect of phylogeny. Would an appropriate alternative be to not have class as a moderator, but rather compare fits of models with and without the phylogenetic variance-covariance matrix. If the model including phylogeny is better than one without it, can I conclude that there is evolutionary divergence in this trait?

Any advice anyone might have on this would be much appreciated!

Best wishes
Sigurd Einum
#
Dear Sigurd,

I do not know enough about the specifics of the application to say whether comparing the model with versus without phylogeny is sufficient to conclude something about evolutionary divergence.

However, let me make a few comments (I am also basing some comments here on what you wrote to me initially in an email before redirecting you to this mailing list for further discussions):

So you fitted a mixed-effects model with lme() of the form:

lme(Y ~ Class, random = ~ 1 | species, data=Final.data, method="ML")

(or maybe including weights = varFixed(~ sampvar)) and found 'Class' to be relevant (regardless of whether this means: large coefficient, statistically significant, sufficiently larger value of the information criterion compared to a null model). Then you fitted the model below (where you are accounting for phylogeny) and now the support for the relevance of 'Class' disappears or is considerably weaker. I hope this summarizes the issue.

First, I would ask: Have you compared the lme() model with this?

rma.mv(Y, sampvar, mods = ~ Class, random = ~ 1 | species, data = Final.data, sparse = TRUE, method = "ML")

Note that this is not exactly the same model as what lme() fits as explained here:

https://www.metafor-project.org/doku.php/tips:rma_vs_lm_lme_lmer

This aside, it is quite possible that the relevance / effect / significance of 'Class' changes when accounting for the phylogeny, because this accounts for the potential dependence of the outcome due to similarities between species in a different way than just including species itself as a random effect.

Now does it make sense to include Class as a moderator while also including random effects for species? I would say yes. Class is a broader category, so the species random effect accounts for heterogeneity within classes (and the fixed effect for class allows the average of all species belonging to the same class to differ from that of other classes). And whether the values of this random effect are correlated or not (as predicted by the phylogeny) is an empirical question. So if the model with phylogeny fits better, then I would go with that.

Best,
Wolfgang
#
Dear Wolfgang, thank you for these insights! This clarified for me the way effects of class and phylogeny (within class) is partitioned in the model! 
When I fit the two models you suggested (see below), AICc is lower for mod2 (with delta AICc > 2). My interpretation of your reply, i.e. that the effect of phylogeny is an empirical question, is that since there is little evidence for a phylogenetic signal within classes for this data set, mod2 (i.e. treating the species as independent observations as long as I account for the random species effect (1|species)) is appropriate when estimating the class effect. 

mod1 <- rma.mv(lambda, sampvar,mods = ~ Class, random = list(~ 1 | study, ~ 1 | species, ~ 1 | phylo),
               R = list(phylo = A), data = Final.data, sparse = TRUE, method = "REML")

mod2 <- rma.mv(lambda, sampvar,mods = ~ Class, random = list(~ 1 | study, ~ 1 | species),
                                   data = Final.data, sparse = TRUE, method = "REML")

Best wishes,
Sigurd
#
Sounds reasonable.

One additional thing: I don't know what your data structure actually looks like. Is each row a different study? Or is each row a different species? If not, then I would strongly suggest to include '~ 1 | id' where

Final.data$id <- 1:nrow(Final.data)

as a random effect (in both mod1 and mod2). Otherwise you would be assuming homogeneity of true outcomes within studies/species, which is a strong assumption.

Best,
Wolfgang
#
There are (usually) several rows (observations of Y and associated sampling variance) for each species, as well as for each study. Hence the random structure   ~ 1|study, ~ 1|species. 

I tried adding the id variable and the 1|id random structure as you recommended (model estimates of class parameters remain basically unaltered, whereas AICc drops a lot), but I don't really understand what this achieves, and I have never used or encountered the use of such a structure with each observed data point being assigned a separate random effect in mixed models (lme/lmer) previously. Is it somehow related to the weight= varIdent in lme, which allows for different random effects for different groups of observations? But in this case the model allows for different random effects for each single observation? It is probably beyond me to understand this (I'm not a statistician), which is fine, but if possible I would like to try, so if you can point me to a resource where I could read up on this that would be great. I read through this https://wviechtb.github.io/metafor/reference/rma.mv.html but still not sure I got it. 

Thanks again for all your helpful advice!
Sigurd