quantifying directed dependence of environmental factors
Dear Gavin,
On Mon, Mar 11, 2013 at 1:58 PM, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
Hi Jay, What you describe is similar to research conducted at UCL by myself and colleagues and also as part of an EU project that finished a few years ago now. Since then, my thoughts on this have expanded a little and Sarah's point about path analysis was where I would have gone next if I was continuing this line of investigation. In the work I did at UCL, I went with a time series approach. I decomposed the species data into a set of ordination axes (I used PCA, but on Hellinger transformed data to account for non-linear responses in the species which PCA doesn't handle well). Then I fitted an additive model with, say, PCA axis 1 (PC1) as the response variable and one or more covariates entering as smooth functions as the predictors. The nice things about the additive model is that the terms come together additively and the non-linear effects allow the effect of a variable to change (i.e. cold temperatures only induce an "effect" on the response). As these were time series data I controlled for autocorrelation by using a continuous time AR(1) process for the residuals. Here are some refs on these approaches: http://www.aslo.org/lo/toc/vol_54/issue_6_part_2/2529.html http://dx.doi.org/10.1111/j.1365-2427.2012.02860.x http://dx.doi.org/10.1111/j.1365-2427.2011.02651.x http://dx.doi.org/10.1111/j.1365-2427.2011.02670.x and there is more in the Special Issue section of FWBiol: http://onlinelibrary.wiley.com/doi/10.1111/fwb.2012.57.issue-10/issuetoc Now this doesn't look at cascades of effects; one key aspect of all of the above was the construction of appropriate time series for the environmental data. Ideally, I'd take the variable that is most closely related to diatom physiology as my predictor. However those variables do not always exist or are not available. Instead surrogates can be used; amount of agricultural land-use in catchments or fertilizer-use historical records will be highly correlated with nutrient loading from a catchment to the lake/reservoir, so these could be used instead. Of course, you do need to wary of spurious correlations with time series data. As these models are using smooth functions, you are only going to be able to include one or a few covariates unless you have *lots* of samples; and anyway, I would always advise to think first from the biological or ecological viewpoint and formulate an hypothesis there and then fit that with the stats rather than throwing lots of variables into an analysis to see what pops out (which seems to be what a lot of palaeoecologists do!) As regards envfit(), it isn't symmetric in the variables at least as far as I see it; it fits a model of varZ = \beta_1 axis_1 + \beta_2 axis_2 + \varepsilon in other words it uses the 2d "axis" scores (PC1 and PC2, or nMDS1 + nMDS2 coordinates) to predict the values of the response using a linear model. As each environmental variable is modelled separately (individually), one is not favouring a set of variables etc. Perhaps this is not what you meant but worth pointing out.
Yes, you are right, that is not what I meant, and you've said it better than I did (or knew how to).
Also, envfit presumes a linear relationship between the variables and the ordination coordinates. If that is too strong an assumption, see ordisurf() which fits GAM-based surfaces rather than linear-regression surfaces.
Yes, there is evidence of nonlinearity in our data and we've done work with ordisurf, too.
A fairly standard way of looking at this sort of data might be to group variables that are related and then decompose the variance in the data in that which can be explained by each group of variables uniquely, that which can be explained by two or more groups, and the unexplained variance. The vegan package has a function varpart() which can do this all for you if you are willing to use RDA to analyse the data (unbiased estimates of the variance explained are not available for CCA and nMDS is not a constrained technique) - note you can use principal coordinates analysis to embed your original dissimilarity matrix into a metric space and then take the PCoA axis scores as the input "data" for the RDA so that the RDA is in the dissimilarity data of your choice and not linear in the original data. Steve Juggins has adapted the hierarchical partitioning approach from package hier.part to the multivariate multiple regression setting of RDA (possibly CCA too?) which is related to but somewhat different to the variance partitioning described above. I don't believe Steve has released this code yet, so if interested I'd emailing him for it; he is the author of the rioja package so contact details can be found on CRAN. Neither variance partitioning or hierarchical partitioning directly do exactly what you ask and model the directed dependence or pathways of effects. They are however far simpler methods which would have familiarity within the applied community that will see these results/papers etc. In writing this I have pondered on whether you and/or the ecologists are making it too complex? As you have all the variables of interest, I might model the variables that physiologically affect the diatoms and their effect on diatom composition (using constrained ordination or the additive model time series approach I used). Then I would model the relationship between the higher-level factor (land-use, etc) that I hypothesise to be driving changes in the lower-level, physiologically-relevant, variable.
That's interesting - I'll ask him about that.
Whilst path analysis might be ideal tool here, I suspect you'll have plenty of complications to address, not least the compositional aspects (though Sarah points you to some ideas there), but also the temporal autocorrelation prevalent in the palaeo data which would need to be accounted for to yield appropriate p-values. If you do address these issues and use path analysis, I for one would be very interested to here how you got on as this would be a very useful contribution to the palaeolimnological literature! Wow, this got long! Anyway, hopefully some of the above will be of use and interest, and should you wish to chat about this off-list some more (I'm certainly very interested if you use path analysis for this), then please do so. (Note I am still sending via my old UCL address as I have moved to Canada and U Regina, contact details in my email signature.) All the best, Gavin On Wed, 2013-03-06 at 22:12 -0500, Jay Kerns wrote:
This reply is spectacular; thank you very much. There's a lot to think about, and I'll be chewing on it for a while. Thank you for the references and thoughtful comments. Regards, Jay