capscale() for PCoA-CDA
On 3/12/09 23:54, "gabriel singer" <gabriel.singer at univie.ac.at> wrote:
Hi everybody, Has anybody used capscale() in package vegan to compute a PCoA-CDA as suggested by Anderson and Willis 2003 (Ecology 84: 511 ff), using one or more factors as "predictors"? If so, I wonder about: *) How to interpret interactions of factors? Why are interactions (specified as "~factor1*factor2" in the function call) shown as continuous predictors (using arrows) by the plot function? Wouldn't centroids for all cells in the design be more appropriate? Aren't factorial interactions in a CDA setting more or less meaningless?
Internally capscale() uses contrasts of the variables, and these are treated as continuous variables and shown as arrows in plots. However, if the contrasts correspond to simple factors, they are not drawn as arrows but shown as centroids. For ordered factors you get both centroids and arrows. Interactions of contrasts cannot be shown as simple class means and are therefore drawn as arrows. The simple centroids are not appropriate for interactions; instead you should have centroids of all combinations of the class levels of the interacting factors. If you think that factorial interactions in CDA (what is CDA, by the way?) are meaningless, why do you want to use them? I wouldn't say they are meaningless, because that depends on what you mean. Often they are difficult to interpret, but that is another issue.
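One way to get centroids for all cells of the design is to fit the combined factor explicitly, so that every combination of levels becomes a class of its own. A sketch with the vegan example data (the dune data and the choice of Management and Use are only illustrative, not from the original question):

```r
library(vegan)
data(dune)
data(dune.env)
## combine two factors into a single factor of all design cells,
## so that capscale() shows a centroid for every combination
cell <- interaction(dune.env$Management, dune.env$Use, drop = TRUE)
m <- capscale(dune ~ cell, distance = "bray")
## centroids ("cn") are plotted instead of interaction arrows
plot(m, display = c("sites", "cn"))
```

The trick is that a single factor is always displayed as centroids, whereas the interaction contrasts of "~factor1*factor2" are treated as continuous variables.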
*) How to get classification statistics? And how to efficiently run a leave-one-out classification analysis? I thought of writing code by hand that checks for the closest centroid. Would it be appropriate to use Euclidean distance as the criterion, since everything happens in PCo space? Probably there are more efficient functions which I do not know of yet, for example a function that allows extraction of the distances of all objects to all centroids?
There is no such function. Contributed code will be reviewed for inclusion in vegan.
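The manual approach hinted at in the question can be sketched in a few lines: take the site scores of the ordination, compute class centroids in that space, and assign each object to its nearest centroid by Euclidean distance. The data, the factor, and the number of axes below are only illustrative assumptions:

```r
library(vegan)
data(dune)
data(dune.env)
m <- capscale(dune ~ Management, data = dune.env, distance = "bray")
## site scores on the constrained axes
sco <- scores(m, display = "sites", choices = 1:3)
## class centroids in the same space: column-wise means per class
cent <- apply(sco, 2, function(x) tapply(x, dune.env$Management, mean))
## Euclidean distances of every object to every centroid
d2c <- as.matrix(dist(rbind(cent, sco)))[-(1:nrow(cent)), 1:nrow(cent)]
## predicted class = nearest centroid
pred <- rownames(cent)[apply(d2c, 1, which.min)]
## naive (resubstitution) confusion table
table(observed = dune.env$Management, predicted = pred)
```

For an honest leave-one-out estimate, the whole fit would have to be repeated with each object removed in turn; the resubstitution table above is optimistically biased.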
*) Is the application of capscale() to a Euclidean distance matrix equivalent to a classical DFA aka CDA on the original data, or am I completely wrong with this idea?
No, it is not equal to "DFA aka CDA". Perhaps... it depends on what DFA and CDA are. With Euclidean distances, capscale() is equivalent to redundancy analysis (RDA). Guessing that "DFA aka CDA" mean discriminant analysis, RDA is not equal to them. The major difference is that RDA uses no information about the scatter of points with respect to the class centroids, but only the class centroids themselves: RDA tries to maximize the distances among class centroids, but it does not try to maximize the separation of points of different classes. The methods are very different, although the results may have some similarities. This is connected to the previous question: because RDA (which is at the heart of capscale()) does not try to optimize classification, there is no classification statistic to be optimized. Such a statistic should be estimated independently of, and after, the analysis, and there are no functions for that purpose in vegan.
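The RDA equivalence is easy to check numerically: run capscale() on the Euclidean distances of a data matrix and rda() on the matrix itself, and compare the constrained eigenvalues. A sketch (data and factor are illustrative):

```r
library(vegan)
data(dune)
data(dune.env)
## capscale() on Euclidean distances of the raw data ...
m1 <- capscale(dist(dune) ~ Management, data = dune.env)
## ... against rda() on the raw data
m2 <- rda(dune ~ Management, data = dune.env)
## compare constrained eigenvalues (axis names differ: CAP vs RDA)
all.equal(unname(m1$CCA$eig), unname(m2$CCA$eig))
```

By contrast, linear discriminant analysis (e.g. MASS::lda) standardizes by the within-class scatter, which is exactly the information RDA ignores.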
*) Given only one factor as a "predictor", I guess that using permutest() or anova() on an object resulting from capscale() is completely equivalent to a direct application of adonis()? Correct?
Have you tried this? After trying, you could tell us whether it is true. I would not expect it. The results may not be completely different, but internally the methods are pretty different, and when I tried with the same random number seed, and hence the same permutations, the results were not identical. Cheers, Jari
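A direct check of the kind suggested, with the same seed for both tests, could look like this (data and factor are illustrative; in recent vegan versions adonis() has been replaced by adonis2()):

```r
library(vegan)
data(dune)
data(dune.env)
## permutation test of the capscale() fit
set.seed(42)
a1 <- anova(capscale(dune ~ Management, data = dune.env,
                     distance = "bray"))
## the same hypothesis via adonis() (adonis2() in newer vegan)
set.seed(42)
a2 <- adonis(dune ~ Management, data = dune.env, method = "bray")
## compare pseudo-F statistics and permutation P-values
a1
a2$aov.tab
```

With one factor the F statistics are typically close, but the internal computations (and hence the permutation P-values) need not coincide exactly.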