capscale() for PCoA-CDA
3 messages · gabriel singer, Jari Oksanen

Hi everybody,

Has anybody used capscale() in package vegan to compute a PCoA-CDA as suggested by Anderson and Willis 2003 (Ecology 84: 511 ff) using one or more factors as "predictors"? Then I wonder about:

*) How to interpret interactions of factors? Why are interactions (specified as "~factor1*factor2" in the function call) shown as continuous predictors (using arrows) in the plot function? Wouldn't centroids for all cells in the design be more appropriate? Aren't factorial interactions in a CDA setting more or less meaningless?

*) How to get classification statistics? And how to efficiently run a "leave 1 out" classification analysis? I thought of manually writing code that checks for the closest centroid. Would it be appropriate to use Euclidean distance as a criterion for this since it happens in PCo space? Probably there are more efficient functions which I do not know of, yet,... for example a function that allows extraction of distances of all objects to all centroids?

*) Is the application of capscale on a Euclidean distance matrix equivalent to a classical DFA aka CDA on the original data - or am I completely wrong with this idea?

*) Given only one factor as a "predictor", I guess using permutest() or anova() on an object resulting from capscale is completely equivalent to a direct application of adonis()? Correct?

These are lots of questions at once and no code to play with, sorry... Thanks for any help!
Gabriel

On 3/12/09 23:54 PM, "gabriel singer" <gabriel.singer at univie.ac.at> wrote:
Hi everybody, Has anybody used capscale() in package vegan to compute a PCoA-CDA as suggested by Anderson and Willis 2003 (Ecology 84: 511 ff) using one or more factors as "predictors"? Then I wonder about: *) How to interpret interactions of factors? Why are interactions (specified as "~factor1*factor2" in the function call) shown as continuous predictors (using arrows) in the plot function? Wouldn't centroids for all cells in the design be more appropriate? Aren't factorial interactions in a CDA setting more or less meaningless?
Internally capscale() uses contrasts of variables, and these are treated as continuous variables and shown as arrows in plots. However, if the contrasts correspond to simple factors, they are not drawn as arrows; instead their centroids are shown. For ordered factors you get both centroids and arrows. The interactions of contrasts cannot be shown as simple class means, and therefore they are drawn as arrows. The simple (marginal) centroids are not appropriate for interactions; instead you should have centroids of all combinations of class levels of the interacting factors. If you think that factorial interactions in *** (what is CDA?) are meaningless, why do you want to use them? I wouldn't say they are meaningless, because that depends on your meaning. Often they are difficult to interpret, but that's another issue.
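If the goal is one centroid per cell of the design, one option is to code the cells explicitly as a single combined factor. A minimal sketch using the dune data shipped with vegan (Use is coerced to an unordered factor purely for illustration; the Bray-Curtis distance is just an assumption):

library(vegan)
data(dune)
data(dune.env)
dune.env$Use <- factor(dune.env$Use, ordered = FALSE)

## interaction given in the formula: the Management:Use contrasts are drawn as arrows
cap.int <- capscale(dune ~ Management * Use, data = dune.env, distance = "bray")
plot(cap.int)

## the same cells coded as one combined factor: plot() now shows a centroid
## for every Management x Use combination instead of interaction arrows
dune.env$cell <- interaction(dune.env$Management, dune.env$Use, drop = TRUE)
cap.cell <- capscale(dune ~ cell, data = dune.env, distance = "bray")
plot(cap.cell)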
*) How to get classification statistics? And how to efficiently run a "leave 1 out" classification analysis? I thought of manually writing code that checks for the closest centroid. Would it be appropriate to use Euclidean distance as a criterion for this since it happens in PCo space? Probably there are more efficient functions which I do not know of, yet,... for example a function that allows extraction of distances of all objects to all centroids?
There is no such thing. Contributed code will be reviewed for inclusion into vegan.
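For what it is worth, the "closest centroid" idea can be pieced together from scores() output. A rough sketch (not a vegan function; the dune data, the Bray-Curtis dissimilarity and the use of all constrained axes are illustrative assumptions) that gives a re-substitution classification table; a genuine leave-one-out analysis would refit the ordination with each site held out in turn:

library(vegan)
data(dune)
data(dune.env)

cap <- capscale(dune ~ Management, data = dune.env, distance = "bray")
grp <- dune.env$Management

## site scores on the constrained (CAP) axes; one 4-level factor gives 3 axes
sites <- scores(cap, display = "sites", choices = 1:3)

## class centroids in that space
cents <- apply(sites, 2, function(x) tapply(x, grp, mean))

## Euclidean distances of all objects to all centroids
D <- as.matrix(dist(rbind(sites, cents)))
d2cent <- D[seq_len(nrow(sites)), nrow(sites) + seq_len(nrow(cents)), drop = FALSE]

## classify each site to its nearest centroid and tabulate against the observed groups
predicted <- rownames(cents)[apply(d2cent, 1, which.min)]
table(observed = grp, predicted = predicted)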
*) Is the application of capscale on a Euclidean distance matrix equivalent to a classical DFA aka CDA on the original data - or am I completely wrong with this idea?
No, it isn't equal to "DFA aka CDA". Perhaps... it depends on what DFA and CDA are. With Euclidean distances, capscale() is equivalent to redundancy analysis (RDA). Guessing that "DFA aka CDA" means discriminant analysis, RDA is not equal to them. The major difference is that RDA uses no information about the scatter of points with respect to the class centroids: it only uses the class centroids. RDA tries to maximize the distances among class centroids, but it doesn't try to maximize the separation of points of different classes. The methods are very different, although the results may have some similarities. This is connected to the previous question: because RDA (which is at the heart of capscale()) does not try to optimize classification, there is no classification statistic to be optimized. That should be estimated independently of the analysis, after the analysis, and there are no functions for that purpose in vegan.
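The Euclidean-distance case is easy to check directly. A small sketch (again with vegan's dune data; the comparison uses proportions of inertia, since the absolute eigenvalues may differ by a constant scaling):

library(vegan)
data(dune)
data(dune.env)

m.rda <- rda(dune ~ Management, data = dune.env)
m.cap <- capscale(dist(dune) ~ Management, data = dune.env)

## proportions of total inertia on the constrained axes should agree
m.rda$CCA$eig / m.rda$tot.chi
m.cap$CCA$eig / m.cap$tot.chi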
*) Given only one factor as a "predictor", I guess using permutest() or anova() on an object resulting from capscale is completely equivalent to a direct application of adonis()? Correct?
Have you tried this? After trying, you could tell us if this is true. I wouldn't expect this. The results may not be completely different, but internally the methods are pretty different, and when I tried with the same random number seed and hence same permutations, the results were not identical. Cheers, Jari
Dear Jari and others,
Hi everybody,
Has anybody used capscale() in package vegan to compute a PCoA-CDA as
suggested by Anderson and Willis 2003 (Ecology 84: 511 ff) using one or
more factors as "predictors"?
Then I wonder about:
*) How to interpret interactions of factors? Why are interactions
(specified as "~factor1*factor2" in the function call) shown as
continuous predictors (using arrows) in the plot function? Wouldn't
centroids for all cells in the design be more appropriate? Aren't
factorial interactions in a CDA setting more or less meaningless?
Internally capscale() uses contrasts of variables, and these are treated as continuous variables and shown as arrows in plots. However, if the contrasts correspond to simple factors, they are not drawn as arrows; instead their centroids are shown. For ordered factors you get both centroids and arrows. The interactions of contrasts cannot be shown as simple class means, and therefore they are drawn as arrows. The simple (marginal) centroids are not appropriate for interactions; instead you should have centroids of all combinations of class levels of the interacting factors. If you think that factorial interactions in *** (what is CDA?) are meaningless, why do you want to use them? I wouldn't say they are meaningless, because that depends on your meaning. Often they are difficult to interpret, but that's another issue.
I understand the arrows for interactions now, thanks. I used CDA in the sense of Anderson and Willis 2003 (and others) as Canonical Discriminant Analysis; as such it is - at least to my understanding - equivalent to Discriminant Function Analysis. When CDA aka DFA is used with 2 interacting factors, it will try to best separate groups, and that is *any* groups, and I can't see why (and how) preference should be given to any particular grouping criterion (factor 1, factor 2 or both)... In the end a 4-level factor should be as good as a 2*2 factorial combination. In this sense I used the word "meaningless". In fact, capscale() results for a 1*4 constraint (1 factor, 4 levels) are identical to those for a 2*2 constraint. However, the centroids are at different positions (!); in fact the centroids of all combinations of class levels are at weird (wrong, as I think) positions in the 2*2 case!? Still, "interactions" finally make sense when interpreting the plot, that's quite true.
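The 1*4 versus 2*2 observation is easy to reproduce. A sketch with vegan's dune data, where two artificial binary factors are constructed so that their combinations reproduce the four Management levels (both codings span the same 4-group model space):

library(vegan)
data(dune)
data(dune.env)

## two hypothetical 2-level factors whose combinations give the 4 Management levels
dune.env$f1 <- factor(dune.env$Management %in% c("BF", "HF"))
dune.env$f2 <- factor(dune.env$Management %in% c("BF", "NM"))

cap.1x4 <- capscale(dune ~ Management, data = dune.env, distance = "bray")
cap.2x2 <- capscale(dune ~ f1 * f2, data = dune.env, distance = "bray")

## both codings define the same 4 groups, so the constrained eigenvalues agree
all.equal(unname(cap.1x4$CCA$eig), unname(cap.2x2$CCA$eig))

## plot(cap.2x2), however, shows centroids for the levels of f1 and f2 plus an
## arrow for the interaction contrast, not centroids of the four cells themselves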
*) How to get classification statistics? And how to efficiently run a
"leave 1 out" classification analysis? I thought of manually writing
code that checks for the closest centroid. Would it be appropriate to
use Euclidean distance as a criterion for this since it happens in PCo
space? Probably there are more efficient functions which I do not know
of, yet,... for example a function that allows extraction of distances
of all objects to all centroids?
There is no such thing. Contributed code will be reviewed for inclusion into vegan.
*) Is the application of capscale on a Euclidean distance matrix
equivalent to a classical DFA aka CDA on the original data - or am I
completely wrong with this idea?
No, it isn't equal to "DFA aka CDA". Perhaps... it depends on what DFA and CDA are. With Euclidean distances, capscale() is equivalent to redundancy analysis (RDA). Guessing that "DFA aka CDA" means discriminant analysis, RDA is not equal to them. The major difference is that RDA uses no information about the scatter of points with respect to the class centroids: it only uses the class centroids. RDA tries to maximize the distances among class centroids, but it doesn't try to maximize the separation of points of different classes. The methods are very different, although the results may have some similarities. This is connected to the previous question: because RDA (which is at the heart of capscale()) does not try to optimize classification, there is no classification statistic to be optimized. That should be estimated independently of the analysis, after the analysis, and there are no functions for that purpose in vegan.
Slightly confused now... Anderson and Willis (2003) describe PCoA on a dissimilarity structure, followed by CDA or CCorA, and call the procedure CAP (Canonical Analysis of Principal Coordinates). I will call the latter two approaches PCoA-CDA and PCoA-CCorA. Now, I get that CCorA differs from RDA mainly conceptually, so there is not much (any?) difference between PCoA-CCorA and PCoA-RDA = capscale(). Now, is PCoA-CDA really equivalent to db-RDA (in the sense of Legendre and Anderson 1999)? I initially thought this would be the case: they both use a set of dummy variables to code for the factor and treat these as continuous predictors. A second thought tells me they can't be the same. Then maybe what's left is only that the term capscale() is not the same as CAP in the case of PCoA-CDA... Seems I am getting lost in the panoply of acronyms, sorry...
*) Given only one factor as a "predictor", I guess using permutest() or
anova() on an object resulting from capscale is completely equivalent to
a direct application of adonis()? Correct?
Have you tried this? After trying, you could tell us if this is true. I wouldn't expect this. The results may not be completely different, but internally the methods are pretty different, and when I tried with the same random number seed and hence same permutations, the results were not identical.
Well, the question was sort of aimed at what's happening in the background; obviously that's not the same (though I don't get how exactly the two permutation tests differ - I thought that, at least in the simple 1-factor case, it's basically permuting raw data and building a pseudo-F distribution). In my trials I got very similar results (also the same pseudo-F, so I thought the test statistic has to be the same) and interpreted any differences in the P-values as due to differences in the permutations. Jari, thanks for the discussion! Cheers, Gabriel
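For anyone wanting to try the one-factor comparison, a sketch with vegan's dune data (adonis() as used in the thread has since been superseded by adonis2(); identical seeds need not give identical permutation sequences inside the two functions, so small differences in the P-values are expected even when the pseudo-F values are close):

library(vegan)
data(dune)
data(dune.env)

d <- vegdist(dune)  # Bray-Curtis dissimilarities

set.seed(4711)
anova(capscale(d ~ Management, data = dune.env), permutations = 999)

set.seed(4711)
adonis2(d ~ Management, data = dune.env, permutations = 999)

## the pseudo-F statistics can be compared directly; they need not be identical,
## since the two functions partition the dissimilarities somewhat differently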