
CCA vs NMDS and ordisurf

12 messages · Aurélie Boissezon, Pierre THIRIET, Jari Oksanen +2 more

#
Dear Aurélie,

About the dissimilarity measures and data you used:
Bray-Curtis is usually the most appropriate, on raw 
abundance/biomass/cover data, or square-root/log transformed. So why do 
you Hellinger-transform first? That transformation is designed to be 
used with Euclidean distance, and the resulting ordinations (PCA or RDA) 
have a different meaning from PCoA or CAP/db-RDA (with Bray-Curtis) 
because joint absences are included in the first case and excluded in 
the latter. See the figure in Anderson et al. 2011, "Navigating the 
multiple meanings of β diversity: a roadmap for the practicing ecologist".



So, if you want to do constrained ordinations (constrained by "drought 
disturbance gradient", I guess), I would suggest dbRDA (vegan::capscale) 
with Bray-Curtis, or RDA on Hellinger-transformed data, depending on 
what you want to emphasize.
For unconstrained ordinations, these would be PCoA and PCA respectively.
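
The two routes described above can be sketched roughly as follows. This is only an illustration: it uses the `dune` and `dune.env` example data shipped with vegan as stand-ins for the poster's cover data, and the numeric variable `A1` as a hypothetical substitute for the "drought disturbance gradient".

```r
library(vegan)
data(dune)
data(dune.env)   # example data from vegan, NOT the data discussed in the thread

# Route 1: Hellinger transformation, then Euclidean-based methods
dune.hel <- decostand(dune, method = "hellinger")
pca.hel  <- rda(dune.hel)                         # unconstrained: PCA
rda.hel  <- rda(dune.hel ~ A1, data = dune.env)   # constrained: RDA

# Route 2: Bray-Curtis dissimilarities, then distance-based methods
pcoa.bc <- capscale(dune ~ 1, distance = "bray")                    # unconstrained: PCoA
cap.bc  <- capscale(dune ~ A1, data = dune.env, distance = "bray")  # constrained: dbRDA

# Compare how much variation the constraint accounts for in each route
rda.hel
cap.bc
```

Running the constrained and unconstrained version of the same route side by side shows how closely the constraint follows the main pattern in the community data.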

Be careful when using NMDS. As you said, it is rank-based, which is why 
fitting environmental vectors to an NMDS biplot is not so appropriate, 
despite being widely done. I don't see the problem with ordisurf and PCoA or 
CAP: ordisurf enables you to fit environmental variables that have 
non-linear relationships with the axes of distance-based ordinations.
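
As a rough sketch of what fitting a surface with ordisurf looks like on a PCoA, again assuming the `dune` example data from vegan and the variable `A1` as placeholders for the real data:

```r
library(vegan)
data(dune)
data(dune.env)   # example data from vegan, used as a stand-in

# Unconstrained PCoA on Bray-Curtis dissimilarities
pcoa.bc <- capscale(dune ~ 1, distance = "bray")

# Fit a smooth (GAM) surface of A1 over the first two ordination axes.
# plot = FALSE only fits the model; use plot(pcoa.bc) followed by
# ordisurf(..., add = TRUE) to draw the contours on an existing plot.
surf <- ordisurf(pcoa.bc ~ A1, data = dune.env, plot = FALSE)

# The fitted object is an mgcv gam, so the usual summary applies
s <- summary(surf)
s$r.sq       # adjusted R-squared of the fitted surface
s$dev.expl   # proportion of deviance explained
```

The surface describes how the environmental variable changes across the ordination plane, so a species or site point is read against the nearest contour lines. The adjusted R-squared and deviance explained indicate how well the smooth surface reproduces the variable, not how good the ordination itself is.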

If you use Bray-Curtis, I would suggest using distances among group 
centroids instead of computing averages over groups followed by Bray-Curtis.

About hypothesis testing (in capscale or adonis for instance), pay 
attention to the longitudinal nature of your data. Some questions about 
repeated measures and adonis are already in the R-SIG-ECO archives, have a look.

I guess you are interested in identifying the species which are the most 
responsible for community change over the drought disturbance gradient?
If yes, I think an appropriate way could be: a dbRDA (capscale) with 
Bray-Curtis on square-root transformed cover data (or not, depending on 
whether you have a few predominant species that might mask the others), and 
"drought disturbance gradient" as a continuous constraint. Then, you 
could overlay vectors of correlations between species cover and the CAP1 axis 
(i.e. in vegan: scores(your.capscale, dis="sp", scaling=-2, const = 
sqrt(nrow(your.cover.data.matrix)-1), choices=1)).
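
One hedged way to look at the same idea is to compute the species-to-CAP1 correlations directly with `cor()`. Note that this is a swapped-in, simpler calculation and not the `scores()` call quoted above. The `dune` data and the variable `A1` are stand-ins for the real cover matrix and the drought gradient.

```r
library(vegan)
data(dune)
data(dune.env)   # example data from vegan, used as a stand-in

# dbRDA on square-root transformed cover with one continuous constraint
cover.sqrt <- sqrt(dune)
cap <- capscale(cover.sqrt ~ A1, data = dune.env, distance = "bray")

# Site scores on the single constrained axis (CAP1)
cap1 <- as.vector(scores(cap, display = "sites", choices = 1))

# Pearson correlation of each species' (transformed) cover with CAP1
sp.cor <- cor(cover.sqrt, cap1)[, 1]

# Species most strongly associated with the constrained axis
head(sort(abs(sp.cor), decreasing = TRUE))
```

Species with the largest absolute correlations are the ones changing most consistently along the constrained axis. The sign tells which end of the gradient they favour.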

I hope my English is at least understandable, and that my answer helped you.

Cheers,
Pierre



On 18/04/2013 13:31, Aurélie Boissezon wrote:
#
Hello folks,

Only one point here:
On 18/04/2013, at 15:52 PM, Pierre THIRIET wrote:
This is not true. I have sometimes seen this claim on the Internet, but it really is not true: the NMDS ordination space is strictly *metric*. In vegan it is even strictly *Euclidean*. So it is absolutely correct to fit vectors to an NMDS ordination. (In MASS::isoMDS you can also have any Minkowski metric, but only Euclidean, i.e. Minkowski with exponent=2, is allowed in vegan even with isoMDS.)

What is non-metric is the monotonic regression from the *metric* ordination to any dissimilarity measure. So NMDS finds a metric solution from any dissimilarity measure.
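
A small sketch of this point, assuming the `dune` example data from vegan and the variable `A1` as placeholders: the site scores returned by metaMDS are ordinary coordinates, and only the link between ordination distances and observed dissimilarities is monotone rather than linear.

```r
library(vegan)
data(dune)
data(dune.env)   # example data from vegan, used as a stand-in

set.seed(1)
ord <- metaMDS(dune, distance = "bray", k = 2, trace = FALSE)

# Site scores are plain coordinates, so distances between them are Euclidean
xy    <- scores(ord, display = "sites")
d.ord <- dist(xy)

# The observed dissimilarities enter only through a monotone regression,
# so the rank agreement with the ordination distances is the thing to inspect
d.obs <- vegdist(dune, method = "bray")
cor(as.vector(d.obs), as.vector(d.ord), method = "spearman")

# Because the ordination space is Euclidean, a vector fit is legitimate
fit <- envfit(ord ~ A1, data = dune.env, permutations = 999)
fit
```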

I
#
Dear all,

Thanks for your help. It took me some time to put all the information together in my slightly less confused brain. Maybe I should give some explanations about the context and purpose of my study before going further with this discussion.
Theory:
The objective of my PhD thesis is to improve scientific knowledge about the ecology of a very particular family of aquatic plants: the charophytes. I chose to study closely the response of species (cover and life cycle) to fine-scale gradients. The study site is a hotspot for aquatic plants (Rey-Boissezon and Auderset Joye, 2012. Arch. Sciences. in press), and in particular for charophyte species --> that's why I carried out this longitudinal research on this waterbody.
The main purpose is to understand how the disturbance gradient affects the composition of the macrophyte community, in particular the distribution of charophytes ("V3" mission in Anderson et al 2011).
Practical: 
I want to ignore double zeros because there is no reason to consider that double zeros indicate similarity. --> avoid Euclidean-distance-based methods such as PCA and RDA
The succession of a large number of species generated numerous zeros in my species dataset (long environmental gradient). --> one more argument against RDA
Finally, the vegetation was well sampled, so the rarest species were truly rare in the water body. Nevertheless, I am not particularly interested in those rare species, so I deleted them before the multivariate analysis.

For all these reasons, I first tried CCA ordination, but I did not try dbRDA. Should I, given my practical constraints? Would it really be better than CCA? I guess I have to try following Pierre's method. The main advantage of dbRDA is that I can use any dissimilarity matrix (if I understand correctly), Hellinger or Bray-Curtis for example.

Why not explore unconstrained ordination methods and go further with NMDS ("V2" mission in Anderson et al 2011)?
I understood that I was wrong to use Bray-Curtis distance on Hellinger-transformed data before NMDS: I have to choose one or the other. But I was right to superimpose vectors or a GAM surface on NMDS ordinations.
But could someone explain briefly how to interpret the outputs? In particular the position of each species on the surface, and the "r2-adjusted" and "deviance explained" reported by gam...

Last but not least, I am not sure that the longitudinal nature of my dataset is really a problem. Do you mean that autocorrelation problems might occur?

Cheers,

Aurélie


-----------------------------------------------------------------------
Aurélie Rey-Boissezon
Ph-D Student
University of Geneva
Section of Earth and Environmental Sciences - Institute F.-A. Forel
Aquatic Ecology Group
Uni Rondeau
Site de Battelle - Bâtiment D
7, route de Drize - 1227 Carouge
Geneva
Switzerland
Tel. 0041 (0) 22379 04 88

Aurelie.Boissezon at unige.ch
http://leba.unige.ch/team/aboissezon.html

________________________________________
From: r-sig-ecology-bounces at r-project.org [r-sig-ecology-bounces at r-project.org] on behalf of Pierre THIRIET [pierre.d.thiriet at gmail.com]
Sent: Thursday, 18 April 2013 14:52
To: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] CCA vs NMDS and ordisurf

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
#
A contrary view in-lined below:
On Fri, 2013-04-19 at 15:19 +0200, François Gillet wrote:
<snip />
I think you need to justify the "more powerful" there! :-) I see uses
for both the constrained and unconstrained methods here. A comparison,
especially if you do PCA vs RDA (with Hellinger or similar
transformation) or PCoA vs capscale (with any distance measure), allows
you to investigate the degree to which your constraints relate to the
major patterns in the species responses.

These are complementary approaches and one would do well to use them
both.
You can, but the axes are still formed through linear functions of the
constraints. The constrained methods don't fit non-linear functions
(well you can introduce quadratic terms...) in the constraints.

I really don't see why this has to be an either/or situation.

G
1 day later
#
I would say that it *is* important, in general. However, you don't say
if you retried running `monoMDS` on the Hellinger-transformed data
(without the Bray-Curtis metric - you should use Euclidean with the
Hellinger transformation)? If you didn't, try rerunning without
Bray-Curtis and see if it converges. Otherwise, try many more iterations
and get vegan to start monoMDS from the best solution from the first set
of runs.

See `?metaMDS` for details.
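
A minimal sketch of that advice, assuming the `dune` example data from vegan in place of the real cover data. The values of `trymax` are arbitrary choices for illustration.

```r
library(vegan)
data(dune)   # example data from vegan, used as a stand-in

# Hellinger transformation paired with Euclidean distance;
# autotransform = FALSE stops metaMDS from transforming the data again
dune.hel <- decostand(dune, method = "hellinger")

set.seed(1)
sol1 <- metaMDS(dune.hel, distance = "euclidean", autotransform = FALSE,
                trymax = 50, trace = FALSE)

# Continue the search from the best solution found so far
sol2 <- metaMDS(dune.hel, distance = "euclidean", autotransform = FALSE,
                trymax = 200, previous.best = sol1, trace = FALSE)

# Stress of the first run and of the continued run
c(first = sol1$stress, continued = sol2$stress)
```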

G
On Mon, 2013-04-22 at 08:26 +0000, Aurélie Boissezon wrote:

  
    
#
I also suggest (like I have suggested before) that you run metaMDS with argument plot = TRUE. The convergence criteria in metaMDS are pretty stringent, but with the plot argument you can see how different the solutions are. The two most typical non-convergence cases are that

(1) most points are stable, but there are some outliers that don't find their place in this universe, and

(2) your data need more dimensions and you should increase 'k'.

Then you should also check the stressplot( ). If the fit line shoots right up at the maximum observed dissimilarity, you may need to turn on the 'noshare' argument in metaMDS to trigger step-across dissimilarities. We claim that this is rarely necessary with the monoMDS engine we use currently, but sometimes it is needed.
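
These checks could look roughly like this, again assuming the `dune` example data from vegan as a stand-in. Whether `k = 3` or `noshare = TRUE` is actually needed depends on the real data.

```r
library(vegan)
data(dune)   # example data from vegan, used as a stand-in

set.seed(1)
# plot = TRUE draws a Procrustes comparison for each new try against the
# current best solution, so unstable points become visible
sol2d <- metaMDS(dune, k = 2, plot = TRUE, trace = FALSE)

# Case (2): try one more dimension and compare the stress
sol3d <- metaMDS(dune, k = 3, trace = FALSE)
c(k2 = sol2d$stress, k3 = sol3d$stress)

# Shepard diagram: watch for a fit line that shoots up at the
# maximum observed dissimilarity
stressplot(sol2d)

# If it does, ask metaMDS to use step-across dissimilarities
# when some sites share no species
sol.step <- metaMDS(dune, k = 2, noshare = TRUE, trace = FALSE)
```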

Without hands on your data it is difficult to guess more.

Cheers, Jari Oksanen


On 22.4.2013, at 22.31, "Gavin Simpson" <gavin.simpson at ucl.ac.uk> wrote: