CCA vs NMDS and ordisurf

Dear Aurélie,

About the dissimilarity measures and data you used:
Bray-Curtis is usually the most appropriate measure, applied to raw
abundance/biomass/cover data or to square-root/log-transformed data. So why
did you apply a Hellinger transformation first? That transformation is
intended for use with Euclidean distance, and the resulting ordinations
(PCA or RDA) have a different meaning from PCoA or CAP/db-RDA (with
Bray-Curtis), because joint absences are included in the former and
excluded in the latter. See the figure from Anderson et al. 2011,
"Navigating the multiple meanings of β diversity: a roadmap for the
practicing ecologist".

So, if you want to do constrained ordinations (constrained by the "drought
disturbance gradient", I guess), I would suggest db-RDA (vegan::capscale)
with Bray-Curtis, or RDA on Hellinger-transformed data, depending on
what you want to emphasise.
For unconstrained ordinations, the respective counterparts are PCoA and PCA.
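A minimal vegan sketch of the four options named above. It assumes vegan's bundled `dune`/`dune.env` data as a stand-in for the pond quadrats and the continuous variable `A1` as a stand-in for the drought gradient; both are assumptions for illustration only, not the poster's data.

```r
library(vegan)
data(dune, dune.env)  # stand-in community matrix and environmental table

# Constrained: db-RDA on Bray-Curtis dissimilarities (joint absences ignored)
db_rda <- capscale(dune ~ A1, data = dune.env, distance = "bray")

# Constrained: RDA on Hellinger-transformed data
dune_hel <- decostand(dune, method = "hellinger")
rda_hel <- rda(dune_hel ~ A1, data = dune.env)

# Unconstrained counterparts: PCoA (capscale without constraints) and PCA
pcoa_bray <- capscale(dune ~ 1, data = dune.env, distance = "bray")
pca_hel <- rda(dune_hel)

# Permutation tests of the constraint in each constrained model
set.seed(1)
anova(db_rda, permutations = 199)
anova(rda_hel, permutations = 199)
```

Which pair to prefer depends, as said above, on whether joint absences should count as similarity.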

Be careful when using NMDS. As you said, it is rank-based, which is why
fitting environmental vectors to an NMDS biplot is not really appropriate,
even though it is widely done. I don't see any problem with ordisurf and
PCoA or CAP: ordisurf lets you fit environmental variables that have
non-linear relationships with the principal coordinates of distance-based
ordinations.
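As a sketch of the PCoA plus ordisurf combination mentioned here, again assuming vegan's `dune`/`dune.env` data and `A1` as a stand-in environmental variable (illustrative choices, not from the thread):

```r
library(vegan)
data(dune, dune.env)

# Unconstrained PCoA of Bray-Curtis dissimilarities
pcoa_bray <- capscale(dune ~ 1, data = dune.env, distance = "bray")

# Fit a smooth (GAM) surface of A1 over the first two PCoA axes;
# plot = FALSE returns the fitted model without drawing it
surf <- ordisurf(pcoa_bray ~ A1, data = dune.env, plot = FALSE)
summary(surf)
```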

If you use Bray-Curtis, I would suggest computing distances among group
centroids instead of averaging cover over groups and then computing Bray-Curtis.
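One possible reading of this suggestion, sketched under assumptions: compute dissimilarities among all individual quadrats, embed them with PCoA, and measure distances between group centroids in that space. The grouping factor (`Management` in vegan's `dune.env`) is a hypothetical stand-in for the date × depth × area groups. Taking the square root of Bray-Curtis, which is generally reported to be Euclidean-embeddable, so that the PCoA has essentially no negative eigenvalues, is our own choice and not something stated in the thread.

```r
library(vegan)
data(dune, dune.env)
grp <- dune.env$Management  # hypothetical grouping of sampling units

# Square-rooted Bray-Curtis among individual sampling units
d <- sqrt(vegdist(dune, method = "bray"))

# PCoA keeping only axes with positive eigenvalues
eig <- cmdscale(d, k = 1, eig = TRUE)$eig
k <- sum(eig > 1e-8)
pco <- cmdscale(d, k = k)

# Group centroids on all retained axes (rows = groups)
cent <- apply(pco, 2, function(axis) tapply(axis, grp, mean))

# Euclidean distances among centroids in the PCoA space
cent_dist <- dist(cent)
round(cent_dist, 3)
```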

About hypothesis testing (in capscale or adonis, for instance), pay
attention to the longitudinal nature of your data. Some questions about
repeated measures and adonis have already been discussed in the R-SIG-ECO
archives; have a look.
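Restricted permutations are one common way to respect such a design in vegan (a suggestion on our part, not from the thread). A minimal sketch, assuming a hypothetical blocking factor (here `Management` from vegan's `dune.env`) stands in for, e.g., sampling occasion, so that samples are shuffled only within blocks:

```r
library(vegan)  # also attaches permute, which provides how()
data(dune, dune.env)

# Hypothetical block factor: samples are permuted only within each block
block <- dune.env$Management
ctrl <- how(nperm = 199, blocks = block)

db_rda <- capscale(dune ~ A1, data = dune.env, distance = "bray")
set.seed(1)
res <- anova(db_rda, permutations = ctrl)
res
```

Which restriction is appropriate (blocks, plots, time series within plots) depends on the actual sampling design.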

I guess you are interested in identifying the species that are most
responsible for community change along the drought disturbance gradient?
If so, I think an appropriate approach could be: a db-RDA (capscale) with
Bray-Curtis on square-root-transformed cover data (or untransformed,
depending on whether you have a few predominant species that might mask
the others), with the "drought disturbance gradient" as a continuous
constraint. Then you could overlay vectors of correlations between species
cover and the CAP1 axis (i.e., in vegan: scores(your.capscale,
display = "sp", scaling = -2, const = sqrt(nrow(your.cover.data.matrix) - 1),
choices = 1)).
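A hedged sketch of this workflow, using vegan's `dune`/`dune.env` as stand-ins (`A1` plays the drought gradient). Rather than relying on the `scaling`/`const` arguments quoted above, it computes species/axis correlations directly with base R's `cor()`; this is a swapped-in, transparent way to obtain that kind of quantity, not necessarily numerically identical to the `scores()` call.

```r
library(vegan)
data(dune, dune.env)

cover <- sqrt(dune)  # square-root transformed cover
cap <- capscale(cover ~ A1, data = dune.env, distance = "bray")

# Site scores on the first constrained axis (CAP1)
cap1 <- scores(cap, display = "sites", choices = 1)

# Pearson correlation between each species' cover and CAP1
sp_cor <- cor(cover, cap1)[, 1]

# Species most strongly associated with either end of the axis
head(sort(sp_cor), 5)
tail(sort(sp_cor), 5)
```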

I hope my English is at least understandable and that my answer helps you.

Cheers,
Pierre

On 18/04/2013 13:31, Aurélie Boissezon wrote:
Hi everybody,

I have some questions about ordination analysis and the interpretation of ordisurf() output, so huge thanks to anyone who can help me clear up my confused brain.
I am working on cover data (%) of aquatic plants. I carried out 7 quadrat surveys between 2009 and 2012 in a semi-permanent shallow pond (approximately n = 1200, excluding empty quadrats). Because of the fluctuating water regime and small topographic variations, my sampling units are distributed along a gradient of inundation conditions, from permanently wet to frequently dry. The vegetation clearly responded to the water-level conditions of the previous year: the community following several years of high water levels was very different from the one occurring the year after a severe drought of the waterbody (many charophytes, pioneer species). I quantified this "drought disturbance gradient" by calculating when (in which season) and for how many days each quadrat was dry before each field sampling.
My aim is to explore the relationship between community composition and these "drought indexes", and in particular to highlight the succession of species along the gradients.
My first reflex was to run a CCA, but someone told me to explore unconstrained approaches, in particular NMDS.
The CCA ordination shows a strong arch effect, but it is highly significant, ecologically interpretable and congruent with my field observations. To summarize, submerged species are separated from helophyte species by the duration of drought during the growing season (submerged species need water from winter to summer), and submerged species succeed each other along a gradient of drought duration at the end of the growing season, in autumn.
But to see whether I obtained similar results when looking at the whole variation in the community data set, and when using a more suitable distance measure, I ran an NMDS with Bray-Curtis distances on Hellinger-transformed data.
With NMDS I did not reach a "convergent solution", even after setting stricter criteria (maxit and sratmax). Nevertheless, the stress is acceptable (8 with k = 3) and the species are ordinated similarly to the CCA. I ran the same analysis on a simplified version of my data set, averaging species cover by date, by depth class (10 centiles) and by area of the lake, which gave 131 observations instead of the initial ~1200 quadrats (which is very large). Here the NMDS quickly reached a convergent solution (after 20 or 50 runs) and always gave a similar ordination of species.
So does it matter that I do not reach a convergent solution with NMDS in my case?

I tried to overlay environmental information on the NMDS ordination using the envfit function and then ordisurf, which allows the environmental variable to vary non-linearly across the ordination space (unlike CCA). I am really satisfied with the graphical outputs, which are ecologically meaningful, but I am afraid of misinterpreting them.
In ecological studies we usually explain the distribution of species with environmental/explanatory variables. Is it the same here? If I understand correctly, ordisurf fits a 2-D GAM surface of the explanatory/environmental variable on the site scores in the NMDS ordination space... which means that the explanatory variable becomes the response variable.
So can I interpret the position of species in the ordination space using the GAM surface from ordisurf? For example: species X occurs in sites that never dried out in spring but were dry for between 10 and 20 days in autumn, etc.
I think so, since the relevés were ordinated on the basis of the structure of the macrophyte community... but I am not so sure!
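To make the "explanatory variable becomes the response" point concrete: the surface is a GAM with the environmental variable on the left-hand side and the two ordination axes as smooth predictors. The sketch below fits such a model by hand with mgcv, using vegan's `varespec`/`varechem` data and soil aluminium (`Al`) as stand-ins. It only approximates what `ordisurf()` does, since ordisurf's own defaults (basis size, penalty selection, weights) may differ.

```r
library(vegan)
library(mgcv)
data(varespec, varechem)

set.seed(1)
nmds <- metaMDS(varespec, distance = "bray", k = 2, trace = FALSE)

# Site coordinates on the two NMDS axes
scrs <- scores(nmds, display = "sites")
df <- data.frame(Al = varechem$Al, NMDS1 = scrs[, 1], NMDS2 = scrs[, 2])

# Environmental variable as the RESPONSE, ordination axes as smooth predictors
fit <- gam(Al ~ s(NMDS1, NMDS2, k = 10), data = df, method = "REML")
summary(fit)
```

Read this way, the model answers "what value of the variable is expected at this point of the ordination?", not "where does a given species occur?".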

Thanks a lot for your help!
Best regards,

Aurélie

-----------------------------------------------------------------------
Aurélie Rey-Boissezon
PhD Student
University of Geneva
Section of Earth and Environmental Sciences - Institute F.-A. Forel
Aquatic Ecology Group
Uni Rondeau
Site de Battelle - Bâtiment D
7, route de Drize - 1227 Carouge
Geneva
Switzerland
Tel. 0041 (0) 22379 04 88

Aurelie.Boissezon at unige.ch
http://leba.unige.ch/team/aboissezon.html


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Hello folks,

Only one point here:

> Be careful when using NMDS. As you said, it is rank-based, which is why fitting environmental vectors to an NMDS biplot is not really appropriate, even though it is widely done. I don't see any problem with ordisurf and PCoA or CAP: ordisurf lets you fit environmental variables that have non-linear relationships with the principal coordinates of distance-based ordinations.
This is not true. I have sometimes seen this on the Internet, but it really is not true: the NMDS ordination space is strictly *metric*. In vegan it is even strictly *Euclidean*. So it is absolutely correct to fit vectors to an NMDS ordination. (In MASS::isoMDS you can also use any Minkowski metric, but only Euclidean, i.e. Minkowski with exponent = 2, is allowed in vegan, even with isoMDS.)

What is non-metric is the monotonic regression linking the *metric* ordination to any dissimilarity measure. So NMDS finds a metric solution from any dissimilarity measure.
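This point can be checked numerically: because the NMDS configuration is an ordinary Euclidean point cloud, the squared correlation that `envfit()` reports for a vector should equal the R² of a linear regression of the variable on the site coordinates. A minimal sketch with vegan's `varespec`/`varechem` data (our choice of example data):

```r
library(vegan)
data(varespec, varechem)

set.seed(1)
nmds <- metaMDS(varespec, distance = "bray", k = 2, trace = FALSE)

# Vector fitting of soil aluminium onto the NMDS configuration
ef <- envfit(nmds, varechem["Al"], permutations = 199)
ef$vectors$r  # squared correlation of the fitted vector

# The same quantity via an ordinary linear regression on the site scores
scrs <- scores(nmds, display = "sites")
lm_fit <- lm(varechem$Al ~ scrs[, 1] + scrs[, 2])
summary(lm_fit)$r.squared
```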

Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksanen at oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
Dear all,

Thanks for your help. It took me some time to put all the information together in my slightly less confused brain. Maybe I should give some explanation of the context and purpose of my study, to take this discussion further.
Theory:
The objective of my PhD thesis is to improve scientific knowledge about the ecology of a very particular family of aquatic plants: the charophytes. I chose to study closely the response of species (cover and life cycle) to fine-scale gradients. The study site is a hotspot for aquatic plants (Rey-Boissezon and Auderset Joye 2012, Arch. Sciences, in press), in particular for charophyte species --> that's why I carried out this longitudinal study on this waterbody.
The main purpose is to understand how the disturbance gradient affects the composition of the macrophyte community, in particular the distribution of charophytes ("V3" mission in Anderson et al. 2011).
Practical: 
I want to ignore double zeros, because there is no reason to consider that double zeros indicate similarity --> avoid Euclidean-distance-based methods such as PCA and RDA.
The succession of a large number of species generates numerous zeros in my species data set (long environmental gradient) --> one more argument against RDA.
Finally, the vegetation was well sampled, so the rarest species were truly rare in the waterbody. Nevertheless, I am not particularly interested in those rare species, so I deleted them before the multivariate analysis.
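The double-zero argument can be illustrated with a tiny hypothetical abundance matrix (made up for illustration, base R only): sites s1 and s3 share two species, while s1 and s2 share none. Plain Euclidean distance on raw abundances nevertheless puts s1 closer to s2, whereas Bray-Curtis, and also Euclidean distance after a Hellinger transformation, put s1 closer to s3.

```r
# Hypothetical abundances: rows = sites, columns = species
x <- rbind(s1 = c(0, 1, 1),
           s2 = c(1, 0, 0),
           s3 = c(0, 4, 8))

# Bray-Curtis: sum of absolute differences / sum of both sites' totals
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  }
  d
}

# Hellinger transformation: square root of relative abundances per site
hellinger <- function(m) sqrt(m / rowSums(m))

euc <- as.matrix(dist(x))             # Euclidean on raw abundances
bc  <- bray_curtis(x)                 # Bray-Curtis
hel <- as.matrix(dist(hellinger(x)))  # Euclidean on Hellinger-transformed data

# Distances from s1 to every site under the three options
round(rbind(euclidean = euc["s1", ], bray = bc["s1", ], hellinger = hel["s1", ]), 3)
```

Raw Euclidean gives d(s1, s2) ≈ 1.73 < d(s1, s3) ≈ 7.62, while Bray-Curtis gives 1 versus about 0.71.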

For all these reasons, I first tried CCA ordination, but I did not try db-RDA. Should I, given my practical constraints? Would it really be better than CCA? I guess I will have to try, following Pierre's method. The main advantage of db-RDA is that I can use any dissimilarity matrix (if I understand correctly), Hellinger or Bray-Curtis for example.

Why not also explore unconstrained ordination methods and go further with NMDS ("V2" mission in Anderson et al. 2011)?
I understand that I was wrong to use Bray-Curtis distance on Hellinger-transformed data before NMDS (I have to choose one or the other), but that I am right to superimpose vectors or GAM surfaces on NMDS ordinations.
But could someone briefly explain how to interpret the outputs, in particular the position of each species on the surface, and the "adjusted R-squared" and "deviance explained" of the GAM?
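A hedged sketch of where those numbers live, assuming vegan's `varespec`/`varechem` data and aluminium (`Al`) as a stand-in for a drought index. The object returned by `ordisurf()` is an mgcv GAM, so `summary()` reports the adjusted R-squared and the deviance explained of the smooth surface, i.e. how well the variable is predicted from the site positions in the ordination.

```r
library(vegan)
data(varespec, varechem)

set.seed(42)
nmds <- metaMDS(varespec, distance = "bray", k = 2, trace = FALSE)
surf <- ordisurf(nmds ~ Al, data = varechem, plot = FALSE)

s <- summary(surf)  # mgcv summary of the fitted GAM
s$r.sq      # adjusted R-squared
s$dev.expl  # proportion of deviance explained
s$s.table   # effective df, F and p-value of the smooth term

# Species scores drawn with the fitted contours added on top
plot(nmds, display = "species", type = "t")
ordisurf(nmds ~ Al, data = varechem, add = TRUE)
```

In metaMDS, species scores are weighted averages of site scores, so reading a species' position against the contours gives at best a rough indication of the variable's value where that species is most abundant; the GAM statistics themselves describe the variable, not individual species.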

Last but not least, I am not sure that the longitudinal nature of my data set is really a problem. Do you mean that autocorrelation problems might arise?

Cheers,

Aurélie

________________________________________
From: r-sig-ecology-bounces at r-project.org [r-sig-ecology-bounces at r-project.org] on behalf of Pierre THIRIET [pierre.d.thiriet at gmail.com]
Sent: Thursday 18 April 2013 14:52
To: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] CCA vs NMDS and ordisurf

A contrary view in-lined below:
> A lot of questions, some responses below...
<snip />
>> Why not explore unconstrained ordination methods and go further with
>> NMDS ("V2" mission in Anderson et al. 2011)?

> Precisely because your purpose is to explain community structure by
> environmental variables (a regression-oriented question). In this case,
> direct gradient analysis (especially RDA with adjusted R-square) is more
> powerful than indirect gradient analysis (NMDS or any other
> unconstrained ordination).
I think you need to justify the "more powerful" there! :-) I see uses
for both the constrained and unconstrained methods here. A comparison,
especially if you do PCA vs RDA (with a Hellinger or similar
transformation) or PCoA vs capscale (with any distance measure), allows
you to investigate the degree to which your constraints relate to the
major patterns in the species responses.

These are complementary approaches and one would do well to use them
both.
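One way to run such a comparison, sketched with vegan's `dune`/`dune.env` data and `A1` as a stand-in constraint (assumptions for illustration): compare the share of variation captured by the constraint with the shares of the leading unconstrained axes, and check whether the constrained axis lines up with them.

```r
library(vegan)
data(dune, dune.env)
hel <- decostand(dune, method = "hellinger")

pca <- rda(hel)                           # unconstrained
rda_a1 <- rda(hel ~ A1, data = dune.env)  # constrained by A1

# Variation explained by the constraint (adjusted R-squared)
RsquareAdj(rda_a1)$adj.r.squared

# Proportion of total variance on the first three PCA axes
ev <- eigenvals(pca)
round(ev[1:3] / sum(ev), 3)

# Correlation of the RDA axis (linear-combination scores) with PC1 and PC2
r <- cor(scores(rda_a1, display = "lc", choices = 1),
         scores(pca, display = "sites", choices = 1:2))
r
```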
>> I understand that I was wrong to use Bray-Curtis distance on
>> Hellinger-transformed data before NMDS; I have to choose. But I am
>> right to superimpose vectors or GAM surfaces on NMDS ordinations.

> That's right, but you can fit a GAM model to RDA results as well!
You can, but the axes are still formed through linear functions of the
constraints. The constrained methods don't fit non-linear functions of
the constraints (well, you can introduce quadratic terms...).

I really don't see why this has to be an either/or situation.
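To illustrate both positions in this exchange, a sketch (again with vegan's `dune`/`dune.env` and `A1` as stand-ins) that adds a squared term to the constraints and tests it by comparing the nested models, alongside a GAM surface of the same variable fitted over the unconstrained PCA for contrast:

```r
library(vegan)
data(dune, dune.env)
hel <- decostand(dune, method = "hellinger")

# Linear vs. quadratic constraint on the same Hellinger-transformed data
rda_lin  <- rda(hel ~ A1, data = dune.env)
rda_quad <- rda(hel ~ A1 + I(A1^2), data = dune.env)

# Permutation test of the nested models: does the squared term add anything?
set.seed(1)
cmp <- anova(rda_lin, rda_quad, permutations = 199)
cmp

# Non-linear alternative: a GAM surface of A1 over the first two PCA axes
pca <- rda(hel)
surf <- ordisurf(pca ~ A1, data = dune.env, plot = FALSE)
summary(surf)$dev.expl
```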

G
Cheers,

François

-------------------------------------------------------------------------------
Prof. *François Gillet*
Université de Franche-Comté - CNRS
UMR 6249 Chrono-environnement
UFR Sciences et Techniques
16, Route de Gray
F-25030 Besançon cedex
France
http://chrono-environnement.univ-fcomte.fr/
http://chrono-environnement.univ-fcomte.fr/spip.php?article530
Phone: +33 (0)3 81 66 62 81
iPhone: +33 (0)7 88 37 07 76
Location: La Bouloie, Bât. Propédeutique, *-114L*
-------------------------------------------------------------------------------
Editor of *Plant Ecology and Evolution*
http://www.plecevo.eu
-------------------------------------------------------------------------------

I would say that it *is* important, in general. However, you don't say
whether you retried running `monoMDS` on the Hellinger-transformed data
without the Bray-Curtis metric (you should use Euclidean distance with the
Hellinger transformation). If you didn't, try rerunning without
Bray-Curtis and see if it converges. Otherwise, try many more iterations
and get vegan to start monoMDS from the best solution of the first set
of runs.

See `?metaMDS` for details.
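Both suggestions can be sketched with `metaMDS()`, assuming vegan's `varespec` data as a stand-in: NMDS on Hellinger-transformed data with Euclidean distance (with `autotransform = FALSE` so that metaMDS does not apply its own transformation), followed by a second round that restarts from the best configuration found so far via `previous.best`. The `maxit` and `sratmax` values are illustrative only; they are passed on to `monoMDS()`.

```r
library(vegan)
data(varespec)

hel <- decostand(varespec, method = "hellinger")

set.seed(1)
# First round: Euclidean distance on Hellinger data, no automatic transformation
sol1 <- metaMDS(hel, distance = "euclidean", k = 3, trymax = 20,
                autotransform = FALSE, trace = FALSE)

# Second round: more random starts, stricter monoMDS settings,
# starting from the best solution of the first round
sol2 <- metaMDS(hel, distance = "euclidean", k = 3, trymax = 100,
                autotransform = FALSE, previous.best = sol1,
                maxit = 500, sratmax = 0.999999, trace = FALSE)

c(first = sol1$stress, second = sol2$stress)
sol2$converged
```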

G
Hello everybody!
I didn't imagine that my questions would lead to such a debate among researchers :). It helps me get ready for future reviewers' comments ;)
Just one question still open about NMDS (Gavin?):
Is it important to reach a convergent solution, given that the "best" solution always ordinates species in a similar way? As I said, even with stricter criteria the analysis doesn't reach a convergent solution.

Best regards,

Aurélie

________________________________
From: fgillet3 at gmail.com [fgillet3 at gmail.com] on behalf of François Gillet [francois.gillet at univ-fcomte.fr]
Sent: Saturday 20 April 2013 10:59
To: Gavin Simpson
Cc: Aurélie Boissezon; r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] RE : CCA vs NMDS and ordisurf

2013/4/19 Gavin Simpson <gavin.simpson at ucl.ac.uk>
I really don't see why this has to be an either/or situation.

I fully agree: direct and indirect gradient analyses are complementary! Sorry for not having stressed that in my short answers...

François

%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
I also suggest (as I have suggested before) that you run metaMDS with the argument plot = TRUE. The convergence criteria in metaMDS are pretty stringent, but with the plot argument you can see how different the solutions are. The two most typical non-convergence cases are that

(1) most points are stable, but there are some outliers that don't find their place in this universe, and

(2) your data need more dimensions and you should increase 'k'.

Then you should also check stressplot(). If the fitted line shoots straight up at the maximum observed dissimilarity, you may need to turn on the 'noshare' argument in metaMDS to trigger step-across dissimilarities. We claim that this is rarely necessary with the monoMDS engine we currently use, but sometimes it is needed.
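A sketch of those diagnostics, assuming vegan's `varespec` data as a stand-in: trace the random starts graphically with `plot = TRUE`, inspect the Shepard diagram with `stressplot()`, and compare the stress when adding a dimension or allowing step-across dissimilarities through `noshare`. Here `noshare = 0.1` triggers step-across only if more than 10% of site pairs share no species, a threshold chosen arbitrarily for illustration.

```r
library(vegan)
data(varespec)

set.seed(7)
# plot = TRUE graphically traces the interim solutions of the random starts
sol_k2 <- metaMDS(varespec, distance = "bray", k = 2, trymax = 50, plot = TRUE)

# Shepard diagram: ordination distances vs. observed dissimilarities
stressplot(sol_k2)

# Case (2): does an extra dimension reduce stress substantially?
sol_k3 <- metaMDS(varespec, distance = "bray", k = 3, trymax = 50, trace = FALSE)

# Step-across dissimilarities when many site pairs share no species
sol_ns <- metaMDS(varespec, distance = "bray", k = 2, trymax = 50,
                  noshare = 0.1, trace = FALSE)

c(k2 = sol_k2$stress, k3 = sol_k3$stress, noshare = sol_ns$stress)
```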

Without hands on your data it is difficult to guess more.

Cheers, Jari Oksanen


On Mon, 2013-04-22 at 08:26 +0000, Aurélie Boissezon wrote: