
reception of (Vegan) envfit analysis by manuscript reviewers

8 messages · Gavin Simpson, Jari Oksanen, Alan Haynes +1 more

#
On Wed, 2012-05-09 at 15:51 -0600, Matt Bakker wrote:
Without further context for that quote and your manuscript to see how
you are using the method it is difficult to say whether you are doing
something silly or the reviewer is bone-headed.

I've had similar comments from reviewers about my use of the ordisurf()
function. In each case it was the reviewers' failure to understand the
methods applied that was the cause of the confusion.

As you provide little or no context, I'll explain what envfit() does.

The idea goes back a long way (!) and is in my 1995 edition of Jongman
et al., Data Analysis in Community and Landscape Ecology (Cambridge
University Press), though it was most likely in the 1987 edition too. See
Section 5.4 of the Ordination chapter by ter Braak in that book.

The idea is to find the direction (in the k-dimensional ordination
space) that has maximal correlation with an external variable.
Essentially, we have:

E(z_j) = b_0 + b_1x_1 + b_2x_2

where E(z_j) is the expectation (or mean, or fitted values) of the jth
external (environmental) variable, x_1 and x_2 are the "axis" scores in
ordination dimensions 1 and 2, and b_0, b_1 and b_2 are unknown regression
coefficients. This generalises to more than 2 dimensions or axes.

The biplot arrow drawn goes from (0,0) to (b_1, b_2).
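A minimal Python/NumPy sketch of the regression above (my illustration, not vegan's R code): centre the site scores and the variable, solve for b_1 and b_2 by least squares, and report r² plus the unit-length direction. The helper name fit_vector and the toy data are hypothetical. As far as I know, vegan's plot method draws the unit direction scaled by sqrt(r²) rather than the raw coefficients, so the sketch returns both.

```python
import numpy as np

def fit_vector(scores, z):
    """Least-squares fit of z on ordination scores (n sites x k axes).

    Hypothetical helper: returns the raw coefficients (b_1..b_k),
    their unit-length direction, and r^2 of the fitted plane.
    """
    scores = np.asarray(scores, dtype=float)
    z = np.asarray(z, dtype=float)
    # Centring both sides absorbs b_0, so the arrow starts at the origin
    X = scores - scores.mean(axis=0)
    y = z - z.mean()
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1.0 - (resid @ resid) / (y @ y)   # squared correlation with the fit
    direction = b / np.linalg.norm(b)        # direction of maximal correlation
    return b, direction, r2

# Toy configuration: z increases along axis 1 and decreases along axis 2
scores = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]])
z = 5 + 2 * scores[:, 0] - scores[:, 1]
b, direction, r2 = fit_vector(scores, z)
print(b, direction, r2)
```

Because z is an exact plane here, the fit recovers b = (2, -1) with r² = 1; real environmental data will give r² below 1.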

You can see that the aim is to model or predict the values of the jth
environmental variable (z_j) as a linear combination of the "axis" or
site scores of the samples in the ordination space. Exactly the same
idea underlies the ordisurf() function, except that we fit a GAM in which
the right-hand side of the equation is a multivariate spline, allowing
a non-linear surface instead of a plane.

When applied to nMDS, if the nMDS provides a reasonable approximation to
the original dissimilarities, then envfit() will estimate and show the
strength of the correlation and the direction of maximal correlation
between the nMDS configuration and the jth environmental variable. This
technique can be used to indicate whether one or more environmental variables
are associated with differences between sites/samples as represented in
the nMDS ordination.

The big caveat is the implied assumption that the relationship
between z_j and the ordination space is linear. ordisurf() allows you to
relax this assumption because it fits a potentially non-linear surface to the
ordination space instead of the plane that envfit() effectively produces
(though envfit() shows only the direction of change with the arrow).
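To see why the linearity caveat matters, here is a small Python/NumPy illustration. ordisurf() fits a GAM with splines; as a simpler stand-in (explicitly not what ordisurf() does) this uses a quadratic polynomial surface, with a made-up unimodal variable that peaks in the middle of the ordination. The plane that envfit() implies explains almost none of the variation, while the curved surface captures it.

```python
import numpy as np

def r2_of_fit(design, z):
    """R^2 of an ordinary least-squares fit of z on the design matrix."""
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ beta
    return 1.0 - (resid @ resid) / np.sum((z - z.mean()) ** 2)

rng = np.random.default_rng(1)
x1, x2 = rng.uniform(-1, 1, size=(2, 200))   # hypothetical site scores
# Hypothetical unimodal response: highest in the centre of the ordination
z = 1.0 - x1**2 - x2**2

ones = np.ones_like(x1)
plane = np.column_stack([ones, x1, x2])                        # envfit-style plane
quad = np.column_stack([ones, x1, x2, x1**2, x2**2, x1 * x2])  # curved surface

r2_plane = r2_of_fit(plane, z)
r2_quad = r2_of_fit(quad, z)
print(f"plane R^2 = {r2_plane:.3f}, quadratic R^2 = {r2_quad:.3f}")
```

A weak envfit() arrow therefore does not by itself mean the variable is unrelated to the configuration; it may only mean the relationship is not a plane.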

So without seeing your manuscript or more context (and I'm not promising
to read it or comment more if you provide it), I would suggest that, *if*
you have applied nMDS and used envfit() correctly, the combined analysis
*does* reflect the *linear* "relationship between the edaphic factor and
the Bray-Curtis distance", assuming of course that the nMDS has low
stress (i.e. fits the original dissimilarities well).
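Stress is what tells you whether the configuration is a faithful stand-in for the dissimilarities. A minimal Python sketch of Kruskal's stress-1, assuming a square dissimilarity matrix and a candidate configuration: the disparities come from an isotonic (pool-adjacent-violators) regression of configuration distances on the rank order of the dissimilarities. The helper names are hypothetical, and tie handling here is simplified; vegan's metaMDS()/monoMDS() may define and normalise stress slightly differently.

```python
import numpy as np
from itertools import combinations

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block holds [mean, weight]
    for v in y:
        blocks.append([float(v), 1.0])
        # Merge backwards while the sequence of block means decreases
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for m, w in blocks:
        out.extend([m] * int(w))
    return np.array(out)

def kruskal_stress1(dissim, config):
    """Stress-1 of a configuration (n x k) against an n x n dissimilarity matrix."""
    n = dissim.shape[0]
    pairs = list(combinations(range(n), 2))
    delta = np.array([dissim[i, j] for i, j in pairs])
    d = np.array([np.linalg.norm(config[i] - config[j]) for i, j in pairs])
    order = np.argsort(delta, kind="stable")  # rank order of the dissimilarities
    dhat = np.empty_like(d)
    dhat[order] = pava(d[order])              # monotone disparities
    return np.sqrt(np.sum((d - dhat) ** 2) / np.sum(d ** 2))

# Made-up example: D is a monotone transform of distances along a line
D = np.array([[0, 1, 9], [1, 0, 4], [9, 4, 0]], dtype=float)
good = np.array([[0.0], [1.0], [3.0]])  # preserves the rank order of D
bad = np.array([[0.0], [3.0], [1.0]])   # reverses it
print(kruskal_stress1(D, good), kruskal_stress1(D, bad))
```

Only rank order matters to the disparities, which is why a configuration can have zero stress even when D is a non-linear transform of its distances.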

In future, you should consider posting similar (ecological/environmental)
questions to the R-SIG-Ecology list instead of the main
R-Help list. I know Jari (lead developer of vegan and author of
envfit()) has stopped regularly reading the main R-Help list, and you
will get far more eyes familiar with these techniques on the
R-SIG-Ecology list.

I have taken the liberty of cc'ing this to the R-SIG-Ecology list so
others can comment.

HTH

G

#
On 10/05/2012, at 11:45 AM, Gavin Simpson wrote:

Hello,

The method was indeed in the first edition of the book (in ter Braak's chapter). However, the idea is much older. The vegan implementation was based on an unpublished Bell Labs report from the 1970s (or earlier). In this Bell Labs memorandum the method was specifically suggested for NMDS. Vegan uses a different algorithm, but the method is the same. The early history in vegan can be traced in the ORDNEWS correspondence from around 2001, but it is so old that I can no longer find that message from this computer.

Then about Bray-Curtis. The referee may be correct in writing that the fitted vectors are not directly related to Bray-Curtis. You fit the vectors to the NMDS ordination, and that is a non-linear mapping from the Bray-Curtis dissimilarities to the metric ordination space. There are two points here: non-linearity and stress. Because of these, the fit is not strictly about B-C. Of course, the referee is wrong in writing about NMDS axes: the fitted vector has nothing to do with axes (unless you rotate your axes parallel to the fitted vector, which you can do). The NMDS is based on Bray-Curtis, but it is not the same thing, and the vector fitting is based on the NMDS. So why not write that it is about NMDS? Why insist on Bray-Curtis, which is only in the background?
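The point that the fitted vector has nothing to do with the axes can be checked numerically. In this Python/NumPy sketch, random points stand in for an NMDS configuration and the environmental variable is simulated; rotating the configuration leaves r² unchanged and simply rotates the fitted direction along with the points. The helper fit_vector is hypothetical.

```python
import numpy as np

def fit_vector(scores, z):
    """Unit direction and r^2 of a centred least-squares fit of z on scores."""
    X = scores - scores.mean(axis=0)
    y = z - z.mean()
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1.0 - (resid @ resid) / (y @ y)
    return b / np.linalg.norm(b), r2

rng = np.random.default_rng(42)
scores = rng.normal(size=(30, 2))  # stand-in for an NMDS configuration
z = scores @ np.array([1.0, 0.5]) + rng.normal(scale=0.3, size=30)

theta = np.deg2rad(40)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = scores @ R               # same configuration, axes rotated

dir0, r2_0 = fit_vector(scores, z)
dir1, r2_1 = fit_vector(rotated, z)
print(np.isclose(r2_0, r2_1))      # fit quality does not depend on the axes
print(np.allclose(dir0 @ R, dir1)) # the arrow rotates with the points
```

Choosing the rotation so that the fitted direction lands on axis 1 is the "rotate your axes parallel to the fitted vector" option; I believe vegan's MDSrotate() does this for NMDS results.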

Cheers, Jari Oksanen
#
I've removed R-Help from this now...
On Thu, 2012-05-10 at 10:13 +0000, Jari Oksanen wrote:
<snip />
<snip />
Right, agreed. The analysis is one step removed from the B-C, but the
point of doing the nMDS was to find a low-dimensional mapping of these
B-C distances, so *if* the mapping is a good one, we
can talk about correlations between "distances" between sites and the
environmental variables. Whilst it might be strictly more correct to
talk about this from the point of view of the nMDS, the implication is
that for significant envfit()s there is a significant linear correlation
between the environmental variable(s) and the approximate ranked
distances between samples.
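"Significant" for envfit() means a permutation test. A Python sketch, assuming the scheme I understand vegan to use: shuffle the environmental variable across sites, refit, and count how often the permuted r² reaches the observed one. The helpers r2_vector and permutation_p, and all the data, are hypothetical.

```python
import numpy as np

def r2_vector(scores, z):
    """r^2 of a centred least-squares fit of z on ordination scores."""
    X = scores - scores.mean(axis=0)
    y = z - z.mean()
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1.0 - (resid @ resid) / (y @ y)

def permutation_p(scores, z, n_perm=999, seed=0):
    """Observed r^2 and its permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = r2_vector(scores, z)
    # Shuffling z across sites breaks any link with the configuration
    null = np.array([r2_vector(scores, rng.permutation(z)) for _ in range(n_perm)])
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)  # count the observed too
    return observed, p

rng = np.random.default_rng(7)
scores = rng.normal(size=(40, 2))                   # stand-in configuration
related = scores[:, 0] + rng.normal(scale=0.5, size=40)
unrelated = rng.normal(size=40)

obs_rel, p_rel = permutation_p(scores, related)
obs_un, p_un = permutation_p(scores, unrelated)
print(f"related:   r^2 = {obs_rel:.2f}, p = {p_rel:.3f}")
print(f"unrelated: r^2 = {obs_un:.2f}, p = {p_un:.3f}")
```

With 999 permutations the smallest attainable p-value is 0.001, which is what a strongly related variable returns.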

I mean, if all we talk about is the nMDS, who cares? It is the
implications of this for the system under study that are of interest.

That said, B-C is just one of many ways to think of distance, so
I wouldn't even talk about the B-C distance either; the interest is
in differences between sites/samples. The relevance of B-C or some other
coefficient only comes in when considering whether it is a good descriptor
of the "distance" between samples for the variables you are considering.
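To make the "one of many ways to think of distance" point concrete, here is a small Python sketch comparing Bray-Curtis with Euclidean distance on made-up abundance vectors. Both pairs are the same Euclidean distance apart, yet Bray-Curtis rates one pair as completely different and the other as nearly identical, because it works on differences relative to total abundance.

```python
import numpy as np

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum |x - y| / sum (x + y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.abs(x - y).sum() / (x + y).sum()

def euclidean(x, y):
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))

# Made-up abundances: a pair of sparse sites and a pair of abundant sites
sparse_a, sparse_b = [1, 0, 0], [0, 1, 0]
rich_a, rich_b = [101, 100, 0], [100, 101, 0]

for a, b in [(sparse_a, sparse_b), (rich_a, rich_b)]:
    print(f"Euclidean = {euclidean(a, b):.3f}, Bray-Curtis = {bray_curtis(a, b):.3f}")
# Euclidean = 1.414 for both pairs; Bray-Curtis = 1.000 vs 0.005
```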

Cheers,

G

#
On Thu, 2012-05-10 at 13:17 +0200, Alan Haynes wrote:
It is perfectly valid and is introduced in Jongman et al. alongside PCA
and CA. We (well, Jari) wouldn't have written a method for objects of
class "cca" if it weren't appropriate.

I suggest you look at ordisurf() though; in most of the projects I have
been involved in, the linearity assumption of envfit() is questionable.

If you want a bit more info on what ordisurf() is doing see my blog post
on the function: http://wp.me/pZRQ9-1x

HTH

G

#
Hi Alan,
I think envfit works even better with PCA than with NMDS. This
is because PCA works in a linear Euclidean world, so correlation makes
more sense in this case. You are correlating points on lines (envfit)
with points on lines (PCA), rather than points on lines (envfit) with
some indeterminate, non-uniformly stressed configuration (NMDS).
But this is just my feeling; I may easily be wrong, and in that case I
hope someone will correct me.
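A Python/NumPy sketch of the PCA case, with simulated data that has most of its variance in two of three variables: PCA site scores are a linear projection of the centred data, so a variable that is itself a linear combination of the original variables is fitted almost perfectly by the plane envfit() uses. An NMDS configuration offers no such guarantee. All data here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data: 50 sites, 3 variables with very unequal spread
data = rng.normal(size=(50, 3)) * np.array([3.0, 2.0, 0.1])
centred = data - data.mean(axis=0)

# PCA via SVD; keep the first two axes as site scores (U * s)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
pca_scores = U[:, :2] * s[:2]

# An "environmental" variable that is linear in the original variables
z = data[:, 0] + data[:, 1]

# envfit-style plane: least squares of centred z on the two PCA axes
# (the PCA scores are already centred because the data were centred)
y = z - z.mean()
b, *_ = np.linalg.lstsq(pca_scores, y, rcond=None)
resid = y - pca_scores @ b
r2 = 1.0 - (resid @ resid) / (y @ y)
print(f"r^2 on two PCA axes = {r2:.4f}")
```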
Best,
Martin Weiser

On Thu, 10. 05. 2012 at 13:17 +0200, Alan Haynes wrote:
#
Alan,

Use the vegan command

vegandocs("decision")  

and look at the chapter on RDA scaling. It explains both the scaling and how to change the scaling.
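For readers without vegan at hand, the core of the scaling question can be sketched in Python/NumPy with a plain SVD of the centred data. This is the generic textbook form only: as I understand it, vegan also applies a general scaling constant (described in the vignette), so these numbers will not match vegan's output. Scaling 1 puts the singular values on the sites (distance biplot) and scaling 2 puts them on the species (correlation biplot); both reproduce the data. The data are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
Y = rng.normal(size=(10, 4))     # made-up data: 10 sites x 4 species
Yc = Y - Y.mean(axis=0)          # column-centred, as in PCA/RDA
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)

# Scaling 1 (distance biplot): singular values go with the sites
sites1, species1 = U * s, Vt.T
# Scaling 2 (correlation biplot): singular values go with the species
sites2, species2 = U, Vt.T * s

# Both are factorisations of the same centred data ...
print(np.allclose(sites1 @ species1.T, Yc), np.allclose(sites2 @ species2.T, Yc))
# ... and scaling 1 preserves Euclidean distances between sites
d_data = np.linalg.norm(Yc[0] - Yc[1])
print(np.isclose(np.linalg.norm(sites1[0] - sites1[1]), d_data))
```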

Cheers, Jari Oksanen