
how to calculate "axis variance" in metaMDS, package vegan?

13 messages · Gian Maria Niccolò Benucci, gabriel singer, Maria Dulce Subida +3 more

#
hi gian,

no, there is no such way. An MDS can't express "explained variance". 
However, the stress value is the overall measure of quality of fit of 
your MDS to the data. There are various measures of stress, but loosely 
speaking you can regard the stress as the percentage of variation NOT 
explained by ALL dimensions in your MDS.
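The stress value can be read directly off a fitted object; a minimal sketch, assuming the vegan package is installed (the dune example data set ships with vegan and is used here only for illustration):

```r
# Sketch: stress is a single overall badness-of-fit value,
# not a per-axis "explained variance" (requires the vegan package).
library(vegan)
data(dune)                          # example community data from vegan
ord <- metaMDS(dune, k = 2, trace = FALSE)
ord$stress                          # one number for the whole configuration
```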

cheers, g
Gian Maria Niccolò Benucci wrote:
#
On 1/12/09 11:12 AM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com>
wrote:
Gian,

Why nobody asks how to calculate the non-linear stress of PCA or RDA?
Non-linear stress would make just as little sense for PCA or RDA as axis
variance makes for NMDS. Axis variance is purely a measure of the PCA family
of methods (even the CA family has a slightly different measure), and there is no
meaningful way of calculating "explained variance" for NMDS axes (and
indeed, a separate axis is not meaningful for NMDS: it is the configuration
spanned by all axes together that makes sense).

Cheers, Jari Oksanen
#
gian,

you may try consecutive MDS analyses with an increasing number of 
dimensions (the parameter k in the isoMDS() or metaMDS() functions). Then 
plot stress against the number of dimensions and judge as you would a 
scree plot in PCA. This should tell you how many dimensions to use for 
the MDS, and hence also an appropriate associated stress value.
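The scree-plot idea can be sketched as follows (assuming vegan is installed; dune is an example data set from vegan, standing in for your own community matrix):

```r
# Sketch: refit the NMDS for k = 1..kmax and plot stress against k,
# reading the result like a PCA scree plot (requires the vegan package).
library(vegan)
data(dune)
kmax <- 5
stress <- sapply(1:kmax, function(k)
  metaMDS(dune, k = k, trace = FALSE)$stress)
plot(1:kmax, stress, type = "b",
     xlab = "Number of dimensions (k)", ylab = "Stress")
```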

cheers, gabriel
Gian Maria Niccolò Benucci wrote:

#
Maria Dulce Subida <mdsubida at icman.csic.es>
writes:
I think this is the key point. The stress will always increase with
increasing data, as it is harder to capture the information content of 10
variables on 2 axes than it is to capture the information content of 3
variables on 2 axes. Similarly, stress will always go down as you
increase the number of axes you use, for the same reason. So I'm not
convinced that any of these 'rules of thumb' (i.e., > 0.3 is a 'bad'
ordination) really make much sense. Please correct me if I'm wrong!

Cheers,

Tyler
1 day later
#
On 2/12/09 19:55 PM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com>
wrote:
Gian,

This looks very much like a badly degenerate solution. You shouldn't use 23
axes in NMDS, in particular with 40 x 20 source data. In Euclidean space
that data would give you a rank of 20, or you could find at most 20 axes in
metric scaling. In the Bray-Curtis space the situation is more complicated,
but one random data set (Poisson random variates with lambda = 3.14) gave 25
positive and 14 negative eigenvalues. Probably the 23 dimensions you specify
exhaust the real part of your space even in metric scaling, and probably
(and obviously) fail miserably in nonmetric scaling. You shouldn't get
stress of that magnitude with a decent model with data like that.
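The eigenvalue count described above can be sketched like this (assuming vegan is installed; the seed is arbitrary, so the exact counts may differ from the 25/14 quoted):

```r
# Sketch: Bray-Curtis distances on Poisson random data are non-Euclidean,
# so metric scaling gives both positive and negative eigenvalues
# (requires the vegan package; seed chosen arbitrarily for illustration).
library(vegan)
set.seed(1)
x <- matrix(rpois(40 * 20, lambda = 3.14), nrow = 40, ncol = 20)
d <- vegdist(x, method = "bray")
eig <- wcmdscale(d, eig = TRUE)$eig
sum(eig > sqrt(.Machine$double.eps))    # clearly positive eigenvalues
sum(eig < -sqrt(.Machine$double.eps))   # clearly negative eigenvalues
```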

It has never occurred to me that anybody would want an NMDS with such a
high number of dimensions. Usually we want to use two, sometimes one or two
more, but that's about the limit. Do the same and set k=2 to k=4 at maximum.
If you want to have mapping of all of your real space (i.e., ignore the
complex space), you can use metric scaling. The standard R choice is
cmdscale(). The vegan alternative is capscale(), which can also do
unconstrained metric scaling, returns information on both the real and
imaginary components of your space, and has plot and other support
functions. The low-level alternative in vegan is wcmdscale(), which is also
used by capscale(), but does not have any support functions (it lacks even
print.wcmdscale!)

NMDS is really intended for nonlinear mapping onto *low* number of
dimensions.
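The three metric-scaling routes mentioned above can be sketched briefly (assuming vegan is installed; dune is an example data set from vegan):

```r
# Sketch: standard and vegan alternatives for metric scaling
# of a Bray-Curtis distance matrix (requires the vegan package).
library(vegan)
data(dune)
d  <- vegdist(dune, method = "bray")
m1 <- cmdscale(d, k = 2)        # standard R metric scaling
m2 <- capscale(d ~ 1)           # unconstrained metric scaling, with plot support
m3 <- wcmdscale(d, eig = TRUE)  # low-level alternative, keeps all eigenvalues
```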

Cheers, Jari Oksanen
#
On 2/12/09 19:55 PM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com>
wrote:
Gian,

Not quite so. I think it would be useful to consult a good book, but
here is some explanation.

The NMDS is not a simple "reproduction" method, but it is a non-linear
regression problem. For n points and k dimensions we fit a nonlinear
regression with n*k parameters fitted to n*(n-1)/2 observations. It doesn't
require much intuition to see that this is not well defined for k
approaching n, and then the non-linear regression fails. For details, the
non-linear regression function is isoreg() in R, and the model fitting
happens with optim() using method = "BFGS" (Broyden, Fletcher, Goldfarb &
Shanno). All this is not very obvious because it is done within a C function
in the MASS package. The NMDS is nonlinear just in order to be able to
produce a good mapping with low values of k: so stick with low values of k.
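The parameter and observation counts above can be checked with quick arithmetic for the 40-point, 23-dimension case in question:

```r
# Sketch: with n points and k dimensions, NMDS fits n*k coordinates
# against n*(n-1)/2 observed dissimilarities; at n = 40, k = 23 the
# parameters outnumber the observations, so the fit degenerates.
n <- 40; k <- 23
n * k            # 920 fitted coordinates (parameters)
n * (n - 1) / 2  # 780 observed dissimilarities
```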

If you want to have complete mapping of dissimilarities, you should use
metric scaling. Then you typically ignore the latter axes. However, even
here the situation is not as clear as you write. If you use Euclidean
distances, then the number of variables gives the number of dimensions of
metric scaling. With Euclidean distances, the complete solution also exactly
reproduces the observed distances. However, with non-Euclidean
dissimilarities (like Bray-Curtis in your case) the situation is more
complicated. Metric scaling and complete mapping are Euclidean, and if your
dissimilarities are non-Euclidean, you have a problem (that you usually
ignore). Firstly, the number of above-zero eigenvalues and corresponding
real eigenvectors is not directly defined by the number of variables.
Secondly, you cannot reproduce the observed dissimilarities from real
eigenvectors because that reproduction is Euclidean and your measure was
non-Euclidean. For exact reproduction, you should subtract the distances in
imaginary space (negative eigenvalues) from distances in the real space
(positive eigenvalues). We actually do it exactly like this in the
betadisper() function in vegan, and for this reason the wcmdscale() function
of vegan also returns information on complex eigenvectors and negative
eigenvalues.
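The exact-reproduction claim for Euclidean distances can be sketched in base R (random data with an arbitrary seed, purely for illustration):

```r
# Sketch: with Euclidean distances, the full-rank metric-scaling
# solution reproduces the observed distances exactly (base R only).
set.seed(2)
x   <- matrix(rnorm(10 * 4), nrow = 10)  # 10 points, 4 variables
d   <- dist(x)                           # Euclidean distances
pts <- cmdscale(d, k = 4)                # full solution: 4 variables -> 4 axes
max(abs(dist(pts) - d))                  # essentially zero
```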

For your other post that came when I wrote this: stress 11.6 is really fine.
I think that if you get stress down to 5% (0.05) or less, then there is
something fishy in your data or in your model specification, like
overfitting. 

Cheers, Jari Oksanen