How to calculate "axis variance" in metaMDS (package vegan)?

Hi Gian,

No, there is no such way. An MDS can't express "explained variance".
However, the stress value is the overall measure of how well your MDS
fits the data. There are various measures of stress, but loosely
speaking you can regard the stress as the percentage of variation NOT
explained by ALL dimensions of your MDS.
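A minimal sketch of what "stress" measures, assuming Kruskal's stress formula 1 (the quantity MASS::isoMDS() minimizes and reports, there multiplied by 100 as a percentage). The helper `kruskal_stress1()` and the toy data are hypothetical illustrations, not vegan code: the fitted "disparities" come from a monotone (isotonic) regression of the ordination distances on the rank order of the observed dissimilarities, done here with base R's isoreg().

```r
# Kruskal's stress-1: sqrt(sum((d - dhat)^2) / sum(d^2)), where d are the
# distances in the ordination and dhat the monotone-regression fitted values.
# kruskal_stress1() is a hypothetical helper written for illustration.
kruskal_stress1 <- function(dis, conf) {
  d_obs  <- as.vector(dis)              # observed dissimilarities
  d_conf <- as.vector(dist(conf))       # Euclidean distances in the ordination
  ord  <- order(d_obs)                  # rank order of observed dissimilarities
  dhat <- numeric(length(d_conf))
  # isotonic regression: best non-decreasing fit to d_conf along that order
  dhat[ord] <- isoreg(d_conf[ord])$yf
  sqrt(sum((d_conf - dhat)^2) / sum(d_conf^2))
}

# Toy example (hypothetical data): five points on a line
x   <- cbind(c(0, 1, 3, 6, 10))
dis <- dist(x)^2                        # a monotone transform of the true distances
kruskal_stress1(dis, x)                 # ~0: the rank order is fully preserved

set.seed(1)
kruskal_stress1(dis, matrix(rnorm(10), ncol = 2))  # a random 2-D layout fits worse
```

Because only the rank order of the dissimilarities enters, squaring them (or any other monotone transformation) leaves the stress unchanged.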

cheers, g
Hi there,

I am trying to use the function metaMDS (vegan package) on community ecology
data, but I find no way to calculate the "expressed variance" of the first two
axes. Is there a way to do that?
Thanks a lot in advance,

Gian


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

On 1/12/09 11:12 AM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com>
wrote:

Gian,

Why does nobody ask how to calculate the non-linear stress of PCA or RDA?
Non-linear stress would make just as little sense for PCA or RDA as the axis
variance makes for NMDS. The axis variance is purely a measure of the PCA family
of methods (even the CA family has a somewhat different measure), and there is no
meaningful way of calculating "expressed variance" for NMDS axes (and
indeed, a separate axis is not meaningful in NMDS: it is the configuration
spanned by all axes together that makes sense).
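One way to see why a single NMDS axis carries no meaningful "variance", sketched in base R with a made-up 2-D configuration (hypothetical data): rotating an ordination leaves every inter-point distance, and therefore the stress, unchanged, while the share of variance on "axis 1" changes freely. metaMDS() itself applies such a rotation after fitting (the "PC rotation" in its printed output), so per-axis variance reflects that convention rather than the NMDS fit.

```r
# A hypothetical 2-D configuration stretched along the first axis
conf <- cbind(1:10, 0)

# Rotate it by 45 degrees with an orthogonal rotation matrix
theta <- pi / 4
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
conf_rot <- conf %*% rot

# Inter-point distances, and hence the NMDS stress, are unchanged by rotation...
all.equal(as.vector(dist(conf)), as.vector(dist(conf_rot)))   # TRUE

# ...but the share of "variance" carried by each axis is not
axis_share <- function(m) { v <- apply(m, 2, var); v / sum(v) }
axis_share(conf)      # 1 0
axis_share(conf_rot)  # 0.5 0.5
```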

Cheers, Jari Oksanen
gian,

you may try consecutive MDS analyses with an increasing number of
dimensions (the parameter k in the isoMDS() or metaMDS() function). Then
plot stress against the number of dimensions and judge it as you would a
scree plot in PCA. This should tell you how many dimensions to use for
the MDS, and hence also the associated stress value.
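Gabriel's suggestion can be sketched in base R plus the recommended MASS package. Assumptions: the community matrix is made-up Poisson data with Gian's 40 x 29 shape, `bray_curtis()` is a hypothetical helper reproducing the Bray-Curtis formula, and each k gets a single isoMDS() run started from cmdscale(), whereas metaMDS() adds random restarts (`trymax`) to dodge local optima. Note that isoMDS() reports stress in percent, while metaMDS() printed it as a proportion (0.2548688 in Gian's output).

```r
library(MASS)  # isoMDS(), the NMDS engine metaMDS() was using here

# Hypothetical helper: Bray-Curtis dissimilarity, the same formula as vegan's
# vegdist(method = "bray"): sum|x_i - x_j| / sum(x_i + x_j)
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
  }
  as.dist(d)
}

# Hypothetical random community: 40 samples x 29 species, Poisson counts
set.seed(1)
comm <- matrix(rpois(40 * 29, lambda = 3.14), nrow = 40)
d <- bray_curtis(comm)

# Fit NMDS for k = 1..4 and record the final stress (in percent)
ks <- 1:4
stress <- sapply(ks, function(k) isoMDS(d, k = k, trace = FALSE)$stress)

# Scree-type plot: look for the k after which stress stops dropping sharply
plot(ks, stress, type = "b",
     xlab = "Number of dimensions (k)", ylab = "Stress (%)")
```

Read it like a PCA scree plot: pick the k beyond which extra dimensions buy little reduction in stress. With vegan installed, `metaMDS(comm, distance = "bray", k = k)$stress` in place of the isoMDS() call would add the random restarts.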

cheers, gabriel
Okay, many thanks... So having a low stress value is fundamental: the lower
the stress, the better the model fits the data, is that right? How can I
know whether my stress is acceptable? I mean, whether it is low enough to say
that the model fits the sample data in the graph well...
Is there a threshold or something?
I would appreciate any PDF or reviews on ordination methods for
community ecology data... :)
Thank you very much!
Cheers,

Gian

2009/12/1 Gian Maria Niccolò Benucci <gian.benucci at gmail.com>



Dr. Gabriel Singer
Department of Freshwater Ecology - University of Vienna
and Wassercluster Lunz Biologische Station GmbH
+43-(0)664-1266747
gabriel.singer at univie.ac.at
Maria Dulce Subida <mdsubida at icman.csic.es>
writes:
Nevertheless you should take into account that the stress usually 
increases with increasing quantity of data.
I think this is the key point. The stress will always increase with
increasing data, as it is harder to capture the information content of 10
variables on 2 axes than it is to capture the information content of 3
variables on 2 axes. Similarly, stress will always go down as you
increase the number of axes you use, for the same reason. So I'm not
convinced that any of these 'rules of thumb' (e.g., stress > 0.3 means a 'bad'
ordination) really make much sense. Please correct me if I'm wrong!
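Tyler's point about more variables can be checked with a quick simulation. This is a sketch under stated assumptions: uniform random variables (hypothetical data), Euclidean distances, 30 samples, and a single 2-D isoMDS() run per data set; the helper `stress_2d()` is made up for illustration. With more variables there is more structure to squeeze onto the same two axes, so the stress rises.

```r
library(MASS)  # isoMDS()

# Hypothetical helper: stress (%) of a 2-D NMDS of n random samples
# described by p uniform variables, using Euclidean distances
stress_2d <- function(p, n = 30) {
  x <- matrix(runif(n * p), nrow = n)
  isoMDS(dist(x), k = 2, trace = FALSE)$stress
}

set.seed(2)
s <- c(p3 = stress_2d(3), p10 = stress_2d(10))
s   # the 10-variable data give the higher stress on 2 axes
```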

Cheers,

Tyler
On 2/12/09 19:55 PM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com>
wrote:
Okay, I understand...
I have a matrix of 40 rows (samples) and 29 columns (species). In the
ordination graph the data divide into two groups (as I supposed they
should)... and this is my best solution for reducing the stress:

NMS.trial <- metaMDS(sqrtABCD, distance = "bray", k = 23, trymax = 50,
                     autotransform = FALSE)
Gian,

This looks very much like a badly degenerate solution. You shouldn't use 23
axes in NMDS, in particular with 40 x 20 source data. In Euclidean space
those data would have rank 20, so you could find at most 20 axes in
metric scaling. In the Bray-Curtis space the situation is more complicated,
but one random data set (Poisson random variates with lambda = 3.14) gave 25
positive and 14 negative eigenvalues. Probably the 23 dimensions you specify
exhaust the real part of your space even in metric scaling, and probably
(and obviously) fail miserably in nonmetric scaling. You shouldn't get
stress of that magnitude with a decent model with data like that.
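Jari's eigenvalue check can be repeated along these lines. Assumptions: the data are hypothetical Poisson(3.14) counts with Gian's 40 x 29 dimensions (not Jari's actual random data set, so the counts will differ from his 25 positive and 14 negative), and Bray-Curtis is computed by a hypothetical hand-rolled helper rather than vegan::vegdist(). For Euclidean distances the number of positive eigenvalues equals the rank of the centred data, min(n - 1, p); for Bray-Curtis some eigenvalues are negative, i.e. part of the configuration lives in imaginary space.

```r
# Hypothetical helper: Bray-Curtis, same formula as vegdist(method = "bray")
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
  }
  as.dist(d)
}

set.seed(1)
n <- 40; p <- 29                                        # Gian's matrix shape
comm <- matrix(rpois(n * p, lambda = 3.14), nrow = n)   # hypothetical data

# Classical (metric) scaling; eig = TRUE returns all n eigenvalues
ev_euc <- cmdscale(dist(comm), k = 2, eig = TRUE)$eig
ev_bc  <- cmdscale(bray_curtis(comm), k = 2, eig = TRUE)$eig

# Count clearly positive and clearly negative eigenvalues (relative tolerance)
count_signs <- function(ev, tol = 1e-8 * max(abs(ev))) {
  c(positive = sum(ev > tol), negative = sum(ev < -tol))
}
count_signs(ev_euc)  # 29 positive, 0 negative: rank = min(n - 1, p)
count_signs(ev_bc)   # some negative eigenvalues: Bray-Curtis is non-Euclidean
```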

It has never occurred to me that anybody would want NMDS with such a
high number of dimensions. Usually we want to use two, sometimes one or two
more, but that's about the limit. Do the same and set k = 2 to k = 4 at most.
If you want to have mapping of all of your real space (i.e., ignore the
complex space), you can use metric scaling. The standard R choice is
cmdscale(). The vegan alternatives are capscale() which also can do
unconstrained metric scaling, returns information both on the real and
imaginary components of your space, and has plot and other support
functions. The low level alternative in vegan is wcmdscale() which also is
used by capscale(), but does not have any support functions (lacks even
print.wcmdscale!)
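For metric scaling, unlike NMDS, an "axis variance" does exist: each axis's eigenvalue relative to the total. A minimal base-R sketch with cmdscale(), assuming hypothetical random data and a hand-rolled Bray-Curtis helper; vegan's capscale() and wcmdscale() expose the same eigenvalues (including the negative ones) but are not called here. Dividing by the sum of the positive eigenvalues only is one convention among several when negative eigenvalues are present.

```r
# Hypothetical helper: Bray-Curtis, same formula as vegdist(method = "bray")
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
  }
  as.dist(d)
}

set.seed(1)
comm <- matrix(rpois(40 * 29, lambda = 3.14), nrow = 40)  # hypothetical data
d <- bray_curtis(comm)

# Unconstrained metric scaling (principal coordinates) of the dissimilarities
pcoa <- cmdscale(d, k = 2, eig = TRUE)
ev <- pcoa$eig
pos <- ev[ev > 0]
share <- pos / sum(pos)       # share of the real (positive) part per axis
round(100 * share[1:2], 1)    # "axis variance" (%) of the first two axes
head(pcoa$points)             # sample scores on those two axes
```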

NMDS is really intended for nonlinear mapping onto *low* number of
dimensions.

Cheers, Jari Oksanen
NMS.trial
Call:
metaMDS(comm = sqrtABCD, distance = "bray", k = 23, trymax = 100,
autotransform = F)

Nonmetric Multidimensional Scaling using isoMDS (MASS package)

Data:     sqrtABCD
Distance: bray shortest

Dimensions: 23
Stress:     0.2548688
Two convergent solutions found after 8 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'sqrtABCD'

With more than 23 dimensions R gave me this result:

metaMDS(sqrtABCD, distance = "bray", k = 30, trymax = 50,
Using step-across dissimilarities:
Too long or NA distances: 230 out of 780 (29.5%)
Stepping across 780 dissimilarities...
Error in isoMDS(dist, k = k, trace = isotrace) :
  initial configuration must be complete
In addition: Warning messages:
1: In cmdscale(d, k) : some of the first 30 eigenvalues are < 0
2: In sqrt(ev) : NaNs produced

...Is it normal that I get a better ordination (separation of samples that
I know are different) with few dimensions, even if the stress is high?

... I supposed that if we use as many dimensions as there are variables,
then we can perfectly reproduce the observed distance matrix, can't we? But,
of course, our goal is to reduce the observed complexity of nature, that is,
to explain the distance matrix in terms of fewer underlying dimensions...
So what is best in the end?
And which function plots the stress values against the
number of dimensions, and how should I read the plot?
I hope I was clear, thank you so much!
Yours,

G.

On 2/12/09 19:55 PM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com>
wrote:
... I supposed that if we use as many dimensions as there are variables,
then we can perfectly reproduce the observed distance matrix, can't we?

Gian,

Not quite so. I think it would be useful to consult a good book, but
here is some explanation.

The NMDS is not a simple "reproduction" method, but a non-linear
regression problem. For n points and k dimensions we fit a non-linear
regression with n*k parameters to n*(n-1)/2 observations. It doesn't
require much intuition to see that this becomes ill-defined as k grows:
the parameters outnumber the observations as soon as k > (n-1)/2, and then
the non-linear regression fails. For details, the monotone regression step
corresponds to isoreg() in R, and the model fitting uses the same algorithm
as optim() with method = "BFGS" (Broyden, Fletcher, Goldfarb & Shanno). All
this is not very obvious because it is done within a C function in the
MASS package. The NMDS is non-linear precisely so that it can produce a
good mapping with low values of k: so stick with low values of k.
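Jari's parameter count, worked out for Gian's 40 samples. The 780 observations are the same 780 dissimilarities that metaMDS() reported stepping across in Gian's output; the k values compared are illustrative choices.

```r
n <- 40                              # samples in Gian's matrix
n_obs <- n * (n - 1) / 2             # dissimilarities = "observations": 780
k <- c(2, 3, 4, 23)
data.frame(k = k,
           parameters = n * k,       # coordinates to estimate
           observations = n_obs,
           obs_per_param = round(n_obs / (n * k), 2))

# Smallest k at which the parameters reach the number of observations
k_break <- ceiling((n - 1) / 2)      # 20
```

With k = 23 the model has 920 coordinates to fit to 780 dissimilarities, while k = 2 leaves almost ten observations per parameter.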

If you want a complete mapping of dissimilarities, you should use
metric scaling. Then you typically ignore the later axes. However, even
here the situation is not as clear as you write. If you use Euclidean
distances, then the number of variables gives the number of dimensions of
metric scaling (at most n - 1 for n samples). With Euclidean distances, the
complete solution also exactly reproduces the observed distances. However,
with non-Euclidean dissimilarities (like Bray-Curtis in your case) the
situation is more complicated. Metric scaling gives a Euclidean mapping, and
if your dissimilarities are non-Euclidean, you have a problem (that you
usually ignore). Firstly, the number of positive eigenvalues and
corresponding real eigenvectors is not directly defined by the number of
variables. Secondly, you cannot reproduce the observed dissimilarities from
the real eigenvectors alone, because that reproduction is Euclidean and your
measure was non-Euclidean. For exact reproduction, you should subtract the
squared distances in imaginary space (negative eigenvalues) from the squared
distances in real space (positive eigenvalues). We actually do it exactly
like this in the betadisper() function in vegan, and for this reason the
wcmdscale() function of vegan also returns information on complex
eigenvectors and negative eigenvalues.
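This recipe can be checked numerically. A minimal sketch, assuming hypothetical Poisson data and a hand-rolled Bray-Curtis helper; it performs the Gower double-centring and eigen-decomposition directly with eigen() instead of calling vegan's wcmdscale() or betadisper(). It is the squared distances that combine: observed d_ij^2 = (real-space d_ij^2) - (imaginary-space d_ij^2).

```r
# Hypothetical helper: Bray-Curtis, same formula as vegdist(method = "bray")
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
  }
  as.dist(d)
}

set.seed(1)
comm  <- matrix(rpois(40 * 29, lambda = 3.14), nrow = 40)  # hypothetical data
dis   <- bray_curtis(comm)
d_obs <- as.vector(dis)
D2    <- as.matrix(dis)^2

# Gower double-centring of the squared dissimilarities, then eigen-decomposition
n <- nrow(D2)
J <- diag(n) - matrix(1 / n, n, n)     # centring matrix
B <- -0.5 * J %*% D2 %*% J
e <- eigen(B, symmetric = TRUE)

# Coordinates on real (positive) and imaginary (negative-eigenvalue) axes
pos <- e$values > 0
neg <- e$values < 0
X_real <- sweep(e$vectors[, pos, drop = FALSE], 2, sqrt(e$values[pos]), "*")
X_imag <- sweep(e$vectors[, neg, drop = FALSE], 2, sqrt(-e$values[neg]), "*")

d2_real <- as.vector(dist(X_real))^2
d2_imag <- as.vector(dist(X_imag))^2

err_real <- max(abs(sqrt(d2_real) - d_obs))            # real axes alone: off
err_full <- max(abs(sqrt(d2_real - d2_imag) - d_obs))  # real minus imaginary
c(real_only = err_real, real_minus_imaginary = err_full)
```

The second error is at rounding level, whereas the real axes alone overstate every dissimilarity that has an imaginary component.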

As for your other post, which arrived while I was writing this: a stress of
11.6 is really fine. I think that if you get the stress down to 5% (0.05) or
less, then there is something fishy in your data or in your model
specification, like overfitting.

Cheers, Jari Oksanen