how to calculate "axis variance" in metaMDS, package vegan?
13 messages · Gian Maria Niccolò Benucci, gabriel singer, Maria Dulce Subida +3 more
hi gian, no, there is no such way. An MDS can't express "explained variance". However, the stress value is the overall measure of quality of fit of your MDS to the data. There are various measures of stress, but loosely speaking you can regard the stress as a percentage of variation NOT explained by ALL dimensions in your MDS. cheers, g
Gian Maria Niccolò Benucci wrote:
Hi there, I am trying to use the function metaMDS (vegan package) for community ecology data, but I find no way to calculate the "expressed variance" of the first 2 axes. Is there a way to do that? Thanks a lot in advance, Gian
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
On 1/12/09 11:12 AM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com> wrote:
Hi there, I am trying to use the function metaMDS (vegan package) for community ecology data, but I find no way to calculate the "expressed variance" of the first 2 axes. Is there a way to do that? Thanks a lot in advance,
Gian, Why does nobody ask how to calculate the non-linear stress of PCA or RDA? Non-linear stress would make just as little sense for PCA or RDA as the axis variance makes for NMDS. The axis variance is purely a measure of the PCA family of methods (even the CA family has a slightly different measure), and there is no meaningful way of calculating "expressed variance" for NMDS axes (and indeed, a separate axis is not meaningful for NMDS: it is the configuration spanned by all axes together that makes sense). Cheers, Jari Oksanen
gian, you may try consecutive MDS analyses with an increasing number of dimensions (the parameter k in the isoMDS() or metaMDS() function). then plot stress against the number of dimensions and judge it much like a scree plot in PCA. this should tell you how many dimensions to use for the MDS and also give you the associated stress value. cheers, gabriel
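A minimal sketch of the stress-versus-dimensions plot suggested above. It assumes simulated data: the community matrix `comm`, the two gradients and the helper `bray_curtis()` are made up for illustration and are not from the thread. It calls isoMDS() from MASS directly; with vegan installed, vegdist() and metaMDS() would be the usual choices. Note that isoMDS() reports stress in percent.

```r
library(MASS)  # provides isoMDS()

set.seed(1)
n_sites <- 30
n_species <- 12

# Hypothetical community data: species abundances peak at random optima
# along two simulated environmental gradients.
g1 <- runif(n_sites); g2 <- runif(n_sites)
opt1 <- runif(n_species); opt2 <- runif(n_species)
mu <- 10 * exp(-(outer(g1, opt1, "-")^2 + outer(g2, opt2, "-")^2) / 0.1) + 1
comm <- matrix(rpois(n_sites * n_species, mu), n_sites, n_species)

# Bray-Curtis dissimilarity in base R (illustrative helper):
# sum |x_i - x_j| / sum (x_i + x_j)
bray_curtis <- function(x) {
  num <- as.matrix(dist(x, method = "manhattan"))
  den <- outer(rowSums(x), rowSums(x), "+")
  as.dist(num / den)
}
d <- bray_curtis(comm)

# Fit NMDS for k = 1..5 dimensions and record the stress of each fit.
ks <- 1:5
stress <- sapply(ks, function(k) isoMDS(d, k = k, trace = FALSE)$stress)

plot(ks, stress, type = "b",
     xlab = "Number of dimensions (k)", ylab = "Stress (%)")
```

Read the plot like a scree plot: look for the elbow beyond which adding a dimension no longer lowers the stress much. Each fit is a local optimum, so the curve need not decrease strictly.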
Gian Maria Niccolò Benucci wrote:
Okay, many thanks... So having a low stress value is fundamental, since the lower the stress, the better the model fits the data, is that right? How can I know if my stress is acceptable? I mean, whether it is low enough to conclude that the plot represents the sample data well... Is there a threshold or something? I would appreciate any PDF or reviews on ordination models for community ecology data... :) Thank you very much! Cheers, Gian
Dr. Gabriel Singer Department of Freshwater Ecology - University of Vienna and Wassercluster Lunz Biologische Station GmbH +43-(0)664-1266747 gabriel.singer at univie.ac.at
Maria Dulce Subida <mdsubida at icman.csic.es> writes:
Nevertheless you should take into account that the stress usually increases with increasing quantity of data.
I think this is the key point. The stress will always increase with increasing data, as it is harder to capture the information content of 10 variables on 2 axes than it is to capture the information content of 3 variables on 2 axes. Similarly, stress will always go down as you increase the number of axes you use, for the same reason. So I'm not convinced that any of these 'rules of thumb' (i.e., > 0.3 is a 'bad' ordination) really make much sense. Please correct me if I'm wrong! Cheers, Tyler
1 day later
On 2/12/09 19:55 PM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com> wrote:
Okay, I understand... I have a matrix of 40 rows (samples) and 29 columns (species). In the ordination graph the data divide into two clades (as I supposed they must)... and this is my best solution for reducing the stress...
metaMDS(sqrtABCD, distance = "bray", k = 23, trymax = 50, autotransform = F) -> NMS.trial
Gian, This looks very much like a badly degenerate solution. You shouldn't use 23 axes in NMDS, in particular with 40 x 20 source data. In Euclidean space that data would give you a rank of 20, or you could find at maximum 20 axes in metric scaling. In the Bray-Curtis space the situation is more complicated, but one random data set (Poisson random variates with lambda = 3.14) gave 25 positive and 14 negative eigenvalues. Probably the 23 dimensions you specify exhaust the real part of your space even in metric scaling, and probably (and obviously) fail miserably in nonmetric scaling. You shouldn't get stress of that magnitude with a decent model with data like that.
It has never occurred to me that anybody would like to have NMDS with that high a number of dimensions. Usually we want to use two, sometimes one or two more, but that's about the limit. Do the same and set k = 2 to k = 4 at maximum.
If you want to have a mapping of all of your real space (i.e., ignore the complex space), you can use metric scaling. The standard R choice is cmdscale(). The vegan alternative is capscale(), which can also do unconstrained metric scaling, returns information on both the real and imaginary components of your space, and has plot and other support functions. The low-level alternative in vegan is wcmdscale(), which is also used by capscale() but does not have any support functions (it lacks even print.wcmdscale!). NMDS is really intended for nonlinear mapping onto a *low* number of dimensions. Cheers, Jari Oksanen
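A rough sketch of the eigenvalue count described above, assuming a simulated 40 x 20 matrix of Poisson variates with lambda = 3.14. The seed, the tolerance and the base-R Bray-Curtis computation are illustrative choices, so the exact counts will differ from the 25 and 14 quoted in the message.

```r
set.seed(42)
n <- 40   # samples
p <- 20   # species
x <- matrix(rpois(n * p, lambda = 3.14), n, p)

# Bray-Curtis dissimilarity in base R: sum |x_i - x_j| / sum (x_i + x_j)
bray <- as.dist(as.matrix(dist(x, method = "manhattan")) /
                outer(rowSums(x), rowSums(x), "+"))

# cmdscale() returns all n eigenvalues when eig = TRUE, whatever k is.
eig <- cmdscale(bray, k = 2, eig = TRUE)$eig
tol <- 1e-8 * max(abs(eig))          # treat tiny values as zero
n_pos <- sum(eig > tol)              # real axes
n_neg <- sum(eig < -tol)             # imaginary axes
print(c(positive = n_pos, negative = n_neg))

# For comparison, Euclidean distances on the same data: the number of
# positive eigenvalues cannot exceed min(n - 1, p).
eig_euc <- cmdscale(dist(x), k = 2, eig = TRUE)$eig
n_pos_euc <- sum(eig_euc > 1e-8 * max(abs(eig_euc)))
print(n_pos_euc)
```

Asking NMDS for more dimensions than there are positive eigenvalues leaves the metric starting configuration incomplete, which is consistent with the isoMDS() error quoted in this thread.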
NMS.trial
Call: metaMDS(comm = sqrtABCD, distance = "bray", k = 23, trymax = 100, autotransform = F)
Nonmetric Multidimensional Scaling using isoMDS (MASS package)
Data: sqrtABCD
Distance: bray shortest
Dimensions: 23
Stress: 0.2548688
Two convergent solutions found after 8 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'sqrtABCD'
With more than 23 dimensions R gave me this result:
metaMDS(sqrtABCD, distance = "bray", k = 30, trymax = 50,
Using step-across dissimilarities:
Too long or NA distances: 230 out of 780 (29.5%)
Stepping across 780 dissimilarities...
Error in isoMDS(dist, k = k, trace = isotrace) : initial configuration must be complete
In addition: Warning messages:
1: In cmdscale(d, k) : some of the first 30 eigenvalues are < 0
2: In sqrt(ev) : NaNs produced
...Is it normal that I get a better ordination (separation of samples that I know are different) with few dimensions even if the stress is high? ... I supposed that if we use as many dimensions as there are variables, then we can perfectly reproduce the observed distance matrix. Isn't that so? But, of course, our goal is to reduce the observed complexity of nature, that is, to explain the distance matrix in terms of fewer underlying dimensions... So what is best in the end? And also, which function plots the stress values versus the number of dimensions, and how should I read the plot? I hope I was clear, thank you so much! Yours, G.
On 2/12/09 19:55 PM, "Gian Maria Niccolò Benucci" <gian.benucci at gmail.com> wrote:
... I supposed that if we use as many dimensions as there are variables, then we can perfectly reproduce the observed distance matrix. Isn't that so?
Gian, Not quite so. I think it would be useful to consult a good book, but here is some explanation. NMDS is not a simple "reproduction" method; it is a non-linear regression problem. For n points and k dimensions we fit a nonlinear regression with n*k parameters to n*(n-1)/2 observations. It doesn't require much intuition to see that this is not well defined for k approaching n, and then the non-linear regression fails. For details, the non-linear regression function is isoreg() in R, and the model fitting happens with optim() using method = "BFGS" (Broyden, Fletcher, Goldfarb & Shanno). All this is not very obvious because it is done within a C function in the MASS package. NMDS is nonlinear precisely in order to produce a good mapping with low values of k: so stick with low values of k.
If you want to have a complete mapping of dissimilarities, you should use metric scaling. Then you typically ignore the later axes. However, even here the situation is not as clear as you write. If you use Euclidean distances, then the number of variables gives the number of dimensions of metric scaling. With Euclidean distances, the complete solution also exactly reproduces the observed distances. However, with non-Euclidean dissimilarities (like Bray-Curtis in your case) the situation is more complicated. Metric scaling and complete mapping are Euclidean, and if your dissimilarities are non-Euclidean, you have a problem (that you usually ignore). Firstly, the number of positive eigenvalues and corresponding real eigenvectors is not directly defined by the number of variables. Secondly, you cannot reproduce the observed dissimilarities from real eigenvectors because that reproduction is Euclidean and your measure was non-Euclidean. For exact reproduction, you should subtract the distances in imaginary space (negative eigenvalues) from the distances in real space (positive eigenvalues). We actually do it exactly like this in the betadisper() function in vegan, and for this reason the wcmdscale() function of vegan also returns information on complex eigenvectors and negative eigenvalues.
For your other post that arrived while I was writing this: stress 11.6 is really fine. I think that if you get stress down to 5% (0.05) or less, then there is something fishy in your data or in your model specification, like overfitting. Cheers, Jari Oksanen
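The reproduction argument above can be checked by hand. The sketch below is not vegan's code: it assumes a small simulated Poisson data set and a hand-written principal coordinates analysis in base R, and all object names are illustrative. It shows that squared distances on the real axes minus squared distances on the imaginary axes give back the Bray-Curtis dissimilarities, while the real axes alone do not.

```r
set.seed(7)
n <- 15
x <- matrix(rpois(n * 8, lambda = 3), n, 8)

# Bray-Curtis dissimilarity matrix in base R
d <- as.matrix(dist(x, method = "manhattan")) /
     outer(rowSums(x), rowSums(x), "+")

# Metric scaling by hand: double-centre -d^2 / 2, then eigen-decompose.
J <- diag(n) - 1 / n
e <- eigen(-0.5 * J %*% d^2 %*% J, symmetric = TRUE)

pos <- e$values > 1e-10    # real axes
neg <- e$values < -1e-10   # imaginary axes

# Coordinates: eigenvectors scaled by sqrt(|eigenvalue|)
real <- sweep(e$vectors[, pos, drop = FALSE], 2, sqrt(e$values[pos]), "*")
imag <- sweep(e$vectors[, neg, drop = FALSE], 2, sqrt(-e$values[neg]), "*")

d2_real <- as.matrix(dist(real))^2
d2_imag <- as.matrix(dist(imag))^2

# Squared real distances minus squared imaginary distances
d_rec <- sqrt(pmax(d2_real - d2_imag, 0))

err_full <- max(abs(d_rec - d))          # numerically zero
err_real <- max(abs(sqrt(d2_real) - d))  # larger whenever sum(neg) > 0
print(c(negative_axes = sum(neg), err_full = err_full, err_real = err_real))
```

With Euclidean distances in place of Bray-Curtis there are no negative eigenvalues, so the real axes alone already reproduce the distances.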