Data transformation prior to RDA
7 messages · Mariano Devoto, Michael Denslow, Etienne Laliberté +2 more
Hi Mariano,
Dear all, I'm trying to do a redundancy analysis. I'm following Legendre & Legendre's (1998) tips to prepare the data prior to the analysis, and I'm hoping to do the analysis using package 'vegan'. I've already centered and standardized my explanatory and response variables, but I'm having trouble at deciding whether or not (and how) data should be transformed "to linearise the relationships and make the distributions more symmetric". Is there a way to find the best possible transformation for each variable but considering at the same time its linearity to the other ones? Please tell me if I'm not even asking the right question here...
Have a look at the vegan function ?decostand with method = 'hellinger'. I believe that it is discussed and recommended in: Legendre, P. & Gallagher, E.D. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280. Hope this helps, Michael
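A minimal sketch of the suggestion above, assuming vegan is installed. The varespec data set bundled with vegan is used as a stand-in for species abundances; it is not the data from this thread. The manual computation is included only to show what the Hellinger transformation does (square root of row-wise relative abundances).

```r
library(vegan)

# Stand-in species data shipped with vegan (sites in rows, species in columns)
data(varespec)

# Hellinger transformation via decostand()
spe_hel <- decostand(varespec, method = "hellinger")

# The same thing by hand: square root of each value divided by its row total
Y <- as.matrix(varespec)
spe_manual <- sqrt(Y / rowSums(Y))

# Should be TRUE: both routes give the same numbers
all.equal(as.matrix(spe_hel), spe_manual, check.attributes = FALSE)

# After the transformation, the squared values in each row sum to 1
range(rowSums(as.matrix(spe_hel)^2))
```

Note that this transformation is meant for non-negative abundance data, so it would not apply to values that have already been centred and standardized.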
Here's my dataset. First 3 columns are my response variables. All the others are explanatory. I know this is a rather basic query, but any tips will be greatly appreciated.

-0.49350555 -0.37364383  0.70566360 -1.1180986 -1.14255167 -1.30234943 -1.0812858 -0.4910362
-1.02769104  0.21678178  1.11781073 -1.1123319 -0.88277150 -0.80445588 -1.0638291  0.3241891
-0.64335588 -2.07868376 -1.36782590 -1.0585453 -1.02709382 -1.07710897 -0.2760976  1.4695121
 0.25799225  0.82044015  1.02481726 -1.1114373 -0.94050043 -1.23089531 -0.7064526 -0.5012921
 0.56048832 -0.29655712 -0.07148828 -1.1099933 -1.17141614  1.54301771 -1.0921962 -1.9517655
-0.36443725 -1.49241963 -0.23840793 -1.1180554 -1.14255167 -1.24049362 -1.0856499 -0.6977804
-1.97959936  1.30035099 -1.18114614  1.0885061 -0.59412687 -0.21062037  1.7890870  0.5018224
-0.24966043 -0.66228200  0.69101500 -0.8697510 -0.88277150 -0.83963955  0.1330428  1.3450534
 0.24720930  0.35162548 -1.34252630  1.6571129 -0.59412687 -0.13708733  2.0090270  0.7553207
-0.35385550  0.99058254 -1.14295716 -0.6801336 -0.76731365 -0.93148980  1.9120456  1.4084094
-0.92880313  1.14039444  1.38922106 -0.9008538 -0.79617811 -0.96178699  0.6512872  1.2365340
-0.24431565 -0.20947362  0.76084722 -0.8978493 -0.59412687 -0.56565825 -0.4639991 -0.2045137
-0.60428104  1.05108295 -0.68704030  1.1833813  0.41612935 -0.07054391  1.2816664  0.6181682
 0.63837128  0.06672464  0.32041910  0.4154816  0.12748471  0.46057549 -0.2488216  0.3867322
 0.67144677  0.66889622  1.83857364  0.8375587  0.27180703  0.82551787 -0.2488216 -0.5987399
 2.53611774  1.45517653 -0.22337307  0.9253861  0.06975579 -0.22307224  1.6332240  0.5146235
-0.13273765 -0.55628531  0.55154280 -0.2721408  0.99341861 -0.14553291 -0.1669935  0.9976660
-0.02043306 -1.52670601 -2.08967318  1.7138916  2.14799715  2.18006143 -0.6034099 -0.9383742
 0.80218610 -0.58481301  0.18945796  0.9761855  1.57070788  1.90295452 -0.6579619 -1.3578423
 1.32726744  0.64941495 -0.42596631  0.7975236  0.87796076  0.63986198 -0.0760734 -1.0445683
-1.53219503  0.57349823  1.03668089  0.5040093  1.05114754  0.83815684 -0.3852017 -0.8672218
 0.67016035  0.81036993  0.14519361  0.5065215  1.05114754  0.49360195 -0.1124414 -0.7921778
 1.53517131 -0.85469204 -0.12003248  0.3702800  1.02228308  0.66797133 -0.3185269 -1.1538661
-0.67154028 -1.45978251 -0.88080583 -0.7266479  0.93568969  0.18901542 -0.8216180  1.0411473
Thanks! Best wishes, Mariano -------------------------- Mariano Devoto School of Biological Sciences University of Bristol Woodland Road Bristol, UK BS8 1UG Tel. +44 (0) 1179545960 (internal 45960) web: http://agro.uba.ar/~mdevoto
Michael Denslow I.W. Carpenter Jr. Herbarium [BOON] Department of Biology Appalachian State University Boone, North Carolina U.S.A. -- AND -- Communications Manager Southeast Regional Network of Expertise and Collections sernec.org 36.214177, -81.681480 +/- 3103 meters
Are your variables species abundances, or other types of descriptors? If the former, standardization by column may not be ideal. Transformations such as the Hellinger, as suggested by Michael, were developed for species abundance data (Legendre & Gallagher 2001). There are many ways to transform variables to normalize them, if that's what you're after; see chapter 1 of Legendre & Legendre (1998). The Box-Cox method is possibly the closest thing to what you're asking, i.e. the "best possible transformation for each of the variables". But I'm convinced there are as many opinions on the subject as there are different methods. Cheers, Etienne
On Monday 19 April 2010 at 20:02 -0300, Devoto Mariano wrote:
[original message and dataset, quoted in full above, trimmed]
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Etienne Laliberté ================================ School of Forestry University of Canterbury Private Bag 4800 Christchurch 8140, New Zealand Phone: +64 3 366 7001 ext. 8365 Fax: +64 3 364 2124 www.elaliberte.info
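The Box-Cox idea mentioned in Etienne's reply can be sketched as follows. This is only an illustration under stated assumptions: it uses boxcox() from the MASS package on a single variable (Al from vegan's bundled varechem data, a stand-in for the poster's data), picks the lambda with the highest profile likelihood on a grid, and applies the usual Box-Cox formula by hand. It treats each variable on its own and does not address linearity between variables.

```r
library(MASS)
data(varechem, package = "vegan")   # stand-in environmental data, not from this thread

# Profile log-likelihood of lambda for one strictly positive variable
bc <- boxcox(Al ~ 1, data = varechem,
             lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Grid value of lambda with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the Box-Cox transformation (log transformation when lambda is zero)
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}
Al_bc <- box_cox(varechem$Al, lambda_hat)

lambda_hat
summary(Al_bc)
```

Box-Cox requires strictly positive values, so it has to be applied to the raw measurements rather than to data that have already been centred and standardized.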
Dear Devoto Mariano,
On 20/04/10 02:02 AM, "Devoto Mariano" <mdevoto at agro.uba.ar> wrote:
Dear all, I'm trying to do a redundancy analysis. I'm following Legendre & Legendre's (1998) tips to prepare the data prior to the analysis, and I'm hoping to do the analysis using package 'vegan'. I've already centered and standardized my explanatory and response variables,
You do not need to do this in vegan. Vegan uses methods that cope nicely with non-centred constraints in original scale. Pierre Legendre explains the "projection matrix" method where centring is necessary and standardization useful, but vegan uses different methods (QR decomposition).
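A small check of the point above, assuming vegan is installed and using its bundled varespec and varechem data as stand-ins (the choice of four constraints is arbitrary). Fitting the same RDA with raw and with standardized constraints should give the same constrained eigenvalues, because centring and scaling are linear operations on the constraints.

```r
library(vegan)
data(varespec)
data(varechem)

Y <- decostand(varespec, method = "hellinger")   # response (species) data
X <- varechem[, c("N", "P", "K", "Al")]          # a few constraints, original scale

# Same model, once with raw and once with centred and standardized constraints
m_raw <- rda(Y ~ ., data = X)
m_std <- rda(Y ~ ., data = as.data.frame(scale(X)))

# Constrained eigenvalues should agree up to numerical error
all.equal(m_raw$CCA$eig, m_std$CCA$eig)
```

This applies to the constraints only. Standardizing the response variables does change the analysis.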
but I'm having trouble at deciding whether or not (and how) data should be transformed "to linearise the relationships and make the distributions more symmetric". Is there a way to find the best possible transformation for each variable but considering at the same time its linearity to the other ones? Please tell me if I'm not even asking the right question here...
This is a difficult question, and there is no easy answer. RDA is basically a linear method, and linear combination scores (LC scores) are indeed linear combinations of the constraints. A nonlinear transformation will change the LC scores and hence the ordination. Selecting an optimal transformation of multivariate explanatory variables (constraints) for a multivariate response (species) is a tricky thing, and people usually do not try to do this. I have no idea how to do this. For instance, I have no idea what would be a criterion of a "good" model; the only thing I'm sure of is that goodness of fit (eigenvalue) is not a good criterion. What you may do is inspect the constraints with pairs() plots, and see if there are some strange distribution patterns in the pairwise panels. Note, though, that this is a completely different question from whether your constraints jointly have a good linear relationship with all species simultaneously. If you intended to ask about transformation of species data, read the other answers. Cheers, Jari Oksanen
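The pairs() inspection suggested above might look like this. It is a sketch only, assuming vegan's bundled varechem data as a stand-in for the constraints; the four variables and the log transformation are arbitrary choices for illustration, not recommendations.

```r
library(vegan)
data(varechem)

# A few constraints in their original scale
env <- varechem[, c("N", "P", "K", "Al")]

# Pairwise scatterplots with a smoother in each panel:
# look for curved trends, outliers, or strongly skewed clouds
pairs(env, panel = panel.smooth)

# The same panels after a log transformation, for comparison
# (valid here because these four variables are strictly positive)
env_log <- log(env)
pairs(env_log, panel = panel.smooth)
```

As noted above, tidy pairwise panels among the constraints say nothing directly about how well the constraints jointly explain the species data.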
On Tue, 2010-04-20 at 11:48 +1200, Etienne Laliberté wrote:
Are your variables species abundances, or other types of descriptors? If the former, standardization by column may not be ideal.
I think this needs a little clarification - or a different take on it. Standardising the species (response) data in PCA/RDA results in each species having unit variance and hence contributing an equal amount to the "inertia" measure. This tends to give a more balanced ordination of abundance data. In unstandardised PCA/RDA, abundant species with high variance tend to dominate the resulting ordination. Standardisation is called for when response data are measured in different units (i.e. when not species abundances), but may be desirable for species abundances and in my experience is quite often warranted. G
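The effect of standardising the response data can be seen in the total inertia. A minimal sketch, assuming vegan is installed and using its bundled varespec data as a stand-in; rda() without constraints is a PCA, and scale = TRUE standardises each species to unit variance.

```r
library(vegan)
data(varespec)

# PCA on raw abundances: inertia is the sum of the species variances
pca_raw <- rda(varespec)

# PCA on standardized species: every species contributes unit variance
pca_std <- rda(varespec, scale = TRUE)

spe_var <- apply(varespec, 2, var)

pca_raw$tot.chi              # equals sum(spe_var)
pca_std$tot.chi              # equals the number of species with non-zero variance

# Share of the raw inertia carried by the single most variable species
max(spe_var) / sum(spe_var)
```

A large share in the last line is the situation described above, where a few abundant, high-variance species dominate the unstandardised ordination.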
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
1 day later
On 22/04/10 02:21 AM, "Devoto Mariano" <mdevoto at agro.uba.ar> wrote:
A related question by the way, is it possible in vegan to perform a forward selection of variables in the context of a redundancy analysis just like Canoco does?
Dear Devoto Mariano, Not "just like Canoco does", but there is an improved way. Check ordistep() for Canoco-style analysis, and add1.cca and drop1.cca for other approaches. Cheers, Jari Oksanen
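A minimal sketch of forward selection with ordistep(), assuming a reasonably recent vegan and using its bundled varespec and varechem data as stand-ins. The Hellinger transformation and the use of all varechem variables as the scope are choices made for illustration only.

```r
library(vegan)
data(varespec)
data(varechem)

Y <- decostand(varespec, method = "hellinger")

# Start from an intercept-only model; the full model defines the scope
m0 <- rda(Y ~ 1, data = varechem)
m1 <- rda(Y ~ ., data = varechem)

# Forward selection based on permutation tests
set.seed(1)   # permutation p-values vary between runs
sel <- ordistep(m0, scope = formula(m1), direction = "forward", trace = FALSE)

formula(sel)  # selected model
sel$anova     # selection steps

# One-step alternative: test each candidate term for addition to m0
add1(m0, scope = formula(m1), test = "permutation")
```

Because selection relies on permutation p-values, borderline variables can enter or drop out from run to run unless the seed is fixed.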