Data transformation prior to RDA
On Tue, 2010-04-20 at 11:48 +1200, Etienne Laliberté wrote:
Are your variables species abundances, or other types of descriptors? If the former, standardization by column may not be ideal.
I think this needs a little clarification - or a different take on it. Standardising the species (response) data in PCA/RDA results in each species having unit variance and hence contributing an equal amount to the "inertia" measure. This tends to give a more balanced ordination of abundance data. In unstandardised PCA/RDA, abundant species with high variance tend to dominate the resulting ordination. Standardisation is called for when the response data are measured in different units (e.g. when they are not species abundances), but it may also be desirable for species abundances and, in my experience, is quite often warranted.
G
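To make the point about "inertia" concrete, here is a minimal Python sketch (standard library only). It is not taken from the thread: the abundance matrix is made up, and the helper names `inertia_shares` and `standardise` are hypothetical. It compares each species' share of the total variance before and after standardising the columns. In vegan, the corresponding choice should be the `scale` argument of `rda()`.

```python
from statistics import mean, stdev, variance

def inertia_shares(rows):
    """Share of total variance (PCA/RDA 'inertia') contributed by each column."""
    col_vars = [variance(col) for col in zip(*rows)]
    total = sum(col_vars)
    return [v / total for v in col_vars]

def standardise(rows):
    """Centre each column and scale it to unit (sample) variance."""
    scaled_cols = []
    for col in zip(*rows):
        m, s = mean(col), stdev(col)
        scaled_cols.append([(v - m) / s for v in col])
    # Transpose back to a sites x species layout
    return [list(row) for row in zip(*scaled_cols)]

# Hypothetical abundances: 5 sites (rows) x 3 species (columns).
# Species 1 is abundant and highly variable; species 2 and 3 are rare.
abund = [
    [120, 3, 0],
    [80, 5, 1],
    [200, 2, 0],
    [40, 8, 2],
    [160, 4, 1],
]

raw_shares = inertia_shares(abund)               # dominated by species 1
std_shares = inertia_shares(standardise(abund))  # equal shares

print("raw:         ", [round(s, 4) for s in raw_shares])
print("standardised:", [round(s, 4) for s in std_shares])
```

With the raw counts, the abundant species accounts for almost all of the variance; after standardisation each species contributes one third.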
Transformations such as the Hellinger, as suggested by Michael, were developed for species abundance data (Legendre & Gallagher 2001). There are many ways to transform variables to normalize them, if that's what you're after; see chapter 1 of Legendre & Legendre (1998). The Box-Cox method is possibly the closest thing to what you're asking for, i.e. the "best possible transformation for each of the variables". But I'm convinced there are as many opinions on the subject as there are different methods.
Cheers
Etienne
On Monday 19 April 2010 at 20:02 -0300, Devoto Mariano wrote:
Dear all,
I'm trying to do a redundancy analysis. I'm following Legendre & Legendre's (1998) tips to prepare the data prior to the analysis, and I'm hoping to do the analysis using package 'vegan'. I've already centered and standardized my explanatory and response variables, but I'm having trouble deciding whether or not (and how) the data should be transformed "to linearise the relationships and make the distributions more symmetric". Is there a way to find the best possible transformation for each variable while also considering its linearity with respect to the other ones? Please tell me if I'm not even asking the right question here...
Here's my dataset. The first 3 columns are my response variables. All the others are explanatory. I know this is a rather basic query, but any tips will be greatly appreciated.
-0.49350555  -0.37364383   0.70566360  -1.1180986  -1.14255167  -1.30234943  -1.0812858  -0.4910362
-1.02769104   0.21678178   1.11781073  -1.1123319  -0.88277150  -0.80445588  -1.0638291   0.3241891
-0.64335588  -2.07868376  -1.36782590  -1.0585453  -1.02709382  -1.07710897  -0.2760976   1.4695121
 0.25799225   0.82044015   1.02481726  -1.1114373  -0.94050043  -1.23089531  -0.7064526  -0.5012921
 0.56048832  -0.29655712  -0.07148828  -1.1099933  -1.17141614   1.54301771  -1.0921962  -1.9517655
-0.36443725  -1.49241963  -0.23840793  -1.1180554  -1.14255167  -1.24049362  -1.0856499  -0.6977804
-1.97959936   1.30035099  -1.18114614   1.0885061  -0.59412687  -0.21062037   1.7890870   0.5018224
-0.24966043  -0.66228200   0.69101500  -0.8697510  -0.88277150  -0.83963955   0.1330428   1.3450534
 0.24720930   0.35162548  -1.34252630   1.6571129  -0.59412687  -0.13708733   2.0090270   0.7553207
-0.35385550   0.99058254  -1.14295716  -0.6801336  -0.76731365  -0.93148980   1.9120456   1.4084094
-0.92880313   1.14039444   1.38922106  -0.9008538  -0.79617811  -0.96178699   0.6512872   1.2365340
-0.24431565  -0.20947362   0.76084722  -0.8978493  -0.59412687  -0.56565825  -0.4639991  -0.2045137
-0.60428104   1.05108295  -0.68704030   1.1833813   0.41612935  -0.07054391   1.2816664   0.6181682
 0.63837128   0.06672464   0.32041910   0.4154816   0.12748471   0.46057549  -0.2488216   0.3867322
 0.67144677   0.66889622   1.83857364   0.8375587   0.27180703   0.82551787  -0.2488216  -0.5987399
 2.53611774   1.45517653  -0.22337307   0.9253861   0.06975579  -0.22307224   1.6332240   0.5146235
-0.13273765  -0.55628531   0.55154280  -0.2721408   0.99341861  -0.14553291  -0.1669935   0.9976660
-0.02043306  -1.52670601  -2.08967318   1.7138916   2.14799715   2.18006143  -0.6034099  -0.9383742
 0.80218610  -0.58481301   0.18945796   0.9761855   1.57070788   1.90295452  -0.6579619  -1.3578423
 1.32726744   0.64941495  -0.42596631   0.7975236   0.87796076   0.63986198  -0.0760734  -1.0445683
-1.53219503   0.57349823   1.03668089   0.5040093   1.05114754   0.83815684  -0.3852017  -0.8672218
 0.67016035   0.81036993   0.14519361   0.5065215   1.05114754   0.49360195  -0.1124414  -0.7921778
 1.53517131  -0.85469204  -0.12003248   0.3702800   1.02228308   0.66797133  -0.3185269  -1.1538661
-0.67154028  -1.45978251  -0.88080583  -0.7266479   0.93568969   0.18901542  -0.8216180   1.0411473
Thanks!
Best wishes,
Mariano
--------------------------
Mariano Devoto
School of Biological Sciences
University of Bristol
Woodland Road
Bristol, UK BS8 1UG
Tel. +44 (0) 1179545960 (internal 45960)
web: http://agro.uba.ar/~mdevoto
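For readers following the thread, the two transformations named in Etienne's reply can be sketched in a few lines of Python (standard library only). This is an illustration under stated assumptions, not anyone's recommended workflow: the function names and the example data are hypothetical, and the Box-Cox parameter is chosen by a simple grid search over the profile log-likelihood. Both transformations need raw data (non-negative abundances for Hellinger, strictly positive values for Box-Cox), so they would have to be applied before centring and standardising, not to the values posted above. In vegan, the Hellinger transformation is available as `decostand(x, "hellinger")`.

```python
import math

def hellinger(rows):
    """Hellinger transformation: square root of row-wise relative abundances."""
    out = []
    for row in rows:
        total = sum(row)
        if total == 0:
            out.append([0.0] * len(row))  # empty site: leave as zeros
        else:
            out.append([math.sqrt(v / total) for v in row])
    return out

def boxcox(x, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def boxcox_loglik(x, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped)."""
    y = boxcox(x, lam)
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in x)

def best_lambda(x, grid=None):
    """Pick the lambda on the grid (default -2 to 2, step 0.1) with the highest log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: boxcox_loglik(x, lam))

# Hypothetical abundance counts: 3 sites x 3 species (the last site is empty).
counts = [[4, 0, 12], [1, 1, 2], [0, 0, 0]]
h = hellinger(counts)

# Hypothetical right-skewed variable whose logarithm is symmetric about zero.
skewed = [math.exp(z) for z in (-2, -1, 0, 1, 2)]
lam = best_lambda(skewed)

print(h[0])
print(lam)
```

Note that Box-Cox is applied to one variable at a time, so it addresses the symmetry of each distribution but does not by itself take the relationships among variables into account.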
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson         [t] +44 (0)20 7679 0522
ECRC, UCL Geography,      [f] +44 (0)20 7679 0565
Pearson Building,         [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London      [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.             [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%