Data transformation prior to RDA

Hi Mariano,
Dear all,
I'm trying to do a redundancy analysis. I'm following Legendre & Legendre's
(1998) tips to prepare the data prior to the analysis, and I'm hoping to do
the analysis using package 'vegan'.
I've already centered and standardized my explanatory and response
variables, but I'm having trouble deciding whether or not (and how) data
should be transformed "to linearise the relationships and make the
distributions more symmetric". Is there a way to find the best possible
transformation for each variable but considering at the same time its
linearity to the other ones? Please tell me if I'm not even asking the right
question here...
Have a look at the vegan function ?decostand with method =
'hellinger'. I believe that it is discussed and recommended in:

Legendre, P. & Gallagher, E.D. (2001) Ecologically meaningful
transformations for ordination of species data. Oecologia 129:
271-280.
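
For readers unfamiliar with it, the Hellinger transformation divides each abundance by its site (row) total and takes the square root. The snippet below is a minimal sketch in plain Python, not vegan's implementation; the counts are made up, and returning zeros for an empty site is an assumption of this sketch. Note that it needs non-negative abundances, so it cannot be applied to data that have already been centred and standardized.

```python
import math

def hellinger(abundances):
    """Hellinger-transform a sites x species table given as a list of rows."""
    transformed = []
    for row in abundances:
        total = sum(row)
        if total == 0:
            # Assumption of this sketch: an empty site stays all zeros.
            transformed.append([0.0 for _ in row])
        else:
            # Square root of the relative abundance within the site.
            transformed.append([math.sqrt(x / total) for x in row])
    return transformed

# Hypothetical counts: 3 sites (rows) x 3 species (columns).
counts = [[10, 0, 30], [1, 1, 2], [0, 5, 5]]
hel = hellinger(counts)
for row in hel:
    print([round(v, 3) for v in row])
```

After the transformation, the squared values in each non-empty row sum to 1, which is why very abundant species weigh less heavily than they do in the raw counts.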

Hope this helps,

Michael
Here's my dataset. First 3 columns are my response variables. All the others
are explanatory. I know this is a rather basic query, but any tips will be
greatly appreciated.

-0.49350555 -0.37364383  0.70566360 -1.1180986 -1.14255167 -1.30234943 -1.0812858 -0.4910362
-1.02769104  0.21678178  1.11781073 -1.1123319 -0.88277150 -0.80445588 -1.0638291  0.3241891
-0.64335588 -2.07868376 -1.36782590 -1.0585453 -1.02709382 -1.07710897 -0.2760976  1.4695121
 0.25799225  0.82044015  1.02481726 -1.1114373 -0.94050043 -1.23089531 -0.7064526 -0.5012921
 0.56048832 -0.29655712 -0.07148828 -1.1099933 -1.17141614  1.54301771 -1.0921962 -1.9517655
-0.36443725 -1.49241963 -0.23840793 -1.1180554 -1.14255167 -1.24049362 -1.0856499 -0.6977804
-1.97959936  1.30035099 -1.18114614  1.0885061 -0.59412687 -0.21062037  1.7890870  0.5018224
-0.24966043 -0.66228200  0.69101500 -0.8697510 -0.88277150 -0.83963955  0.1330428  1.3450534
 0.24720930  0.35162548 -1.34252630  1.6571129 -0.59412687 -0.13708733  2.0090270  0.7553207
-0.35385550  0.99058254 -1.14295716 -0.6801336 -0.76731365 -0.93148980  1.9120456  1.4084094
-0.92880313  1.14039444  1.38922106 -0.9008538 -0.79617811 -0.96178699  0.6512872  1.2365340
-0.24431565 -0.20947362  0.76084722 -0.8978493 -0.59412687 -0.56565825 -0.4639991 -0.2045137
-0.60428104  1.05108295 -0.68704030  1.1833813  0.41612935 -0.07054391  1.2816664  0.6181682
 0.63837128  0.06672464  0.32041910  0.4154816  0.12748471  0.46057549 -0.2488216  0.3867322
 0.67144677  0.66889622  1.83857364  0.8375587  0.27180703  0.82551787 -0.2488216 -0.5987399
 2.53611774  1.45517653 -0.22337307  0.9253861  0.06975579 -0.22307224  1.6332240  0.5146235
-0.13273765 -0.55628531  0.55154280 -0.2721408  0.99341861 -0.14553291 -0.1669935  0.9976660
-0.02043306 -1.52670601 -2.08967318  1.7138916  2.14799715  2.18006143 -0.6034099 -0.9383742
 0.80218610 -0.58481301  0.18945796  0.9761855  1.57070788  1.90295452 -0.6579619 -1.3578423
 1.32726744  0.64941495 -0.42596631  0.7975236  0.87796076  0.63986198 -0.0760734 -1.0445683
-1.53219503  0.57349823  1.03668089  0.5040093  1.05114754  0.83815684 -0.3852017 -0.8672218
 0.67016035  0.81036993  0.14519361  0.5065215  1.05114754  0.49360195 -0.1124414 -0.7921778
 1.53517131 -0.85469204 -0.12003248  0.3702800  1.02228308  0.66797133 -0.3185269 -1.1538661
-0.67154028 -1.45978251 -0.88080583 -0.7266479  0.93568969  0.18901542 -0.8216180  1.0411473
Thanks!

Best wishes,

Mariano

--------------------------

Mariano Devoto
School of Biological Sciences
University of Bristol
Woodland Road

Bristol, UK
BS8 1UG
Tel. +44 (0) 1179545960 (internal 45960)
web: http://agro.uba.ar/~mdevoto

Michael Denslow

I.W. Carpenter Jr. Herbarium [BOON]
Department of Biology
Appalachian State University
Boone, North Carolina U.S.A.
-- AND --
Communications Manager
Southeast Regional Network of Expertise and Collections
sernec.org

36.214177, -81.681480 +/- 3103 meters
Are your variables species abundances, or other types of descriptors? If
the former, standardization by column may not be ideal. Transformations
such as the Hellinger, as suggested by Michael, were developed for
species abundances data (Legendre & Gallagher 2001).

There are many ways to transform variables to normalize them, if that's
what you're after; see chapter 1 of Legendre & Legendre (1998). The
Box-Cox method is possibly the closest thing to what you're asking, i.e.
the "best possible transformation for each of the variables". But I'm
convinced there are as many opinions on the subject as there are
different methods.
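
As an illustration of the Box-Cox idea, the sketch below picks, for a single strictly positive variable, the exponent lambda that maximises the usual Box-Cox profile log-likelihood over a grid. It is a plain-Python sketch under stated assumptions, not code from any package: the function names, the grid from -2 to 2 and the example values are all hypothetical. It treats one variable at a time, so it says nothing about linearity with the other variables, and it must be run on the raw (positive) values rather than on standardized ones.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]          # limiting case lambda = 0
    return [(v ** lam - 1.0) / lam for v in x]

def profile_loglik(x, lam):
    """Profile log-likelihood of lambda under a normal model (up to a constant)."""
    y = box_cox(x, lam)
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n    # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in x)

def best_lambda(x, grid=None):
    """Grid search for the lambda with the highest profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # hypothetical grid: -2.0 ... 2.0
    return max(grid, key=lambda lam: profile_loglik(x, lam))

# Hypothetical right-skewed, strictly positive variable.
skewed = [math.exp(z) for z in (-1.5, -0.8, -0.3, 0.0, 0.2, 0.7, 1.1, 1.9)]
lam = best_lambda(skewed)
print("selected lambda:", lam)
```

A lambda near 1 suggests leaving the variable alone, near 0.5 a square root, and near 0 a log transformation.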

Cheers

Etienne

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Etienne Laliberté
================================
School of Forestry
University of Canterbury
Private Bag 4800
Christchurch 8140, New Zealand
Phone: +64 3 366 7001 ext. 8365
Fax: +64 3 364 2124
www.elaliberte.info
Dear Devoto Mariano,

Dear all,
I'm trying to do a redundancy analysis. I'm following Legendre & Legendre's
(1998) tips to prepare the data prior to the analysis, and I'm hoping to do
the analysis using package 'vegan'.
I've already centered and standardized my explanatory and response
variables,
You do not need to do this in vegan. Vegan uses methods that cope nicely
with non-centred constraints in original scale. Pierre Legendre explains the
"projection matrix" method where centring is necessary and standardization
useful, but vegan uses different methods (QR decomposition).
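
A small illustration of why centring and standardizing the constraints beforehand makes no difference to the fit: in least squares with an intercept, the fitted values are unchanged when a predictor is shifted and rescaled. The sketch below shows this for a single predictor in plain Python. It is not vegan's QR code, and the data and function names are hypothetical.

```python
def fitted_values(x, y):
    """Fitted values from simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * a for a in x]

def standardize(x):
    """Centre to mean 0 and scale to unit (sample) standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = (sum((a - m) ** 2 for a in x) / (n - 1)) ** 0.5
    return [(a - m) / sd for a in x]

# Hypothetical constraint in its original scale, and one response.
x = [12.0, 15.5, 9.0, 20.0, 17.5]
y = [1.1, 1.9, 0.7, 2.8, 2.0]

raw_fit = fitted_values(x, y)
std_fit = fitted_values(standardize(x), y)
max_diff = max(abs(a - b) for a, b in zip(raw_fit, std_fit))
print("largest difference in fitted values:", max_diff)
```

The regression coefficients differ between the two fits, but the fitted values agree up to floating-point rounding.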
but I'm having trouble deciding whether or not (and how) data
should be transformed "to linearise the relationships and make the
distributions more symmetric". Is there a way to find the best possible
transformation for each variable but considering at the same time its
linearity to the other ones? Please tell me if I'm not even asking the right
question here...
This is a difficult question, and there is no easy answer. RDA is basically
a linear method, and linear combination scores (LC scores) are indeed linear
combinations of constraints. A nonlinear transformation will change the LC
scores and hence the ordination. Selecting an optimal transformation for
multivariate explanatory variables (constraints) for a multivariate response
(species) is a tricky thing, and people usually do not try to do this. I
have no idea how to do this. For instance, I have no idea what would be a
criterion of a "good" model; the only thing I'm sure of is that goodness of fit
(eigenvalue) is not a good criterion. What you may do is inspect the
constraints with pairs() plots and see if there are any strange distribution
patterns in the pairwise panels. That is a completely different question from
having a good linear relationship between all your constraints jointly
and all species simultaneously, though.

If you intended to ask about transformation of species data, read the other
answers.

Cheers, Jari Oksanen
Are your variables species abundances, or other types of descriptors? If
the former, standardization by column may not be ideal.
I think this needs a little clarification - or a different take on it.
Standardising the species (response) data in PCA/RDA results in each
species having unit variance and hence contributing an equal amount to
the "inertia" measure. This tends to give a more balanced ordination of
abundance data.

In unstandardised PCA/RDA, abundant species with high variance tend to
dominate the resulting ordination.

Standardisation is called for when response data are measured in
different units (i.e. when not species abundances), but may be desirable
for species abundances and in my experience is quite often warranted.
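
To make the point about inertia concrete, the sketch below compares each species' share of the total variance before and after standardization. It is plain Python with made-up abundances for one common and one rare species; the function names are hypothetical, and total inertia is taken here as the sum of the species variances.

```python
from statistics import mean, stdev, variance

def variance_shares(columns):
    """Share of the total variance (inertia) contributed by each column."""
    variances = [variance(col) for col in columns]
    total = sum(variances)
    return [v / total for v in variances]

def standardize(col):
    """Centre to mean 0 and scale to unit (sample) standard deviation."""
    m = mean(col)
    s = stdev(col)
    return [(v - m) / s for v in col]

# Hypothetical abundances at five sites.
common = [120.0, 80.0, 200.0, 40.0, 160.0]
rare = [1.0, 3.0, 0.0, 2.0, 4.0]

raw_shares = variance_shares([common, rare])
std_shares = variance_shares([standardize(common), standardize(rare)])
print("raw shares:         ", [round(s, 4) for s in raw_shares])
print("standardized shares:", [round(s, 4) for s in std_shares])
```

In the raw data the common species accounts for nearly all of the variance, whereas after standardization both species contribute equally.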

G

%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

A related question, by the way: is it possible in vegan to perform a forward
selection of variables in the context of a redundancy analysis, just like
Canoco does?

Dear Devoto Mariano,

Not "just like Canoco does", but there is an improved way. Check ordistep()
for Canoco-style analysis, and add1.cca and drop1.cca for other approaches.

Cheers, Jari Oksanen