Date: Mon, 19 Oct 2009 13:44:57 +0200
From: Jan Hanspach <jan.hanspach at ufz.de>
Subject: [R-sig-eco] Rotations for PCoA?
To: r-sig-ecology at r-project.org
Message-ID: <4ADC5139.9000707 at ufz.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Now, I want to know if ?it is possible to get rotations for my traits,
like it is calculated for a PCA? So that I can plot my traits within the
ordinations space (or give the values in a table).
Thanks!
Jan
Pierre Legendre recently emailed me one function which does exactly
that. I haven't found it on his website so I assume it's not
"official" yet. In any case, I'm sure he wouldn't mind me sharing it
and it can certainly give you some good ideas on how to represent the
original variables in a PCoA biplot. Here's the code. Enjoy.
pcoa.biplot <- function(Y, D, dir.axis.1=1, dir.axis.2=1)
#
# This principal coordinate analysis (PCoA) function produces a biplot
# of the points with projection of the original variables.
#
# Y = (n x p) data table
# D = distance matrix of order n
# dir.axis.1 = -1 to revert axis 1 for the projection of points and variables
# dir.axis.2 = -1 to revert axis 2 for the projection of points and variables
#
# Pierre Legendre, November 2008
#
{
n = nrow(Y)
p = ncol(Y)
nn = nrow(as.matrix(D))
if(nn != n) stop("Different numbers of objects in Y and D")
# Principal coordinate analysis (PCoA)
k = min(p,(n-1))
pcoa.res = cmdscale(D, k=k, eig=TRUE)
size = length(pcoa.res$eig)
# Centre the Y data by columns
Y.c = apply(as.matrix(Y),2,scale,center=TRUE,scale=FALSE)
# Y.stand = apply(as.matrix(Y),2,scale,center=TRUE,scale=TRUE)
# Classical method to find position of variables for biplot:
# construct U from covariance matrix between Y and standardized points
# (equivalent to PCA scaling 1, since PCoA preserve distances among objects)
points.stand = apply(pcoa.res$points,2,scale,center=TRUE,scale=TRUE)
S = cov(Y, points.stand)
U = S %*% diag((pcoa.res$eig/(n-1))^(-0.5))
rownames(U) = rownames(Y[1:size,])
colnames(U) = colnames(U, do.NULL = FALSE, prefix = "Axis.")
rownames(pcoa.res$points) = rownames(Y)
colnames(pcoa.res$points) = colnames(pcoa.res$points, do.NULL = FALSE,
prefix = "Axis.")
diag.dir = diag(c(dir.axis.1,dir.axis.2))
pcoa.res$points[,1:2] = pcoa.res$points[,1:2] %*% diag.dir
U[,1:2] = U[,1:2] %*% diag.dir
biplot(pcoa.res$points, U, xlab="PCoA axis 1", ylab="PCoA axis 2")
title(main = c("PCoA biplot","Variables projected as in PCA","with
scaling type 1"), family="serif", line=4)
out = list(eig=pcoa.res$eig, points=pcoa.res$points, U=U, S=S)
out
}
# --------
# Test, comparison PCA - PCoA
# Source the functions PCA.R (Lab's Web page) and pcoa.bip.R (this one)
# par(mfrow=c(1,2))
# mat20 = matrix(rnorm(100),20,5)
# mat20.D = dist(mat20)
# mat20.PCA = PCA(mat20)
# biplot(mat20.PCA)
# mat20.PCoA.2 = pcoa.biplot(mat20, mat20.D)
# --------
# Does this method work when p > n ?
# mat20.t.D = dist(t(mat20))
# mat20.t.PCA = PCA(t(mat20))
# biplot(mat20.t.PCA)
# mat20.t.pcoa.2 = pcoa.biplot(t(mat20), mat20.t.D) # Works fine
# --------
--
Etienne Lalibert?
================================
Rural Ecology Research Group
School of Forestry
University of Canterbury
Private Bag 4800
Christchurch 8140, New Zealand
Phone: +64 3 366 7001 ext. 8365
Fax: +64 3 364 2124
www.elaliberte.info
On 20/10/09 13:19 PM, "Etienne Lalibert?" <etiennelaliberte at gmail.com>
wrote:
Date: Mon, 19 Oct 2009 13:44:57 +0200
From: Jan Hanspach <jan.hanspach at ufz.de>
Subject: [R-sig-eco] Rotations for PCoA?
To: r-sig-ecology at r-project.org
Message-ID: <4ADC5139.9000707 at ufz.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Now, I want to know if ?it is possible to get rotations for my traits,
like it is calculated for a PCA? So that I can plot my traits within the
ordinations space (or give the values in a table).
Thanks!
Jan
Pierre Legendre recently emailed me one function which does exactly
that. I haven't found it on his website so I assume it's not
"official" yet. In any case, I'm sure he wouldn't mind me sharing it
and it can certainly give you some good ideas on how to represent the
original variables in a PCoA biplot. Here's the code. Enjoy.
Howdy Folks,
Actually, it is "trivial" to have rotation scores for continuous variables:
it is nothing but matrix multiplication. The problem here is that the
original analysis used Gower distance for mixed metrics, and we should map
the factor variables onto continuous variables in the same way as the Gower
distance does. After that it is "easy" to find the "rotation" scores (but
see below on metrics). While this can be done, this probably is not wanted
since the interpretation of "rotation" for factors is non-intuitive, to put
it mildly. Gower distance handles factors in a special way, and they are not
the simple factor contrasts you get in standard R functions. How they are
actually handled can be seen in the Fortran code for daisy or in Gower's
paper. An extra complication is that Gower distance uses Manhattan metric.
Therefore it is not possible to just transform data matrix into a form that
would give the rotation scores in Euclidean PCoA.
The standard way (that is not used in Gower distance) to transform factors
into continuous data is to use
mm <- model.matrix(~ ., mydatawithfactors)
Which gives you a model matrix where factors are broken into continuous
contrast variables. You can get "rotation", biplot scores or what ever you
call for these contrasts, but that is only the beginning of the problems --
what are you going to do with those scores?
Perhaps you can try vegan function envfit which finds the "rotation" scores
for continuous variables and class centroids for factor variables. They are
not the same as the strict rotation scores for factors in Gower metric (but
should be the same as the Legendre scores for continuous variables), but may
be more intuitive.
Cheers, jari oksanen
Dear Jari and all others, thanks for your responses!
Howdy Folks,
Actually, it is "trivial" to have rotation scores for continuous variables:
it is nothing but matrix multiplication. The problem here is that the
original analysis used Gower distance for mixed metrics, and we should map
the factor variables onto continuous variables in the same way as the Gower
distance does. After that it is "easy" to find the "rotation" scores (but
see below on metrics). While this can be done, this probably is not wanted
since the interpretation of "rotation" for factors is non-intuitive, to put
it mildly. Gower distance handles factors in a special way, and they are not
the simple factor contrasts you get in standard R functions. How they are
actually handled can be seen in the Fortran code for daisy or in Gower's
paper. An extra complication is that Gower distance uses Manhattan metric.
Therefore it is not possible to just transform data matrix into a form that
would give the rotation scores in Euclidean PCoA.
The standard way (that is not used in Gower distance) to transform factors
into continuous data is to use
mm <- model.matrix(~ ., mydatawithfactors)
That would have be my alternative when the envfit method (see below) was
not appropriate.
Which gives you a model matrix where factors are broken into continuous
contrast variables. You can get "rotation", biplot scores or what ever you
call for these contrasts, but that is only the beginning of the problems --
what are you going to do with those scores?
I want to present the scores in table (or possibly a biplot) to
illustrate interrelatedness of variables. I want to use the variables
(not the scores) later in a multi-variable regression and want to give
the reader an idea of the data structure. I don't need the scores for
further analysis/calculations. So they do not have to be precise in a
strict sense, but at least the method should be justified and not
completely wrong.
Perhaps you can try vegan function envfit which finds the "rotation" scores
for continuous variables and class centroids for factor variables. They are
not the same as the strict rotation scores for factors in Gower metric (but
should be the same as the Legendre scores for continuous variables), but may
be more intuitive.
O.k., I'll keep it in mind, that envfit scores do not equal "Gower-PCoA-scores", but they are some kind of easily obtainable substitute and I hope they represent the true scores sufficiently well for my illustrative purposes.
Thanks again for your advice
Best
Jan