Rotations for PCoA? - R-SIG-ecology

Etienne Laliberté · 2009-10-20T10:19:15Z

> Date: Mon, 19 Oct 2009 13:44:57 +0200 > From: Jan Hanspach > Subject: [R-sig-eco] Rotations for PCoA? > To: r-sig-ecology at r-project.org > Message-ID: > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > > > Now, I want to know if ?it is possible to get rotations for my traits, > like it is calculated for a PCA? So that I can plot my traits within the > ordinations space (or give the values in a table). > Thanks! > Jan Pierr

Etienne Laliberté

Tue, Oct 20, 2009 3:19 AM #

Pierre Legendre recently emailed me one function which does exactly
that. I haven't found it on his website so I assume it's not
"official" yet. In any case, I'm sure he wouldn't mind me sharing it
and it can certainly give you some good ideas on how to represent the
original variables in a PCoA biplot. Here's the code. Enjoy.


pcoa.biplot <- function(Y, D, dir.axis.1=1, dir.axis.2=1)

#

# This principal coordinate analysis (PCoA) function produces a biplot

# of the points with projection of the original variables.

#

# Y = (n x p) data table

# D = distance matrix of order n

# dir.axis.1 = -1 to revert axis 1 for the projection of points and variables

# dir.axis.2 = -1 to revert axis 2 for the projection of points and variables

#

#            Pierre Legendre, November 2008

#

{

n = nrow(Y)

p = ncol(Y)

nn = nrow(as.matrix(D))

if(nn != n) stop("Different numbers of objects in Y and D")



# Principal coordinate analysis (PCoA)

k = min(p,(n-1))

pcoa.res = cmdscale(D, k=k, eig=TRUE)

size = length(pcoa.res$eig)



# Centre the Y data by columns

Y.c = apply(as.matrix(Y),2,scale,center=TRUE,scale=FALSE)

# Y.stand = apply(as.matrix(Y),2,scale,center=TRUE,scale=TRUE)



# Classical method to find position of variables for biplot:

# construct U from covariance matrix between Y and standardized points

# (equivalent to PCA scaling 1, since PCoA preserve distances among objects)

points.stand = apply(pcoa.res$points,2,scale,center=TRUE,scale=TRUE)

S = cov(Y, points.stand)

U = S %*% diag((pcoa.res$eig/(n-1))^(-0.5))



rownames(U) = rownames(Y[1:size,])

colnames(U) = colnames(U, do.NULL = FALSE, prefix = "Axis.")

rownames(pcoa.res$points) = rownames(Y)

colnames(pcoa.res$points) = colnames(pcoa.res$points, do.NULL = FALSE,
prefix = "Axis.")



diag.dir = diag(c(dir.axis.1,dir.axis.2))

pcoa.res$points[,1:2] = pcoa.res$points[,1:2] %*% diag.dir

U[,1:2] = U[,1:2] %*% diag.dir

biplot(pcoa.res$points, U, xlab="PCoA axis 1", ylab="PCoA axis 2")

title(main = c("PCoA biplot","Variables projected as in PCA","with
scaling type 1"), family="serif", line=4)



out = list(eig=pcoa.res$eig, points=pcoa.res$points, U=U, S=S)

out

}



# --------



# Test, comparison PCA - PCoA

# Source the functions PCA.R (Lab's Web page) and pcoa.bip.R (this one)



# par(mfrow=c(1,2))

# mat20 = matrix(rnorm(100),20,5)

# mat20.D = dist(mat20)



# mat20.PCA = PCA(mat20)

# biplot(mat20.PCA)



# mat20.PCoA.2 = pcoa.biplot(mat20, mat20.D)



# --------



# Does this method work when p > n ?



# mat20.t.D = dist(t(mat20))



# mat20.t.PCA = PCA(t(mat20))

# biplot(mat20.t.PCA)



# mat20.t.pcoa.2 = pcoa.biplot(t(mat20), mat20.t.D)  # Works fine



# --------

--
Etienne Lalibert?
================================
Rural Ecology Research Group
School of Forestry
University of Canterbury
Private Bag 4800
Christchurch 8140, New Zealand
Phone: +64 3 366 7001 ext. 8365
Fax: +64 3 364 2124
www.elaliberte.info

Jari Oksanen

Tue, Oct 20, 2009 1:40 PM #

On 20/10/09 13:19 PM, "Etienne Lalibert?" <etiennelaliberte at gmail.com>
wrote:

Howdy Folks,

Actually, it is "trivial" to have rotation scores for continuous variables:
it is nothing but matrix multiplication. The problem here is that the
original analysis used Gower distance for mixed metrics, and we should map
the factor variables onto continuous variables in the same way as the Gower
distance does. After that it is "easy" to find the "rotation" scores (but
see below on metrics). While this can be done, this probably is not wanted
since the interpretation of "rotation" for factors is non-intuitive, to put
it mildly. Gower distance handles factors in a special way, and they are not
the simple factor contrasts you get in standard R functions. How they are
actually handled can be seen in the Fortran code for daisy or in Gower's
paper. An extra complication is that Gower distance uses Manhattan metric.
Therefore it is not possible to just transform data matrix into a form that
would give the rotation scores in Euclidean PCoA.

The standard way (that is not used in Gower distance) to transform factors
into continuous data is to use

 mm <- model.matrix(~ ., mydatawithfactors)

Which gives you a model matrix where factors are broken into continuous
contrast variables. You can get "rotation", biplot scores or what ever you
call for these contrasts, but that is only the beginning of the problems --
what are you going to do with those scores?

Perhaps you can try vegan function envfit which finds the "rotation" scores
for continuous variables and class centroids for factor variables. They are
not the same as the strict rotation scores for factors in Gower metric (but
should be the same as the Legendre scores for continuous variables), but may
be more intuitive. 

Cheers, jari oksanen

Jan Hanspach

Wed, Oct 21, 2009 4:21 AM #

Dear Jari and all others, thanks for your responses!

That would have be my alternative when the envfit method (see below) was 
not appropriate.

I want to present the scores in table (or possibly a biplot) to 
illustrate interrelatedness of variables. I want to use the variables 
(not the scores) later in a multi-variable regression and want to give 
the reader an idea of the data structure. I don't need the scores for 
further analysis/calculations. So they do not have to be precise in a 
strict sense, but at least the method should be justified and not 
completely wrong.

O.k., I'll keep it in mind, that envfit scores do not equal "Gower-PCoA-scores", but they are some kind of easily obtainable substitute and I hope they represent the true scores sufficiently well for my illustrative purposes.

Thanks again for your advice
Best
Jan