Message-ID: <CACg2Sf090w0idMW+vNR5Pb-ZV4VnhDqv8VnmpuyreUrFviOYVQ@mail.gmail.com>
Date: 2019-07-18T12:48:57Z
From: Marc Taylor
Subject: community analysis - associating species with categorical environmental parameters

Dear all,

I am working with a rather large community dataset (~30 spp and ~20,000
samples) which I am trying to relate to several environmental variables.
One of the environmental variables that I am specifically interested in is
a factor.

I have conducted a CCA (vegan::cca) relating env variables to the
community, and have found significant effects. I am now interested in
understanding how to characterize my community data with respect to the
categorical env factor mentioned earlier - i.e. whether a given spp is more
strongly associated with a given level of the env variable.

In the past, I have used vegan::simper for such explorations, but am having
difficulty applying it here due to the size of the data (i.e. a distance
matrix for the biological community would be 20,000 X 20,000). I'm
wondering if there are any approaches that I might apply that are less
computationally intensive.
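
As a rough back-of-the-envelope sketch of the memory involved (assuming
8-byte doubles; the variable names are only illustrative), a full 20,000 X
20,000 matrix comes to roughly 3 GiB, and even a lower-triangle "dist"
object to roughly 1.5 GiB:

```r
# number of samples (approximate figure from the dataset described above)
n <- 20000

# full square matrix of doubles, in GiB
full_gib <- n * n * 8 / 1024^3

# lower triangle only (as stored by a "dist" object), in GiB
dist_gib <- n * (n - 1) / 2 * 8 / 1024^3

round(c(full = full_gib, dist = dist_gib), 2)
```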

One approach that I explored was to look for the closest env level for each
species based on CCA coordinates (from significant axes) using a nearest
neighbor search (FNN::get.knnx). The results look promising (see example
below), but I haven't come across the approach anywhere else. Unfortunately,
it is also not a statistical test, so I have no quantitative measure of how
likely these associations are.

Does anyone have any other suggestions for how I might do such an analysis with
this large dataset?

Many thanks in advance,
Marc


#### EXAMPLE ####

set.seed(1)

### required packages and data
library(FNN)
library(vegan)

data(dune)
data(dune.env)
str(dune.env)

### fit model
mod <- cca(dune ~ A1 + Moisture + Management, data=dune.env)
# visualize
plot(mod, display = c("sp", "bp", "cn"))

### Permutation Test for CCA
# terms sig. test (tested sequentially - i.e. order matters)
(At <- anova(mod, by = "terms", permutations = 499))
# cca axes sig. test
(Ax <- anova(mod, by = "axis", permutations = 499, cutoff = 0.1))

### Determine nearest neighbor of Moisture level for each species
# number of significant CCA axes
n <- 2
# retrieve un-scaled CCA coordinates
res <- summary(mod, scaling = 0, axes = n,
  display = c("sp", "wa", "lc", "bp", "cn"))
# get CCA indices for Moisture levels
mat <- match(paste0("Moisture", levels(dune.env$Moisture)),
  rownames(res$centroids))

# nearest neighbor (only the single closest centroid is needed, so k = 1)
pred <- get.knnx(data = res$centroids[mat,], query = res$species,
  k = 1)$nn.index[,1]
# return results
tmp <- data.frame(spp = rownames(res$species),
  nearest_level = paste0("Moisture",levels(dune.env$Moisture))[pred])
tmp


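plot(mod, display = c("cn"))
# color each species label by its nearest Moisture level; index with `pred`
# directly, since as.numeric() on the level names does not give level indices
text(mod, display = "sp",
  col = rainbow(length(levels(dune.env$Moisture)))[pred])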
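
One possible extension of the example above, offered only as a sketch and
not as a statistical test: keep the distances returned by FNN::get.knnx and
compare the nearest with the second-nearest Moisture centroid, as a crude
indication of how clear-cut each assignment is. It uses vegan::scores()
rather than summary() to extract the coordinates; the names n_axes, spp,
cen, lev, out and ratio are made up for illustration.

```r
library(vegan)
library(FNN)

data(dune)
data(dune.env)

mod <- cca(dune ~ A1 + Moisture + Management, data = dune.env)

# number of CCA axes to use (assumed to be 2, as in the example above)
n_axes <- 2

# un-scaled species scores and factor centroids
spp <- scores(mod, display = "sp", choices = 1:n_axes, scaling = 0)
cen <- scores(mod, display = "cn", choices = 1:n_axes, scaling = 0)

# keep only the Moisture centroids, in the order of the factor levels
lev <- paste0("Moisture", levels(dune.env$Moisture))
cen <- cen[match(lev, rownames(cen)), , drop = FALSE]

# two nearest centroids for each species, with their distances
knn <- get.knnx(data = cen, query = spp, k = 2)

out <- data.frame(
  spp = rownames(spp),
  nearest_level = lev[knn$nn.index[, 1]],
  dist_nearest = knn$nn.dist[, 1],
  dist_second = knn$nn.dist[, 2]
)

# ratio near 0: clearly closest to one level; near 1: ambiguous
out$ratio <- out$dist_nearest / out$dist_second
out[order(out$ratio), ]
```

The ratio is purely descriptive; it does not say how likely an association
is, only how much closer a species sits to one centroid than to the next.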
