Dear all,
I am working with a rather large community dataset (~30 spp and ~20,000
samples) which I am trying to relate to several environmental variables.
One of the environmental variables that I am specifically interested in is
a factor.
I have conducted a CCA (vegan::cca) relating env variables to the
community, and have found significant effects. I am now interested in
understanding how to characterize my community data with respect to the
categorical env factor mentioned earlier, i.e. whether a given species is more
strongly associated with a particular level of that factor.
In the past, I have used vegan::simper for such explorations, but am having
difficulty applying it here due to the size of the data (i.e. a distance
matrix for the biological community would be 20,000 X 20,000). I'm
wondering if there are any approaches that I might apply that are less
computationally intensive.
One approach that I explored was to look for the closest env level for each
species based on CCA coordinates (from significant axes) using a nearest
neighbor search (FNN::get.knnx). The results look promising (see example
below), but I haven't come across the approach anywhere else. It is also
unfortunately not a statistical test, so I have no quantitative measure
of how likely these associations are.
Does anyone have any other suggestions for how I might do such an analysis with
this large dataset?
Many thanks in advance,
Marc
#### EXAMPLE ####
set.seed(1)
### required packages and data
library(FNN)
library(vegan)
data(dune)
data(dune.env)
str(dune.env)
### fit model
mod <- cca(dune ~ A1 + Moisture + Management, data=dune.env)
# visualize
plot(mod, display = c("sp", "bp", "cn"))
### Permutation Test for CCA
# terms sig. test (tested sequentially - i.e. order matters)
(At <- anova(mod, by = "terms", permutations = 499))
# cca axes sig. test
(Ax <- anova(mod, by = "axis", permutations = 499, cutoff = 0.1))
### Determine nearest neighbor of Moisture level for each species
# number of significant CCA axes
n <- 2
# retrieve unscaled CCA coordinates of species and factor centroids
res <- scores(mod, choices = 1:n, display = c("sp", "cn"), scaling = 0)
# labels and row indices of the Moisture centroids
lev <- paste0("Moisture", levels(dune.env$Moisture))
mat <- match(lev, rownames(res$centroids))
# nearest Moisture centroid for each species (k = 1 neighbor is sufficient)
pred <- get.knnx(data = res$centroids[mat, ], query = res$species,
  k = 1)$nn.index[, 1]
# return results
tmp <- data.frame(spp = rownames(res$species), nearest_level = lev[pred])
tmp
# plot species labels colored by their nearest Moisture level
plot(mod, display = "cn")
text(mod, display = "sp", col = rainbow(length(lev))[pred])
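A minimal sketch of one cheaper way to get a quantitative measure, since the nearest-neighbor assignment is not a statistical test. It works on the samples x species table directly, so no 20,000 x 20,000 distance matrix is needed: it computes the mean abundance of each species within each Moisture level and compares the spread of those means against a label-permutation null. The helper names (level_means, spread), the choice of statistic (range of level means) and nperm = 499 are assumptions made for illustration, not an established method from this thread, and the p-values are not corrected for multiple testing.

```r
library(vegan)
data(dune)
data(dune.env)
set.seed(1)

Y <- as.matrix(dune)                               # samples x species
g <- factor(dune.env$Moisture, ordered = FALSE)    # factor of interest

# mean abundance of each species within each level (levels x species)
level_means <- function(Y, g) rowsum(Y, g) / as.vector(table(g))
obs <- level_means(Y, g)

# per-species statistic: range of the level means (hypothetical choice)
spread <- function(m) apply(m, 2, function(x) diff(range(x)))
obs_stat <- spread(obs)

# permutation null: shuffle the level labels across samples
nperm <- 499
perm_stat <- replicate(nperm, spread(level_means(Y, sample(g))))
p_val <- (rowSums(perm_stat >= obs_stat) + 1) / (nperm + 1)

# level with the highest mean abundance and permutation p-value per species
out <- data.frame(spp = colnames(Y),
                  top_level = levels(g)[apply(obs, 2, which.max)],
                  p = p_val)
out
```

Each permutation costs one pass over the data, so the run time grows linearly with the number of samples.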
community analysis - associating species with categorical environmental parameters
3 messages · Marc Taylor, Juan Antonio Balbuena
In reply to Marc Taylor's message of 18/07/2019, 14:48:
You may have a look at the RRPP package (Collyer & Adams 2018, Methods Ecol Evol, doi: 10.1111/2041-210X.13029).
Best,
Juan
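A minimal sketch of how the suggested RRPP package might be applied to the dune example from the original post, assuming its rrpp.data.frame(), lm.rrpp() and pairwise() functions. The Hellinger transformation, the single-term model and iter = 499 are assumptions made here for illustration, not part of the suggestion. This tests the multivariate community response against Moisture and compares levels pairwise; it does not by itself yield species-level associations.

```r
library(RRPP)
library(vegan)
data(dune)
data(dune.env)

# community matrix (Hellinger-transformed; an assumption for this sketch)
Y <- as.matrix(decostand(dune, method = "hellinger"))
moist <- factor(dune.env$Moisture, ordered = FALSE)

# RRPP stores a matrix response and its predictors in its own data frame type
rdf <- rrpp.data.frame(Y = Y, Moisture = moist)

# linear model evaluated with residual randomization in a permutation procedure
fit <- lm.rrpp(Y ~ Moisture, data = rdf, iter = 499, seed = 1,
               print.progress = FALSE)
anova(fit)

# pairwise comparisons among Moisture levels
pw <- pairwise(fit, groups = moist)
summary(pw)
```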
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Dr. Juan A. Balbuena
Cavanilles Institute of Biodiversity and Evolutionary Biology
Symbiont Ecology and Evolution Lab
University of Valencia
P.O. Box 22085, 46071 Valencia, Spain
http://www.uv.es/~balbuena
http://www.uv.es/cophylpaco
e-mail: j.a.balbuena at uv.es
tel. +34 963 543 658, fax +34 963 543 733
NOTE: For shipments by EXPRESS COURIER use the following street address: C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
In reply to Juan Antonio Balbuena's message of Thu, Jul 18, 2019, 4:08 PM:
Thank you, Juan. This does indeed look promising.
Cheers,
Marc