null models with continuous abundance data
On 07/01/2010 21:48, "Etienne Lalibert?" <etiennelaliberte at gmail.com> wrote:
Many thanks again Carsten.
Yes, you're right that care must be taken to
ensure that a decent number
of unique random matrices must be obtained. I
don't think it would be a
problem in my case given that transforming my
continuous abundance data
to count by mat2<- floor(mat * 100 / min(mat[mat >
0]) )
can quickly lead to a very large number of individuals (hundreds
of
thousands if not > million in some cases). The issue then becomes
mostly
one of computing speed, but that's another story. Following your
suggestion, I came up with a simple (read na?ve) R
algorithm that starts with
a binary matrix obtained by commsimulator
then fills it randomly. It seems to
work relatively well, though it's
extremely slow. In addition, because the
algorithm fills the matrix one individual at a
time, I sometimes ran into the
problem of one individual which had
nowhere to go because "ressources" at the
site (i.e. the row sum) was
already full. In that case, this individual
disappears, i.e. goes
nowhere. This is a bit drastic and clearly sub-optimal,
but it's the
only way I could quickly think of yesterday to prevent the
algorithm
from getting stuck. The consequence is that often, the total number
of
individuals in the null matrix is a bit less than total number
of
individuals in the observed matrix. Etienne, This is the same problem as filling up a binary matrix: you get stuck before you can fill in all presences. The solution in vegan::commsimulator was "backtracking" method: when you get stuck, you remove some of the items (backtrack to an earlier stage of filling) and start again. This is even slower than initial filling, but it may be doable. The model application of backtracking goes back to Gotelli & Entsminger (Oecologia 129, 282--291; 2001) who had this for binary data, but the same idea is natural for quantitative null models, too. The vegan vignette discusses some details of implementation which are valid also for quantitative filling. I would like to repeat the serious warning that Carsten Dormann made: you better be sure that your generated null models really are random. P?ter S?lymos developed some quantitative null models for vegan, but we found in our tests that they were not random at all, but the constraints we put produced peculiar kind of data with regular features although there was technically no problem. Therefore we removed code worth of huge amount of developing work as dangerous for use. I do think that also the r2dtable approach is more regular and less variable (lower variance) than the community data, and you very easily get misleadingly significant results when you compare your observed community against too regular null models. This is something analogous to using strict Poisson error in glm or gam when the data in fact are overdispersed to Poisson. Be very careful if you want to claim significant results based on quantitative null models. Would I be a referee or an editor for this kind of ms, I would be very skeptical ask for the proof of the validity of the null model. Cheers, Jari Oksanen
I don't see this as being a real
problem with the large matrices I'll deal with though, but
definitely
something to be aware of. If you're curious, here's the simple
algorithm, as well as some code to
run a test below. I'd be very interested to
compare it to Vazquez's
algorithm! Thanks again Etienne ### nullabun <-
function(x){
require(vegan) nsites <- nrow(x) nsp <- ncol(x)
rowsum <- rowSums(x)
colsum <- colSums(x) nullbin <- commsimulator(x,
method = "quasiswap")
rownull <- rowSums(nullbin) colnull <-
colSums(nullbin)
rowsum <- rowsum - rownull colsum <- colsum - colnull
ind <- rep(1:nsp, colsum)
ress <- rep(1:nsites, rowsum) total <-
sum(rowsum)
selinds <- sample(1:total)
for (j in 1:total){
indsel <- ind[selinds[j]]
sitepot <- which(nullbin[, indsel] > 0)
if (length(ress[ress %in% sitepot]) > 0 ){
sitesel <-
sample(ress[ress %in% sitepot], 1)
ress <- ress[-which(ress ==
sitesel)[1] ]
nullbin[sitesel, indsel] <- nullbin[sitesel, indsel]
+ 1
}
}
return(nullbin)
}
### now let's test the
function
# create a dummy abundance matrix m <-
matrix(c(1,3,2,0,3,1,0,2,1,0,2,1,0,0,1,2,0,3,0,0,0,1,4,3), 4, 6,
byrow=TRUE) # generate a null matrix m.null <- nullabun(m) # compare
total number of individuals
sum(m) #30 sum(m.null) #29: one individual got
"stuck" with no ressources...
# to generate more than one null matrix, say
999
nulls <- replicate(n = 999, nullabun(m), simplify = F) # how many unique
null matrices?
length(unique(nulls) ) # I found 983 out of 999 # how many
individuals in m?
sum(m) # there are 30 # how bad is the problem of
individuals getting stuck with no ressources
in null matrices? sums <-
as.numeric(lapply(nulls, sum, simplify = T))
hist(sums) # the vast majority
had 29 or 30 individuals
Le jeudi 07 janvier 2010 ? 10:34 +0100, Carsten
Dormann a ?crit : Hi Etienne, I'm afraid that swap.web cannot easily accommodate this constraint. Diego Vazquez has used an alternative approach for this problem, but I haven't seen code for it (it's briefly described in his Oikos 2005 paper). While swap.web starts with a "too-full" matrix and then downsamples, Diego starts by allocating bottom-up. There it should be relatively easy to also constrain the row-constancy of connectance. I can't promise, but I will hopefully have this algorithm by the end of next week (I'm teaching this stuff next week and this is one of the assignments). If so, I'll get it over to you. The problem that already arises with swap.web for very sparse matrices is that there are very few unique combinations left after all these constraints. This will be even worse for your approach, and plant community data are usually VERY sparse. Once you have produced the null models, you should assure yourself that there are hundreds (better thousands) of possible randomised matrices. Adding just one more constraint to your null model (that of column AND row constancy of 0s) will uniquely define the matrix! Referees may not pick it up, but it may give you trivial results. Best wishes, Carsten V?zquez, D. P., Meli?n, C. J., Williams, N. M., Bl?thgen, N., Krasnov, B. R., Poulin, R., et al. (2007). Species abundance and asymmetric interaction strength in ecological networks. Oikos, 116, 1120-1127. On 06/01/2010 23:57, Etienne Lalibert? wrote:
Many thanks Carsten and Peter for your suggestions.
commsimulator indeed respects the two contraints I'm interested in, but> only allows for binary data. swap.web is *almost* what I need, but
only overall matrix fill is kept
constant, whereas I want zeros to move
only between rows, not between
both columns and rows. In others words, if
the initial data matrix had
three zeros for row 1, permuted matrices
should also have three zeros
for that row. I do not doubt that
Peter's suggestions are good, but I'm afraid they
seem a bit overly
complicated for my particular problem. All I'm after
is to create n
randomly-assembled matrices from an observed species
abundance matrix to
compare the observed functional diversity of the
sampling sites to a null
expectation. To be conservative, this requires
that I hold species
richness constant at each site, and keep row and
column marginals fixed.
Could swap.web or permatswap(..., method = "quasiswap") be easily
tweaked to accomodate this? The only difference really is that matrix
fill
should be kept constant *but also* be constrained within rows.
Thanks
again for your help.
Etienne Le 7 janvier 2010 08:17, Peter
Solymos <solymos at ualberta.ca> a ?crit :
Dear Etienne,
You can try the Chris Hennig's prablus package which have a parametric
bootstrap based null-model where clumpedness of occurrences or
abundances (this might allow continuous data, too) is estimated from
the
site-species matrix and used in the null-model generation. But
here, the
sum of the matrix will vary randomly.
But if you have
environmental covariates, you might try something more
parametric. For
example the simulate.rda or simulate.cca functions in
the vegan package,
or fit multivariate LM for nested models (i.e.
intercept only, and with
other covariates) and compare AIC's, or use
the simulate.lm to get
random numbers based on the fitted model. This
way you can base you
desired statistic on the simulated data sets, and
you know explicitly
what is the model (plus it is good for continuous
data that you have).
By using the null-model approach, you implicitly
have a model by
defining constraints for the permutations, and
p-values are
probabilities of the data given the constraints (null
hypothesis), and
not probability of the null hypothesis given the data
(what people
usually really want).
Cheers, Peter
P?ter S?lymos
Alberta Biodiversity Monitoring Institute Department
of Biological Sciences
CW 405, Biological Sciences Bldg University
of Alberta
Edmonton, Alberta, T6G 2E9, Canada Phone:
780.492.8534
Fax: 780.492.7635 On Wed, Jan 6,
2010 at 2:18 AM, Carsten Dormann <carsten.dormann at ufz.de> wrote:
Hi Etienne, the double constraint is observed by two
functions:
swap.web in package bipartite
and
commsimulator in vegan (at least in the r-forge
version)
Both build on the r2dtable approach, i.e. you have,
as you propose, to turn
the low values into higher-value integers.
The algorithm is described in the help to swap.web.
HTH, Carsten
On 06/01/2010 08:55, Etienne Lalibert? wrote:
Hi,
Let's say I have measured plant biomass for a total of 5
species from 3
sites (i.e. plots), such that I end with the
following data matrix
mat<- matrix(c(0.35, 0.12, 0.61, 0,
0, 0.28, 0, 0.42, 0.31, 0.19, 0.82,
0, 0, 0, 0.25), 3, 5, byrow =
T)
dimnames(mat)<- list(c("site1", "site2", "site3"),
c("sp1", "sp2",
"sp3", "sp4", "sp5")) Data is
therefore continuous. I want to generate n random community
matrices
which both respect the following constraints:
1) row and
column totals are kept constant, such that "productivity" of
each
site is maintained, and that rare species at a "regional" level
stay
rare (and vice-versa).
2) number of species in each plot
is kept constant, i.e. each row
maintains the same number of zeros,
though these zeros should not stay
fixed. To
deal with continuous data, my initial idea was to transform the
continuous data in mat to integer data by
mat2<-
floor(mat * 100 / min(mat[mat> 0]) )
where multiplying
by 100 is only used to reduce the effect of rounding
to nearest
integer (a bit arbitrary). In a way, shuffling mat could now
be seen
as re-allocating "units of biomass" randomly to plots. However,
doing so results in a matrix with large number of "individuals" to
reshuffle, which can slow things down quite a bit. But this is only part
of the problem.
My main problem has been to find an
algorithm that can actually respect
constraints 1 and 2. Despite
trying various R functions (r2dtable,
permatfull, etc), I have not
yet been able to find one that can do
this.
I've had some kind help from Peter Solymos who suggested that I try the
aylmer package, and it's *almost* what I need, but the problem is that
their algorithm does not allow for the zeros to move within the
matrix;
they stay fixed. I want the number of zeros to stay constant
within each
row, but I want them to move freely betweem columns.
Any help would be very much appreciated.
Thanks
-- Dr. Carsten
F. Dormann
Department of Computational Landscape Ecology
Helmholtz Centre for Environmental Research-UFZ
Permoserstr. 15
04318 Leipzig
Germany Tel: ++49(0)341 2351946
Fax: ++49(0)341 2351939
Email: carsten.dormann at ufz.de
internet: http://www.ufz.de/index.php?de=4205
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Dr. Carsten F. Dormann Department of Computational Landscape Ecology Helmholtz Centre for Environmental Research-UFZ Permoserstr. 15 04318 Leipzig Germany Tel: ++49(0)341 2351946 Fax: ++49(0)341 2351939 Email: carsten.dormann at ufz.de internet: http://www.ufz.de/index.php?de=4205
Etienne > Lalibert? ================================ School of Forestry University of > Canterbury Private Bag 4800 Christchurch 8140, New Zealand Phone: +64 3 366 > 7001 ext. 8365 Fax: +64 3 364 > 2124 www.elaliberte.info _______________________________________________ R-si > g-ecology mailing > list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-e > cology