Testing difference between diversity indices with vegan::oecosimu

Why not try some type of ANOVA style glm?

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and
Innovation, Data Analysis, Modelling and Training

(mobile) 0410 689 945
(fax / office)
chris at trickysolutions.com.au

Disclaimer: The information in this email and any attachments to it are
confidential and may contain legally privileged information. If you are not
the named or intended recipient, please delete this communication and
contact us immediately. Please note you are not authorised to copy,
use or disclose this communication or any attachments without our
consent. Although this email has been checked by anti-virus software,
there is a risk that email messages may be corrupted or infected by
viruses or other
interferences. No responsibility is accepted for such interference. Unless
expressly stated, the views of the writer are not those of the
company. Tricky Solutions always does our best to provide accurate
forecasts and analyses based on the data supplied, however it is
possible that some important predictors were not included in the data
sent to us. Information provided by us should not be solely relied
upon when making decisions and clients should use their own judgement.
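
One possible reading of the "ANOVA style glm" suggestion above, as a rough sketch only: compute one Simpson value per plot and fit a one-way linear model on the grouping factor. The matrix `comm` is simulated, and the helper `simpson()` is a hypothetical stand-in that uses the same 1 - sum(p^2) definition as vegan's diversity(x, "simpson"); neither comes from the thread. Note that this compares mean per-plot diversity between groups, which is a different quantity from the pooled ("total") diversity asked about in the original post.

```r
# Hypothetical community matrix: 20 plots (rows) x 12 species (columns)
set.seed(1)
comm <- matrix(rpois(20 * 12, lambda = 3), nrow = 20)
gr <- gl(2, nrow(comm) / 2)   # same grouping scheme as in the original post

# Simpson diversity (1 - sum of squared proportions) for one abundance vector
simpson <- function(x) 1 - sum((x / sum(x))^2)

# One diversity value per plot, then a one-way linear model on the group
alpha <- apply(comm, 1, simpson)
fit <- lm(alpha ~ gr)
tab <- anova(fit)
print(tab)
```

With vegan loaded, `comm <- dune` would put the example data from the original post in place of the simulated matrix.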

On 26/04/2012, at 7:19, Kay Cichini <kay.cichini at gmail.com> wrote:

Hello all,

I'd like to test whether total diversity differs between two communities.
For each community several samples were taken, and abundances were
collapsed within each group to compute total diversity for that group. I
tried to use vegan::oecosimu to test non-randomness of my statistic (the
difference in Simpson diversity indices of the collapsed abundances);
however, I am not quite sure whether I am overlooking possible pitfalls:

library(vegan)
data(dune)

# a grouping variable:
gr <- gl(2, nrow(dune)/2)

divdiff <- function(x) abs(diversity(colSums(x[gr == "1", ]), "simp") -
                          diversity(colSums(x[gr == "2", ]), "simp"))
# testing function:
divdiff(dune)

oecosimu(dune, divdiff, "r2dtable", nsimul = 1999)
# oecosimu with 1999 simulations
# simulation method r2dtable
# alternative hypothesis: true mean is not equal to the statistic
#           statistic        z     2.5%      50% 97.5% Pr(sim.)
# statistic   0.00275 -0.20996  0.00013  0.00280  0.01     0.98

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Standard statistical hypothesis testing often starts with the null
hypothesis that two things are identical, or that two population means are
identical. The p-value is then used to decide whether to reject this null
in favour of the alternative, that they are indeed different.

In practice, we are asking whether we have enough information to say they
are different.

I do agree, though, that stopping there is a bit silly. If there is a
statistical difference, we next need to look at the effect size, in other
words the magnitude of the difference, and decide whether it is
ecologically meaningful.
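
As a rough illustration of looking at the magnitude rather than only the p-value, the sketch below reports the signed difference in pooled Simpson diversity together with a bootstrap percentile interval obtained by resampling plots within each group. The data are simulated, and `simpson()` and `pooled_diff()` are hypothetical helpers introduced for this example; the bootstrap itself is an assumption, not something proposed in the thread.

```r
set.seed(2)
# Hypothetical community matrix: 20 plots x 12 species, two groups of 10 plots
comm <- matrix(rpois(20 * 12, lambda = 3), nrow = 20)
gr <- gl(2, nrow(comm) / 2)

simpson <- function(x) 1 - sum((x / sum(x))^2)

# Signed difference in pooled Simpson diversity (group 1 minus group 2)
pooled_diff <- function(x, g) {
  simpson(colSums(x[g == "1", , drop = FALSE])) -
    simpson(colSums(x[g == "2", , drop = FALSE]))
}
obs <- pooled_diff(comm, gr)

# Bootstrap: resample plots with replacement within each group
boot <- replicate(999, {
  idx <- unlist(lapply(split(seq_len(nrow(comm)), gr),
                       function(i) sample(i, replace = TRUE)))
  pooled_diff(comm[idx, , drop = FALSE], gr[idx])
})
ci <- quantile(boot, c(0.025, 0.975))
print(obs)
print(ci)
```

Whether an interval of that width matters ecologically is still a judgement the analyst has to make; the code only supplies the numbers.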

Chris Howden B.Sc. (Hons) GStat.

-----Original Message-----
From: r-sig-ecology-bounces at r-project.org
[mailto:r-sig-ecology-bounces at r-project.org] On Behalf Of David Valentim
Dias
Sent: Thursday, 26 April 2012 2:36 PM
To: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] Testing difference between diversity indices with
vegan::oecosimu

Hello Cichini,

I cannot help with your code, but it seems like you have a silly
hypothesis. Think about it: what is the probability of two communities
being identical? You need to restate it in some more useful way. We
already know most things are different, but by what magnitude? Which
factors are causing these changes? How do these changes matter for the
environment and for us?


--
Currículo: http://lattes.cnpq.br/7541377569511492

Surely if we knew the two things were different there would be no need
to test if they were? Most statistics assumes a null model because we can
say something specific about the magnitude of the difference (it is zero)
and can then see whether the observations are consistent with that model.

I agree that subsequent analysis is required to understand why there are
differences, but we still need a mechanism to say, given the data
collected and the error processes, are the diversities of these two
"samples" the same?

G

%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Hello all,

I'd like to test if total diversity differs between two communities. For
each community several samples were taken and abundances collapsed over
groups to compute total diversity for each group. I tried to use
vegan::oecosimu to test non-randomness of my statisitc (difference in
Simpson-Diversity indices of collapsed abundances) - however, I am not
quite sure if I oversee posssible pitfalls:

library(vegan)
data(dune)

# a grouping variable:
gr <- gl(2, nrow(dune)/2)

divdiff <- function(x) abs(diversity(colSums(x[gr == "1", ]), "simp") -
? ? ? ? ? ? ? ? ? ? ? ? ? diversity(colSums(x[gr == "2", ]), "simp"))
# testing function:
divdiff(dune)

oecosimu(dune, divdiff, "r2dtable", nsimul = 1999)
# oecosimu with 1999 simulations
# simulation method r2dtable
# alternative hypothesis: true mean is not equal to the statistic
# ? ? ? ? ? statistic ? ? ? ?z ? ? 2.5% ? ? ?50% 97.5% Pr(sim.)
# statistic ? 0.00275 -0.20996 ?0.00013 ?0.00280 ?0.01 ? ? 0.98
Dear Kay,

I am not sure about any possible pitfalls with your approach, but I
have tested the same data using the randomisation functions of the
"rich" library, and found that neither the Simpson diversity nor the
simple species richness differ significantly among the defined groups.

Here are the results following your example:

library(rich)
library(vegan)  # for diversity() and the dune data

# prepare data (gr is the grouping factor from your example)
one <- as.data.frame(dune[gr == "1", ])
two <- as.data.frame(dune[gr == "2", ])

data <- list(one, two)

# compare cumulative species richness
c2cv(com1=data[[1]],com2=data[[2]],nrandom=1999)
#$res
#
#cv1                  27.0000
#cv2                  28.0000
#cv1-cv2              -1.0000
#p                     0.4220 # N.S.
#quantile 0.025       -4.0000
#quantile 0.975        4.0000
#randomized cv1-cv2    0.0225
#nrandom            1999.0000

# compare the mean per-plot Simpson diversity (not the pooled index)
simp.one <- diversity(dune[gr == "1", ], "simp")
simp.two <- diversity(dune[gr == "2", ], "simp")
c2m(pop1=simp.one,pop2=simp.two,nrandom=1999,verbose=FALSE)
#done.
#$res
#
#mv1                 8.630e-01
#mv2                 8.773e-01
#mv1-mv2            -1.439e-02
#p                   2.440e-01 # N.S.
#quantile 0.025     -3.456e-02
#quantile 0.975      3.351e-02
#randomized mv1-mv2  3.899e-04
#nrandom             1.999e+03
#########################

The possible pitfalls might be hidden under the different results ;-)

Cheers,
Ivailo
UBUNTU: a person is a person through other persons.

Kay,

I think that Gav's suggestion is the most natural one: permute your classification vector and compare your observed difference to the permutation values. Null models can be problematic, and you must think very carefully about what kind of null model you need and what the null hypothesis is under each null model. Quantitative null models are even trickier. I see the following possible problems with your idea:

- You used the "r2dtable" null model, which fixes both row and column totals (but not frequencies). This means that for all simulations the overall gamma diversity is fixed: the Simpson index is computed from species totals, and these are fixed. When you also fix row totals, the generated null communities can be too similar to each other, and this in turn gives P-values that are too low. I think that when analysing overall diversities from marginal sums, you should use a null model that allows those marginal sums to vary. This may not be possible with the release version of vegan, but the development version in R-Forge has a completely redesigned null model engine with several new quantitative null models and allows plugging in your own null models (which could even include permutation models).

- If the usual null models can be painful, the quantitative null models give you double trouble. One problem is that they produce data that are too evenly distributed. For "r2dtable" this holds in two ways: the method fixes marginal totals, but not marginal frequencies (= number of non-zero cells). Typically the number of zeros is much lower than in real data, and the variance of rows and columns is lower than in any real data. Moreover, the simulated samples are often much more similar to each other than real re-sampling of Nature. This is like using a Poisson glm for abundance data: the data are regularly over-dispersed relative to Poisson, and therefore the P-values are too low. You have just the same danger with these null models: the simulation variation is too low, and therefore your P-values are too low.

- The "r2dtable" method requires that your data are counts of individuals: it is individuals that are swapped between cells. You used the Dutch Dune Meadow data in your example. Technically this works, since the data are integers, but they are cover class values and not individuals, and therefore the swapping of integer pieces of cover classes has no meaning. If you want to consider null models, you should again switch to the R-Forge version of vegan (currently at version 2.1-15), which allows some models that apply to data that are not made of individuals, and also some methods that can retain the original marginal variances of the data.
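
The points in the list above can be checked directly with base R's `stats::r2dtable()`, which generates random tables with given marginals. The sketch below is only an illustration: the count matrix is simulated (negative binomial, chosen as an assumption to mimic over-dispersed data with many zeros), and `simpson()` is a hypothetical helper using the 1 - sum(p^2) definition. It confirms that the pooled (gamma) Simpson index cannot vary across simulations, and prints the number of zeros and the cell variance for observed versus simulated tables so the two can be compared.

```r
set.seed(3)
# Hypothetical over-dispersed count matrix: 20 plots x 12 species
comm <- matrix(rnbinom(20 * 12, size = 0.5, mu = 4), nrow = 20)

simpson <- function(x) 1 - sum((x / sum(x))^2)

# 99 random tables with the same row and column totals as comm
sims <- r2dtable(99, rowSums(comm), colSums(comm))

# Column totals are fixed, so the pooled Simpson index never changes
gamma_obs <- simpson(colSums(comm))
gamma_sim <- sapply(sims, function(m) simpson(colSums(m)))
print(range(gamma_sim - gamma_obs))

# Number of empty cells and cell variance: observed versus simulated mean
zeros <- c(observed  = sum(comm == 0),
           simulated = mean(sapply(sims, function(m) sum(m == 0))))
vars  <- c(observed  = var(as.vector(comm)),
           simulated = mean(sapply(sims, function(m) var(as.vector(m)))))
print(zeros)
print(vars)
```

Running the same comparison on your own matrix is a quick way to see how far the simulated tables are from the structure of the real data before trusting any P-value derived from them.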

There are many things that you need to consider if you want to use null models. However, I think that permutation of classification vector saves a lot of trouble, and is more easily understood and communicated.
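
A minimal sketch of the permutation approach, under stated assumptions: the community matrix is simulated, `simpson()` is a hypothetical helper with the 1 - sum(p^2) definition, and `divdiff()` is rewritten to take the grouping vector as an argument so that it can be shuffled. The data stay untouched; only the plot labels are permuted, and the observed statistic is compared with the permutation distribution.

```r
set.seed(4)
# Hypothetical community matrix: 20 plots x 12 species, two groups of 10 plots
comm <- matrix(rnbinom(20 * 12, size = 0.5, mu = 4), nrow = 20)
gr <- gl(2, nrow(comm) / 2)

simpson <- function(x) 1 - sum((x / sum(x))^2)

# Absolute difference in pooled Simpson diversity between the two groups
divdiff <- function(x, g) {
  abs(simpson(colSums(x[g == "1", , drop = FALSE])) -
      simpson(colSums(x[g == "2", , drop = FALSE])))
}

obs <- divdiff(comm, gr)

# Shuffle the group labels, keep the data as they are
nperm <- 999
perm <- replicate(nperm, divdiff(comm, sample(gr)))

# Permutation P-value; the observed value counts as one of the permutations
p_value <- (sum(perm >= obs) + 1) / (nperm + 1)
print(c(statistic = obs, p = p_value))
```

With vegan loaded, setting `comm <- dune` and `gr <- gl(2, nrow(dune)/2)` would reproduce the setup from the original post. If plots are not exchangeable (for example because of blocking or spatial structure), the call to `sample(gr)` would need to be replaced by a restricted permutation.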

Cheers, Jari Oksanen