advice on joining replicates

Fri, Jan 22, 2010 3:17 AM

On Fri, 2010-01-22 at 11:40 +0100, romunov wrote:

You have samples. Your replicates /are/ samples. The problem comes when
you try to assess the fitted "model" using permutation tests or
parametric theory (if any theory applies). Your data are replicated; you
don't have as many independent samples as the naive permutation test or
theory would presume. The answer is to use a model or a permutation test
that can take account of the dependences between your observations.

In your case, you have 10 stations and 5 replicates within each station.
It is reasonable to assume that the 5 replicate samples within each
station are correlated with one another. So in a permutation test of a
CCA on these data, for example, we should not permute all the data
freely (i.e. exchange samples between stations) because, under the null
hypothesis, the samples in your data aren't freely exchangeable (i.e.
unrelated).

What we can do is condition the permutations on Station, so that we can
freely permute the samples within stations but not exchange samples
between stations. This would be appropriate if you were testing the
effect of a covariate measured at the replicate (sample) level. This is
TEST1.
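As a rough illustration of what conditioning on Station means, the following base-R sketch generates a single TEST1-style permutation. `permute_within()` is a hypothetical helper written for this example, not a vegan function; it assumes `Station` is a factor with one entry per sample, in the same order as the rows of the data.

```r
## Hypothetical helper (not part of vegan): one TEST1-style permutation.
## Samples are shuffled within each station; no sample changes station.
permute_within <- function(strata) {
  strata <- as.factor(strata)
  idx <- seq_along(strata)
  for (lev in levels(strata)) {
    rows <- which(strata == lev)
    ## sample.int() avoids sample()'s special case for a length-1 input
    idx[rows] <- rows[sample.int(length(rows))]
  }
  idx
}

set.seed(1)
Station <- gl(10, 5, labels = paste("Station", 1:10))
perm <- permute_within(Station)
## every sample is still assigned to its own station
all(Station[perm] == Station)  # [1] TRUE
```

In a real test this reordering would be applied to the rows of the data many times, with the test statistic recomputed for each permutation; the 'strata' argument in vegan takes care of that.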

If you want to test the effect of a variable at the station level, then
we keep the replicates within stations fixed (i.e. we don't permute
those), but we do shuffle the stations. This is only possible if you
have equal numbers of replicates (samples) within stations. This is
TEST2.
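A TEST2-style permutation can be sketched the same way. Again, `permute_blocks()` is a hypothetical helper, not part of vegan: it moves each station's replicates as a unit into another station's slot, keeps their order within the station, and refuses to run when stations have unequal numbers of replicates, which is the restriction mentioned above.

```r
## Hypothetical helper (not part of vegan): one TEST2-style permutation.
## Whole stations are shuffled; replicate order within a station is kept.
permute_blocks <- function(strata) {
  strata <- as.factor(strata)
  sizes <- table(strata)
  ## moving blocks between slots only lines up if all stations are equal in size
  if (length(unique(as.vector(sizes))) != 1L)
    stop("all stations must have the same number of replicates")
  levs <- levels(strata)
  target <- sample(levs)  # station whose replicates fill each station's slot
  idx <- seq_along(strata)
  for (k in seq_along(levs)) {
    idx[strata == levs[k]] <- which(strata == target[k])
  }
  idx
}

set.seed(42)
Station <- gl(10, 5, labels = paste("Station", 1:10))
perm <- permute_blocks(Station)
matrix(perm, nrow = 5)  # one column per station slot
```

With samples in Station order, each column of `matrix(perm, nrow = 5)` is a run of five consecutive row indices, i.e. one station's replicates moved together.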

At the moment, package vegan allows freely exchangeable permutations
(which would be wrong for your data) and free permutation within the
levels of 'strata' for a CCA/RDA or related model. If you are unsure,
check whether the function takes a 'strata' argument for the
permutations. The latter would be suitable for you in the case of TEST1
above. Currently, we don't support TEST2, although we are working on it
and are quite close now to having all the code in place that we need to
do some quite complex permutation designs, along the lines of those
available in Canoco.

So, if TEST1 is applicable to your science question (testing the effect
of a variable at the sample level, i.e. something that is not constant
at the station level), then we could do the following. I use Y as a
species matrix, and x as my explanatory variable (in data frame foo).
Station is a factor variable indicating which station each sample
belongs to. I further assume (when generating Station) that the samples
in Y and foo are in the same order and that they are in Station order. If
they aren't in Station order, then you have to generate the Station
factor some other way:

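library(vegan)
## Station factor: 10 stations x 5 replicates, samples in Station order
Station <- gl(10, 5, labels = paste("Station", 1:10))
mod <- cca(Y ~ x, data = foo)
## TEST1: permute samples only within the levels of Station
permutest(mod, strata = Station)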

If you need TEST2 or your samples within Stations are not freely
exchangeable (i.e. the replicates form a time series within each station
or are from a transect within the station) then currently you can't do
this in vegan without hacking the code, but this will be available in
the near future.

HTH

G

Cheers,
Roman



On Fri, Jan 22, 2010 at 10:12 AM, Gavin Simpson
<gavin.simpson at ucl.ac.uk> wrote:
        
        On Thu, 2010-01-21 at 22:30 +0100, Roman Luštrik wrote:
        > Dear List,
        >
        > I realize my question may not be as R related as expected, but
        > hopefully subscribers will overlook this and offer some advice.
        > I have 10 stations with each station having 5 replicates (50
        > samples in total). I would like to join the replicates into one
        > sample for further analysis. Unfortunately I can't find how other
        > studies joined replicates (most papers I've read only note how
        > they grouped species). Due to lack of references, I joined
        > samples around their median. This shrunk my initial dataset of
        > 83 species to 29. If I follow the advice by Warwick and Clarke
        > (PRIMER-E manual) and remove species contributing less than 1 or
        > 2% to total abundance, I end up with 14 and 11 species,
        > respectively. Is there a better way of grouping replicates than
        > by median?
        >
        > Cheers,
        > Roman

        
        
        Why do you want to throw away data? Can't you account for the
        clustering in your data within the "further analysis" you are
        going to undertake?

        Depending on your research question or how you view ecosystems
        etc, throwing away the rare taxa may not be the best idea
        either...

        Perhaps if you explain what it is that you will do in "further
        analysis" we can provide options?

        G
        
        --
        %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
         Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
         ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
         Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
         Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
         UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
        %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

