which factor to nest?

5 messages · Ben Bolker, T. Avery, Kingsford Jones

#
Hi all,
Maybe an expert in this particular design could provide insight into an 
interesting question (or possibly just a derailed view). It is possibly 
outside of the R world, but it has to be sorted out before the R code can be 
generated, which should be trivial...

- 7 beekeepers each with several hives
- some hives treated with antiG, others left as controls
- unbalanced design (not an equal number of treated or control sites 
among or within beekeepers)
- measured parasite numbers (average per hive)
Q: want to know if antiG reduces parasite load

The initial reaction (from a student) was to consider Beekeeper as a 
random factor (although it could be considered fixed), and nest 
Treatment (antiG or control) within Beekeeper. This design is intuitive 
as Beekeepers are 'groups' and hives are 'subgroups' to which treatments 
are applied. Upon some investigation, it appears that the model could be 
flipped i.e. consider Treatment as a fixed factor and nest Beekeeper 
within Treatment. In this latter case, each Beekeeper would be 
represented in each Treatment and a crossed design results i.e. not 
nested at all. Various texts appear to 'arbitrarily' designate factors 
in similar models (see Zar on drug/drugstore example).

a) What design is correct?
b) What am I missing in the way of determining groups and the ultimate design?

thanks in advance,
trevor
biology department
acadia
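
One way to see the crossed versus nested distinction raised above is to tabulate hives by beekeeper and treatment. The sketch below uses made-up hive counts and a hypothetical data frame `hives` (with columns `Beekeeper` and `antiG`); none of the numbers come from the actual study. If every beekeeper has hives under both treatments, the two factors are crossed rather than nested.

```r
# Hypothetical layout: 7 beekeepers with unequal numbers of hives (made up).
set.seed(1)
n_hives <- c(4, 6, 5, 8, 3, 7, 5)

hives <- data.frame(
  Beekeeper = factor(rep(paste0("B", 1:7), times = n_hives))
)

# Within each beekeeper, treat between 1 and n-1 hives so that both
# treatments occur, but in unbalanced numbers.
hives$antiG <- factor(
  unlist(lapply(n_hives, function(n) {
    k <- sample(1:(n - 1), 1)  # number of treated hives for this beekeeper
    sample(rep(c("antiG", "control"), c(k, n - k)))
  })),
  levels = c("control", "antiG")
)

# Cross-tabulate: rows are beekeepers, columns are treatments.
tab <- table(hives$Beekeeper, hives$antiG)
print(tab)

# Crossed if every beekeeper appears under every treatment.
crossed <- all(tab > 0)
print(crossed)
```

With real data, the same `table()` call on the actual beekeeper and treatment columns would show at a glance whether any beekeeper is missing a treatment.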
#
My two cents:

  * a GLMM if parasite numbers are small enough to
have to deal with them as count data (e.g. lots of zeros).
Otherwise (if you're lucky, as GLMMs are harder) most
likely a lognormal -- log-transform data or log(1+x) if
there are some zeros, and treat as a LMM (nlme or lmer).

  * "Nesting" is more or less a red herring here, only
really has to do with multiple *random* factors (and
then more to do with the coding of the random factors
than with fundamental experimental design distinctions).

  * so: antiG vs control is fixed, Beekeeper is probably
best treated as random (7 units is enough to make a
random effect plausible: if you had only 2 or 3 you
would probably have to treat as a fixed effect to
make progress)

  * because unbalanced (and possibly GLMM), aov/sums
of squares approaches are probably not viable

  * fairly straightforward with nlme, something like

lme(logparasites ~ antiG, random = ~1|Beekeeper)

or with lme4:

lmer(logparasites ~ antiG + (1|Beekeeper))

or (for a GLMM, fitted to the untransformed counts rather than
the log-transformed values)

glmer(parasites ~ antiG + (1|Beekeeper), family=poisson)

  * Two more things to watch out for:

   - lme (nlme package) will give you p-values, lmer (lme4 package)
will not
   - if you end up fitting a GLMM you should definitely
worry about/check for overdispersion

  Ben Bolker
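
The model calls suggested above can be tried end to end on simulated data. This is only a sketch under stated assumptions: the data frame `d`, the hive counts, the effect sizes, and the column name `parasites` are all invented for illustration, and the overdispersion check shown (sum of squared Pearson residuals divided by the residual degrees of freedom) is just one common rough diagnostic, not necessarily the one intended in the post.

```r
library(nlme)  # lme
library(lme4)  # lmer, glmer

# Simulated, unbalanced data (all values are assumptions for illustration).
set.seed(42)
n_hives <- c(4, 6, 5, 8, 3, 7, 5)
d <- data.frame(
  Beekeeper = factor(rep(paste0("B", 1:7), times = n_hives))
)
d$antiG <- factor(
  sample(c("control", "antiG"), nrow(d), replace = TRUE),
  levels = c("control", "antiG")
)

# Beekeeper-level random intercepts plus a treatment effect on the log scale.
bk_eff <- rnorm(7, sd = 0.5)
eta <- 2 + bk_eff[as.integer(d$Beekeeper)] - 0.7 * (d$antiG == "antiG")
d$parasites <- rpois(nrow(d), lambda = exp(eta))
d$logparasites <- log(1 + d$parasites)  # log(1+x) in case of zeros

# LMM on the log-transformed response, fitted with nlme and with lme4.
fit_lme  <- lme(logparasites ~ antiG, random = ~ 1 | Beekeeper, data = d)
fit_lmer <- lmer(logparasites ~ antiG + (1 | Beekeeper), data = d)

# Poisson GLMM on the raw counts.
fit_glmm <- glmer(parasites ~ antiG + (1 | Beekeeper),
                  family = poisson, data = d)

# Rough overdispersion check: values well above 1 suggest overdispersion.
disp <- sum(residuals(fit_glmm, type = "pearson")^2) / df.residual(fit_glmm)

print(summary(fit_lme)$tTable)  # nlme reports p-values for fixed effects
print(fixef(fit_lmer))
print(fixef(fit_glmm))
print(disp)
```

Because the simulated counts here are generated as Poisson, the ratio should not signal strong overdispersion; with real parasite counts it may well do so.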
#
Thanks Ben for a speedy response...
I agree that a GLMM is probably more prudent and will investigate that 
idea further. My guard is against making the analysis too complicated...

For interest and discussion: In the minutes between my post and the 
response I went 'way back' to consider an even simpler design (if it 
works with unbalanced data). In essence the beekeepers are block factors 
as the treatments were applied within these blocks to colonies at random 
and the beekeeper is not of any interest, just the parasite numbers of 
the colonies. In fundamental design terms, the randomized block design 
appears a viable option. However there are likely issues with the count 
data (that I will investigate as I am unaware of the data per se, but do 
have access to similar data that do, in fact, have lots of zeros). My 
'traditional'

thanks,
trevor
#
On Sun, Jan 25, 2009 at 6:43 PM, tavery <trevor.avery at acadiau.ca> wrote:
Hi Trevor,

Yes, it sounds as though you have a nice, simple RBD that can be
analyzed using the code Ben suggested.  The lack of balance shouldn't
be a problem as long as you use one of the mixed models functions
(lmer, lme, glmmPQL, etc) rather than aov.  The fact that you have
count data shouldn't be a problem, although if you have an excessive
number of zeros you might want to have a look at the non-CRAN package
glmmADMB.

hth,

Kingsford Jones
#
Just a quick point: *estimation* should be fairly straightforward
(easiest with log-transformed data -> lmer, lme, harder with Poisson
data -> glmer, glmmML, glmmAK, hardest with negative
binomial/overdispersed data -> glmmADMB).  Be very careful with
glmmPQL, known to be biased with low (<5-10) average counts per
unit.  *Inference* is a can of worms: read all about it on the
r-sig-mixed-models mailing list archive ...

  You may want to forward further questions along these lines
to r-sig-mixed-models at r-project.org instead ...

  Ben Bolker