Hi all,

Maybe an expert on this particular design could provide insights into an interesting question (or possibly just a derailed view). It is possibly outside of the R world, but it has to be sorted out before R code can be generated - which should be trivial...

- 7 beekeepers, each with several hives
- some hives treated with antiG, others left as controls
- unbalanced design (not an equal number of treated or control sites among or within beekeepers)
- measured parasite numbers (average per hive)

Q: want to know if antiG reduces parasite load

The initial reaction (from a student) was to consider Beekeeper as a random factor (although it could be considered fixed), and nest Treatment (antiG or control) within Beekeeper. This design is intuitive as Beekeepers are 'groups' and hives are 'subgroups' to which treatments are applied.

Upon some investigation, it appears that the model could be flipped, i.e. consider Treatment as a fixed factor and nest Beekeeper within Treatment. In this latter case, each Beekeeper would be represented in each Treatment and a crossed design results, i.e. not nested at all. Various texts appear to 'arbitrarily' designate factors in similar models (see Zar on the drug/drugstore example).

a) What design is correct?
b) What am I missing in the way of determining groups and the ultimate design?

thanks in advance,
trevor
biology department
acadia
which factor to nest?
5 messages · Ben Bolker, T. Avery, Kingsford Jones
My two cents:

* a GLMM if parasite numbers are small enough that you have to deal with them as count data (e.g. lots of zeros). Otherwise (if you're lucky, as GLMMs are harder) most likely a lognormal -- log-transform the data, or use log(1+x) if there are some zeros, and treat as an LMM (nlme or lmer).
* "Nesting" is more or less a red herring here; it only really has to do with multiple *random* factors (and then more to do with the coding of the random factors than with fundamental experimental design distinctions).
* so: antiG vs control is fixed, Beekeeper is probably best treated as random (7 units is enough to make a random effect plausible: if you had only 2 or 3 you would probably have to treat it as a fixed effect to make progress)
* because the design is unbalanced (and possibly a GLMM), aov/sums of squares approaches are probably not viable
* fairly straightforward with nlme, something like
  lme(logparasites ~ antiG, random = ~1|Beekeeper)
  or lme4:
  lmer(logparasites ~ antiG + (1|Beekeeper))
  or (for a GLMM, fitted to the untransformed counts rather than the logged response)
  glmer(parasites ~ antiG + (1|Beekeeper), family=poisson)
* Two more things to watch out for:
  - lme (nlme package) will give you p-values, lmer (lme4 package) will not
  - if you end up fitting a GLMM you should definitely worry about/check for overdispersion

Ben Bolker
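A minimal sketch of the three fits suggested above, run on simulated data because the real hive data are not shown in this thread. The data frame `hives`, the column `parasites`, the hive counts per beekeeper, and the effect sizes are all assumptions made up for illustration. The overdispersion check at the end is one common rule of thumb (sum of squared Pearson residuals over residual degrees of freedom), not necessarily the check intended in the post.

```r
library(nlme)
library(lme4)

set.seed(1)

# Hypothetical unbalanced layout: 7 beekeepers with unequal numbers of hives
n_hives <- c(4, 6, 5, 8, 3, 7, 5)
hives <- data.frame(Beekeeper = factor(rep(paste0("B", 1:7), times = n_hives)))

# Alternate control/treated within each beekeeper (odd hive = control)
idx <- ave(seq_len(nrow(hives)), hives$Beekeeper, FUN = seq_along)
hives$antiG <- factor(ifelse(idx %% 2 == 0, "treated", "control"),
                      levels = c("control", "treated"))

# Simulated counts: beekeeper-level random intercept plus a treatment effect
bk_eff <- rnorm(7, mean = 0, sd = 0.5)
mu <- exp(2 + bk_eff[as.integer(hives$Beekeeper)] -
            0.7 * (hives$antiG == "treated"))
hives$parasites <- rpois(nrow(hives), lambda = mu)

# log(1 + x) response for the linear mixed models
hives$logparasites <- log1p(hives$parasites)

# LMM via nlme and via lme4: fixed treatment, random beekeeper intercept
fit_lme  <- lme(logparasites ~ antiG, random = ~ 1 | Beekeeper, data = hives)
fit_lmer <- lmer(logparasites ~ antiG + (1 | Beekeeper), data = hives)

# Poisson GLMM on the raw counts
fit_glmm <- glmer(parasites ~ antiG + (1 | Beekeeper),
                  family = poisson, data = hives)

# Rough overdispersion check: values well above 1 suggest overdispersion
rp <- residuals(fit_glmm, type = "pearson")
disp_ratio <- sum(rp^2) / df.residual(fit_glmm)
```

`summary(fit_lme)` prints a coefficient table with p-values, while `summary(fit_lmer)` reports estimates and t-values only, which matches the first caveat in the post.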
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
Thanks Ben for a speedy response...

I agree that a GLMM is probably more prudent and will investigate that idea further. My guard is against making the analysis too complicated...

For interest and discussion: In the minutes between my post and the response I went 'way back' to consider an even simpler design (if it works with unbalanced data). In essence the beekeepers are block factors as the treatments were applied within these blocks to colonies at random and the beekeeper is not of any interest, just the parasite numbers of the colonies. In fundamental design terms, the randomized block design appears a viable option. However there are likely issues with the count data (that I will investigate as I am unaware of the data per se, but do have access to similar data that do, in fact, have lots of zeros).

My 'traditional' thanks,
trevor
On Sun, Jan 25, 2009 at 6:43 PM, tavery <trevor.avery at acadiau.ca> wrote:
Thanks Ben for a speedy response... I agree that a GLMM is probably more prudent and will investigate that idea further. My guard is against making the analysis too complicated... For interest and discussion: In the minutes between my post and the response I went 'way back' to consider an even simpler design (if it works with unbalanced data). In essence the beekeepers are block factors as the treatments were applied within these blocks to colonies at random and the beekeeper is not of any interest, just the parasite numbers of the colonies. In fundamental design terms, the randomized block design appears a viable option. However there are likely issues with the count data (that I will investigate as I am unaware of the data per se, but do have access to similar data that do, in fact, have lots of zeros).
Hi Trevor,

Yes, it sounds as though you have a nice, simple RBD that can be analyzed using the code Ben suggested. The lack of balance shouldn't be a problem as long as you use one of the mixed-model functions (lmer, lme, glmmPQL, etc.) rather than aov. The fact that you have count data shouldn't be a problem, although if you have an excessive number of zeros you might want to have a look at the non-CRAN package glmmADMB.

hth,
Kingsford Jones
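A small base-R sketch of how one might confirm that the layout is a crossed (block) design rather than a nested one, and see the imbalance directly. The data are simulated and every name (`hives`, `parasites`, the hive counts, the Poisson means) is hypothetical. The final `lm` call treats Beekeeper as a fixed block purely for comparison; it is not the mixed model recommended in this thread.

```r
set.seed(2)

# Hypothetical unbalanced layout: 7 beekeepers with unequal numbers of hives
n_hives <- c(4, 6, 5, 8, 3, 7, 5)
hives <- data.frame(Beekeeper = factor(rep(paste0("B", 1:7), times = n_hives)))

# Alternate control/treated within each beekeeper (odd hive = control)
idx <- ave(seq_len(nrow(hives)), hives$Beekeeper, FUN = seq_along)
hives$antiG <- factor(ifelse(idx %% 2 == 0, "treated", "control"),
                      levels = c("control", "treated"))

# Simulated parasite counts with a lower mean in treated hives
hives$parasites <- rpois(nrow(hives),
                         lambda = ifelse(hives$antiG == "treated", 5, 10))

# Cross-tabulate hives by beekeeper and treatment
design <- xtabs(~ Beekeeper + antiG, data = hives)

# Crossed: every beekeeper has hives in both treatments
is_crossed <- all(design > 0)

# Balanced: every beekeeper-by-treatment cell has the same number of hives
is_balanced <- length(unique(as.vector(design))) == 1

# Fixed-block version of the randomized block design, for comparison only
rbd_fit <- lm(log1p(parasites) ~ Beekeeper + antiG, data = hives)
treatment_effect <- coef(rbd_fit)["antiGtreated"]
```

Printing `design` shows the cell counts; if any beekeeper had hives in only one treatment, `is_crossed` would be `FALSE` and the block interpretation would need rethinking.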
Just a quick point: *estimation* should be fairly straightforward:

* easiest with log-transformed data -> lmer, lme
* harder with Poisson data -> glmer, glmmML, glmmAK
* hardest with negative binomial/overdispersed data -> glmmADMB

Be very careful with glmmPQL, which is known to be biased with low (<5-10) average counts per unit.

*Inference* is a can of worms: read all about it on the r-sig-mixed-models mailing list archive ...

You may want to forward further questions along these lines to r-sig-mixed-models at r-project.org instead ...

Ben Bolker