Should blocking factors be modeled as random effects?
5 messages · Prew, Paul, Juan Pedro Steibel, John Maindonald +1 more
Hello,

Treating a block effect as random allows recovering inter-block information. In a complete randomized block design (CRBD), treating blocks as fixed or random should yield identical results. In an incomplete block design (incomplete by design, or through observations missing at random), the results will differ. If the Gaussian assumption regarding the block effects is sound, I would expect treating the block as random to be more efficient than fitting block as fixed. Moreover, one could compute the relative efficiency of the two analyses by comparing the variances of a particular treatment difference when block is treated as fixed versus when it is treated as random. The catch is that the relative efficiency depends on the actual variance ratios (unknown) and on the assumptions regarding the random effects (commonly, a Gaussian distribution).

In practice, when analyzing field or lab experiments, I tend to specify the block as a random effect. Always. In some cases there are very few levels, though. In those cases, if someone asks "how can you reliably estimate a variance component for a (blocking) factor with only (say) 4 or 5 levels?", I just shrug. 8^D

JP
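The fixed-versus-random comparison JP describes can be sketched in R with simulated data (all names and numbers here are hypothetical, just for illustration). In a balanced, complete design the treatment contrasts agree between the two analyses; a multistratum `aov()` with an `Error()` term is the base-R way to fit the random-block model, and `lme4::lmer` is the modern equivalent:

```r
## Sketch: fixed- vs random-block analysis of a balanced CRBD
## (simulated data; 6 blocks, 3 treatments, one plot per treatment per block)
set.seed(1)
d <- expand.grid(block = factor(1:6), treat = factor(c("A", "B", "C")))
d$y <- 10 + c(0, 1, 2)[d$treat] +          # treatment effects
  rnorm(6, sd = 1)[d$block] +              # block effects
  rnorm(nrow(d), sd = 0.5)                 # residual error

## Block as a fixed effect:
fit_fixed <- lm(y ~ treat + block, data = d)

## Block as a random effect, via a multistratum ANOVA (base R):
fit_strata <- aov(y ~ treat + Error(block), data = d)

## Equivalent mixed-model fit, assuming the lme4 package is installed:
## fit_rand <- lme4::lmer(y ~ treat + (1 | block), data = d)

## With complete, balanced blocks the treatment contrasts coincide:
coef(fit_fixed)[c("treatB", "treatC")]
## fixef(fit_rand)[c("treatB", "treatC")]   # same values
```

With an incomplete design the two fits would diverge, because the random-effects analysis recovers the inter-block information that the fixed-effects analysis discards.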
Prew, Paul wrote:
I have been following your R discussion list on mixed modeling for a few
weeks, in hopes of understanding mixed modeling better. And it has
helped. I was not aware of the controversy surrounding degrees of
freedom and the distribution of test statistics. I have just been
trusting the ANOVA output from software (Minitab, JMP) that reported F
tests. JMP uses Kenward-Roger, Minitab's ANOVA reports an F-statistic,
followed by "F-test not exact for this term".
A recent mention by Douglas Bates of George Box, though, hit upon an
aspect of mixed models that has confused me. I'm an industrial
statistician, and studied statistics at Iowa State and the University of
Minnesota. I have had 3 courses in DOE, 2 at the graduate level, and
none of them mentioned that blocking factors could (should?) be modeled
as random effects. **Exception: the whole plots in a split-plot design
were taught as random effects.**
The 2005 update to Box, Hunter & Hunter discusses blocking, as does Wu &
Hamada (2000). Both texts model blocking factors such as Days and
Batches as fixed effects. Montgomery's DOE text, 2009 rev., pretty
consistently states that blocks can be either random or fixed. So I
don't have a consensus from that small sample.
I'm trying to understand the implications if I consistently used random
effects for DOE analysis.
I'm quite willing to use R for mixed models, seeing as Minitab, JMP,
etc. appear to use degrees-of-freedom calculations that are questionable.
But as Douglas points out --- Box said, "all models are wrong, some are
useful" => Box's latest text doesn't bother with random effects for DOE
=> does it follow that for practical purposes it's OK to consider
blocks as fixed? There are certainly several advantages to keeping it
simple (i.e. fixed only):
* The analyses we (my statistics group) provide to our chemists and
engineers are more easily understood
* The 2-day short courses we teach in DOE to these same coworkers
couldn't realistically get across the idea of mixed model analysis ---
they would become less self-sufficient, where we're trying to make them
more self-sufficient
* We have a handful of software packages (Minitab, JMP, Design Expert)
that can perform DOE and augment the results in a number of ways:
*** fold over the design to resolve aliasing in fractional designs
*** add axial runs to enable Response Surface methods
*** add distributions to the input factors, enabling
Robustness/Sensitivity analyses
*** run optimization algorithms to suggest the factor settings
that simultaneously consider multiple objectives
***** Not to mention the loss of Sample Size Calculations, far and
away my most frequent request
None of these packages recognizes random factors when performing these
augmentations.
Replacing this functionality with R is going to involve a steep learning
curve, and is probably not entirely possible. My coding skills in R
consist of cutting and pasting what others have done.
I don't really expect that there's a "right" answer to the question of
random effects in DOE. But I do believe that beyond the loss of
p-values, there are other ramifications to advising experimenters,
"You can't trust results from your blocking on Days (or Shifts or RM
Lots or Batches, etc.) unless they are modeled as random effects."
There's statistical significance, and practical significance. My hope
is that while blocks as random effects are statistically "truer", their
marginal worth over fixed effects in DOE is ignorable. Again, I don't
want this to come across as shooting the messenger; you are only laying
out the current state of the art and the work that remains to be done.
But any insight you can provide into what's practical right now would be
highly interesting.
Thank you for your time and consideration,
Paul Prew
651-795-5942 fax 651-204-7504
Ecolab Research Center
Mail Stop ESC-F4412
Lone Oak Drive
Eagan, MN 55121-1560
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
============================= Juan Pedro Steibel Assistant Professor Statistical Genetics and Genomics Department of Animal Science & Department of Fisheries and Wildlife Michigan State University 1205-I Anthony Hall East Lansing, MI 48824 USA Phone: 1-517-353-5102 E-mail: steibelj at msu.edu
4 days later
"In a complete randomized block design (CRBD), treating blocks as fixed or random should yield identical results."

It depends what you mean by "results". SEs of effects, for treatments that are estimated "within blocks", will be the same; the between-block variance does not contribute to this SE. Estimates of SEs of treatment means may be very different; the between-block variance does contribute to this SE. This is where it does matter if there are very few blocks: the SE will be estimated with very poor accuracy (low df).

Of course, the SEs of effects assume that there is no systematic change in treatment effect from one block to another. Unless there are super-blocks (sites?), there is no way to estimate the SE of any block-treatment interaction. Look at the kiwishade data in the DAAG package for an example where there might well be differences between blocks that are affected by the direction in which the blocks face.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
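The nested block/plot structure John alludes to (the kiwishade data in the DAAG package have plots nested within blocks, with the shade treatment applied at the plot level) can be sketched with simulated data, so the snippet runs in base R without DAAG; all numbers here are made up for illustration:

```r
## Sketch: plots nested in blocks, treatment applied at the plot level
## (simulated stand-in for the kiwishade layout: 4 blocks x 3 plots x 4 vines)
set.seed(42)
d <- expand.grid(vine = 1:4, plot = factor(1:3), block = factor(1:4))
d$shade <- factor(rep(c("none", "Aug", "Dec"), each = 4, times = 4),
                  levels = c("none", "Aug", "Dec"))   # one level per plot
d$y <- 100 + c(0, -3, 5)[d$shade] +                   # arbitrary treatment effects
  rnorm(4, sd = 2)[d$block] +                         # between-block variation
  rnorm(12, sd = 1)[interaction(d$block, d$plot)] +   # between-plot variation
  rnorm(nrow(d))                                      # within-plot (vine) error

## Multistratum ANOVA: shade is tested against the plot-within-block stratum,
## so the between-block variance never touches the treatment comparisons --
## but it does enter the SE of a treatment *mean*.
fit <- aov(y ~ shade + Error(block/plot), data = d)
summary(fit)

## lme4 equivalent, assuming the lme4 package is installed:
## lme4::lmer(y ~ shade + (1 | block/plot), data = d)
```

The `block` stratum here has only 3 df, which is John's point about SEs of treatment means being estimated with very poor accuracy when there are few blocks.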
On 27/01/2009, at 8:51 AM, Juan Pedro Steibel wrote:
Thanks for the comment John, I should have written that better. I had in mind a very simple CRBD with one grouping factor (treatment) and complete blocks with only one plot per treatment. You are perfectly right: when there are between- and within-block (or plot) treatments (examples: split-plot, strip-plot, and split-block designs), the way to go is to consider the blocks and plots as random effects. I meant to say that treating the block as random produces the same inferences (SEs and all) only in a (very) simple design, while in more complex designs the random block effect leads to better inferences. That is the reason I treat block as random by default. Thanks again. JP
On Mon, Jan 26, 2009 at 2:42 PM, Prew, Paul <Paul.Prew at ecolab.com> wrote:
Thanks for bringing up the topic, Paul. As you and I know, you originally sent your question to me and I encouraged you to send it to this list. As I wrote in my initial response to you, "My off-the-cuff reaction is that in these situations the effects of blocking factors are regarded as nuisance parameters whereas in many mixed-model situations the variances and sometimes the values of the random effects are themselves of interest. When the effects are nuisance parameters the simplest approach is to model them as fixed effects."

On thinking about it more, I can imagine several different approaches to this question. If you just ask, "Are the levels of this blocking factor a fixed set of levels or a random selection from a population of possible levels?" then in most cases I imagine you would say they are a random selection and should be modeled using random effects. This would especially be true of what Taguchi called "environmental factors" which, by definition, are not under the control of the experimenter.

If you say that blocking factors are not of interest per se and that your purpose is simply to control for them, it is simpler to model them as fixed effects. There are two aspects to "simpler": computationally simpler and conceptually simpler. Of these I think the conceptual one is more important. The computational burden of fitting a mixed model versus a fixed-effects model is really a software problem, not a hardware problem. Commercial statistical software like Minitab or JMP with a simple, convenient interface has limited flexibility, in part because it is designed to have a simple, convenient interface - the "what you see is all you get" problem. (I googled that phrase and got a laugh from the article at www.computer-dictionary-online.org which referred to "point-and-drool interfaces".) The actual calculations involved in fitting mixed models are not that formidable, but designing the interface can be.

(One of the underappreciated aspects of the model-fitting software in R, and in the S language in general, is the structure of the model.frame, model.matrix sequence for transforming a formula into a numerical representation. This makes designing an interface much, much easier as long as you can count on the user to input a formula.)

Conceptually, fixed-effects models are simpler than mixed models, but they may over-simplify the analysis. If your purpose is estimation of fixed-effects parameters, including assessing the precision of the estimates, then you need to ask whether you want to estimate those parameters conditional on the particular levels of the blocking factor that you observed, or with respect to the population of possible levels that is represented by the sample you observed. If you are willing to condition on the particular levels you observed, then use fixed effects for the blocking factor. For all possible levels of the blocking factor you would use random effects.

For a designed experiment the estimates of the fixed effects will probably not be affected much by using random effects for the blocking factor instead of fixed effects. However, the precision of the estimates may be different. Perhaps more importantly, the precision of predictions of future responses would be different. I'm not sure how one would even formulate such a prediction from a model with fixed effects for the blocking factor if the factor were something like "batch" and the batches from the experiment were already used up.

Having said that the precision of the estimates of the fixed-effects parameters would be different if random effects are used for the blocking factor, I should admit that this is exactly the problem to which I don't have a good general solution. It appears that the question of fixed or random for a blocking factor is like many others in statistics - the choice of model depends on what you want to do with it.
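The model.frame / model.matrix sequence mentioned above is easy to see in a tiny example (data here are simulated, purely for illustration): the formula plus a data frame are first evaluated into a model frame, and the attached terms object then expands factors into the numeric design matrix a fitting routine actually consumes.

```r
## Sketch: from formula to numeric design matrix in base R
d <- data.frame(y     = rnorm(6),
                treat = factor(rep(c("A", "B"), 3)),
                block = factor(rep(1:3, each = 2)))

mf <- model.frame(y ~ treat + block, data = d)  # evaluate the formula's variables
X  <- model.matrix(attr(mf, "terms"), mf)       # expand factors into dummy columns

dim(X)  # 6 rows; 4 columns: intercept, treatB, block2, block3
```

This two-step split is what lets a fitting function accept any formula the user writes while keeping the numerical code purely matrix-based.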