Should blocking factors be modeled as random effects?
5 messages · Prew, Paul, Juan Pedro Steibel, John Maindonald +1 more
Hello,

Treating a block effect as random allows recovering inter-block information. In a complete randomized block design (CRBD), treating blocks as fixed or random should yield identical results. In an incomplete block design (incomplete by design, or through observations missing at random), the results will differ. If the Gaussian assumption regarding the block effects is sound, I would expect treating the block as random to be more efficient than fitting block as fixed. Moreover, one could compute the relative efficiency of the two analyses by comparing the variances of a particular treatment difference when block is treated as fixed versus when it is treated as random. The catch is that the relative efficiency depends on the actual variance ratios (unknown) and on the assumptions regarding the random effects (commonly, a Gaussian distribution).

In practice, when analyzing field or lab experiments, I tend to specify the block as a random effect. Always. In some cases there are very few levels, though. In those cases, if someone asks "how can you reliably estimate a variance component for a (blocking) factor with only (say) 4 or 5 levels?", I just shrug. 8^D

JP
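The fixed-versus-random comparison JP describes can be sketched in R with simulated data (all names and numbers here are hypothetical, just for illustration). In a balanced, complete design the treatment contrasts agree between the two analyses; a multistratum `aov()` with an `Error()` term is the base-R way to fit the random-block model, and `lme4::lmer` is the modern equivalent:

```r
## Sketch: fixed- vs random-block analysis of a balanced CRBD
## (simulated data; 6 blocks, 3 treatments, one plot per treatment per block)
set.seed(1)
d <- expand.grid(block = factor(1:6), treat = factor(c("A", "B", "C")))
d$y <- 10 + c(0, 1, 2)[d$treat] +          # treatment effects
  rnorm(6, sd = 1)[d$block] +              # block effects
  rnorm(nrow(d), sd = 0.5)                 # residual error

## Block as a fixed effect:
fit_fixed <- lm(y ~ treat + block, data = d)

## Block as a random effect, via a multistratum ANOVA (base R):
fit_strata <- aov(y ~ treat + Error(block), data = d)

## Equivalent mixed-model fit, assuming the lme4 package is installed:
## fit_rand <- lme4::lmer(y ~ treat + (1 | block), data = d)

## With complete, balanced blocks the treatment contrasts coincide:
coef(fit_fixed)[c("treatB", "treatC")]
## fixef(fit_rand)[c("treatB", "treatC")]   # same values
```

With an incomplete design the two fits would diverge, because the random-effects analysis recovers the inter-block information that the fixed-effects analysis discards.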
Prew, Paul wrote:
I have been following your R discussion list on mixed modeling for a few
weeks, in hopes of understanding mixed modeling better. And it has
helped. I was not aware of the controversy surrounding degrees of
freedom and the distribution of test statistics. I have just been
trusting the ANOVA output from software (Minitab, JMP) that reported F
tests. JMP uses Kenward-Roger, Minitab's ANOVA reports an F-statistic,
followed by "F-test not exact for this term".
A recent mention by Douglas Bates of George Box, though, hit upon an
aspect of mixed models that has confused me. I'm an industrial
statistician, and studied statistics at Iowa State and the University of
Minnesota. I have had 3 courses in DOE, 2 at the graduate level, and
none of them mentioned that blocking factors could (should?) be modeled
as random effects. **Exception: the whole plots in a split-plot design
were taught as random effects.**
The 2005 update to Box, Hunter & Hunter discusses blocking, as does Wu &
Hamada (2000). Both texts model blocking factors such as Days and
Batches as fixed effects. Montgomery's DOE text, 2009 rev., pretty
consistently states that blocks can be either random or fixed. So I
don't have a consensus from that small sample.
I'm trying to understand the implications if I consistently used random
effects for DOE analysis.
I'm quite willing to use R for mixed models, seeing as Minitab, JMP,
etc. appear to use degrees-of-freedom calculations that are questionable.
But as Douglas points out --- Box said, "all models are wrong, some are
useful" => Box's latest text doesn't bother with random effects for DOE
=> does it follow that for practical purposes it's OK to consider
blocks as fixed? There are certainly several advantages to keeping it
simple (i.e. fixed only):
* The analyses we (my statistics group) provide to our chemists and
engineers are more easily understood
* The 2-day short courses we teach in DOE to these same coworkers
couldn't realistically get across the idea of mixed model analysis ---
they would become less self-sufficient, where we're trying to make them
more self-sufficient
* We have a handful of software packages (Minitab, JMP, Design Expert)
that can perform DOE and augment the results in a number of ways:
*** fold over the design to resolve aliasing in fractional designs
*** add axial runs to enable Response Surface methods
*** add distributions to the input factors, enabling
Robustness/Sensitivity analyses
*** run optimization algorithms to suggest the factor settings
that simultaneously consider multiple objectives
***** Not to mention the loss of Sample Size Calculations, far and
away my most frequent request
None of these packages recognizes random factors when performing these
augmentations.
Replacing this functionality with R is going to involve a steep learning
curve, and is probably not entirely possible. My coding skills in R
consist of cutting and pasting what others have done.
I don't really expect that there's a "right" answer to the question of
random effects in DOE. But I do believe that beyond the loss of
p-values, there are other ramifications to advising experimenters,
"You can't trust results from your blocking on Days (or Shifts or RM
Lots or Batches, etc.) unless they are modeled as random effects."
There's statistical significance, and practical significance. My hope
is that while blocks as random effects are statistically "truer", their
marginal worth over fixed effects in DOE is ignorable. Again, I don't
want this to come across as shooting the messenger; you are only laying
out the current state of the art and the work that remains to be done.
But any insight you can provide into what's practical right now would be
highly interesting.
Thank you for your time and consideration,
Paul Prew
651-795-5942 fax 651-204-7504
Ecolab Research Center
Mail Stop ESC-F4412
Lone Oak Drive
Eagan, MN 55121-1560
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
============================= Juan Pedro Steibel Assistant Professor Statistical Genetics and Genomics Department of Animal Science & Department of Fisheries and Wildlife Michigan State University 1205-I Anthony Hall East Lansing, MI 48824 USA Phone: 1-517-353-5102 E-mail: steibelj at msu.edu
4 days later
"In a complete randomized block design (CRBD), treating blocks as fixed or random should yield identical results."

It depends what you mean by "results". SEs of effects, for treatments that are estimated "within blocks", will be the same; the between-block variance does not contribute to this SE. Estimates of SEs of treatment means may be very different; the between-block variance does contribute to this SE. This is where it does matter if there are very few blocks: the SE will be estimated with very poor accuracy (low df).

Of course, the SEs of effects assume that there is no systematic change in treatment effect from one block to another. Unless there are super-blocks (sites?), there is no way to estimate the SE of any block-treatment interaction. Look at the kiwishade data in the DAAG package for an example where there might well be differences between blocks that are affected by the direction in which the blocks face.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
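The nested block/plot structure John alludes to (the kiwishade data in the DAAG package have plots nested within blocks, with the shade treatment applied at the plot level) can be sketched with simulated data, so the snippet runs in base R without DAAG; all numbers here are made up for illustration:

```r
## Sketch: plots nested in blocks, treatment applied at the plot level
## (simulated stand-in for the kiwishade layout: 4 blocks x 3 plots x 4 vines)
set.seed(42)
d <- expand.grid(vine = 1:4, plot = factor(1:3), block = factor(1:4))
d$shade <- factor(rep(c("none", "Aug", "Dec"), each = 4, times = 4),
                  levels = c("none", "Aug", "Dec"))   # one level per plot
d$y <- 100 + c(0, -3, 5)[d$shade] +                   # arbitrary treatment effects
  rnorm(4, sd = 2)[d$block] +                         # between-block variation
  rnorm(12, sd = 1)[interaction(d$block, d$plot)] +   # between-plot variation
  rnorm(nrow(d))                                      # within-plot (vine) error

## Multistratum ANOVA: shade is tested against the plot-within-block stratum,
## so the between-block variance never touches the treatment comparisons --
## but it does enter the SE of a treatment *mean*.
fit <- aov(y ~ shade + Error(block/plot), data = d)
summary(fit)

## lme4 equivalent, assuming the lme4 package is installed:
## lme4::lmer(y ~ shade + (1 | block/plot), data = d)
```

The `block` stratum here has only 3 df, which is John's point about SEs of treatment means being estimated with very poor accuracy when there are few blocks.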
On 27/01/2009, at 8:51 AM, Juan Pedro Steibel wrote:
Thanks for the comment John, I should have written that better. I had in mind a very simple CRBD with one grouping factor (treatment) and complete blocks with only one plot per treatment. You are perfectly right: when there are between- and within-block (or plot) treatments (examples: split-plot, strip-plot, and split-block designs), the way to go is to consider the blocks and plots as random effects. I meant to say that treating the block as random produces the same inferences (SEs and all) only in a (very) simple design, while in more complex designs the random block effect leads to better inferences. That is the reason I treat block as random by default. Thanks again. JP
On Mon, Jan 26, 2009 at 2:42 PM, Prew, Paul <Paul.Prew at ecolab.com> wrote:
Thanks for bringing up the topic, Paul. As you and I know, you originally sent your question to me and I encouraged you to send it to this list. As I wrote in my initial response to you, "My off-the-cuff reaction is that in these situations the effects of blocking factors are regarded as nuisance parameters whereas in many mixed-model situations the variances and sometimes the values of the random effects are themselves of interest. When the effects are nuisance parameters the simplest approach is to model them as fixed effects."

On thinking about it more, I can imagine several different approaches to this question. If you just ask, "Are the levels of this blocking factor a fixed set of levels or a random selection from a population of possible levels?" then in most cases I imagine you would say they are a random selection and should be modeled using random effects. This would especially be true of what Taguchi called "environmental factors" which, by definition, are not under the control of the experimenter.

If you say that blocking factors are not of interest per se and that your purpose is simply to control for them, it is simpler to model them as fixed effects. There are two aspects to "simpler": computationally simpler and conceptually simpler. Of these I think the conceptual one is more important. The computational burden of fitting a mixed model versus a fixed-effects model is really a software problem, not a hardware problem. Commercial statistical software like Minitab or JMP with a simple, convenient interface has limited flexibility, in part because it is designed to have a simple, convenient interface - the "what you see is all you get" problem. (I googled that phrase and got a laugh from the article at www.computer-dictionary-online.org which referred to "point-and-drool interfaces".) The actual calculations involved in fitting mixed models are not that formidable, but designing the interface can be.

(One of the underappreciated aspects of the model-fitting software in R, and in the S language in general, is the structure of the model.frame, model.matrix sequence for transforming a formula into a numerical representation. This makes designing an interface much, much easier as long as you can count on the user to input a formula.)

Conceptually, fixed-effects models are simpler than mixed models, but they may over-simplify the analysis. If your purpose is estimation of fixed-effects parameters, including assessing the precision of the estimates, then you need to ask whether you want to estimate those parameters conditional on the particular levels of the blocking factor that you observed, or with respect to the population of possible levels that is represented by the sample you observed. If you are willing to condition on the particular levels you observed, then use fixed effects for the blocking factor. For all possible levels of the blocking factor you would use random effects.

For a designed experiment the estimates of the fixed effects will probably not be affected much by using random effects for the blocking factor instead of fixed effects. However, the precision of the estimates may be different. Perhaps more importantly, the precision of predictions of future responses would be different. I'm not sure how one would even formulate such a prediction from a model with fixed effects for the blocking factor if the factor were something like "batch" and the batches from the experiment were already used up.

Having said that the precision of the estimates of the fixed-effects parameters would be different if random effects are used for the blocking factor, I should admit that this is exactly the problem to which I don't have a good general solution. It appears that the question of fixed or random for a blocking factor is like many others in statistics - the choice of model depends on what you want to do with it.
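The model.frame / model.matrix sequence mentioned above is easy to see in a tiny example (data here are simulated, purely for illustration): the formula plus a data frame are first evaluated into a model frame, and the attached terms object then expands factors into the numeric design matrix a fitting routine actually consumes.

```r
## Sketch: from formula to numeric design matrix in base R
d <- data.frame(y     = rnorm(6),
                treat = factor(rep(c("A", "B"), 3)),
                block = factor(rep(1:3, each = 2)))

mf <- model.frame(y ~ treat + block, data = d)  # evaluate the formula's variables
X  <- model.matrix(attr(mf, "terms"), mf)       # expand factors into dummy columns

dim(X)  # 6 rows; 4 columns: intercept, treatB, block2, block3
```

This two-step split is what lets a fitting function accept any formula the user writes while keeping the numerical code purely matrix-based.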