Should blocking factors be modeled as random effects?

5 messages · Prew, Paul, Juan Pedro Steibel, John Maindonald +1 more

#
Hello,
Treating a block effect as random allows recovering inter-block 
information. In a complete randomized block design (CRBD), treating 
blocks as fixed or random should yield identical results. In an 
incomplete block design (incomplete by design, or because some 
observations are missing at random), the results will differ.

If the Gaussian assumption regarding the block effects is sound, I 
would expect that treating the block as random will be more efficient 
than fitting block as fixed. Moreover, one could compute the relative 
efficiency of the two analyses by comparing the variances of a particular 
treatment difference when block is treated as fixed versus when it is 
treated as a random effect.

The catch is that the relative efficiency depends on the actual variance 
ratios (unknown) and on the assumptions regarding the random effects 
(commonly, Gaussian distribution).
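
A minimal sketch of that relative-efficiency calculation, under assumptions that are mine and not part of the original message: two treatments A and B, `n_complete` blocks containing both, `n_single` blocks containing only A and another `n_single` containing only B, and variance components treated as known. The function name and arguments are hypothetical. With blocks fixed, only the complete blocks inform the A - B difference; with blocks random, the single-treatment blocks add an inter-block estimate that is combined by inverse-variance weighting.

```python
def treatment_difference_variances(sigma2_resid, sigma2_block,
                                   n_complete, n_single):
    """Variance of the estimated A - B difference under two analyses.

    Assumes known variance components and equal numbers of A-only and
    B-only blocks (hypothetical setup chosen for illustration).
    """
    # Blocks fixed: only within-block differences from complete blocks count.
    v_intra = 2.0 * sigma2_resid / n_complete
    # Inter-block estimate: mean of A-only blocks minus mean of B-only blocks.
    v_inter = (sigma2_block + sigma2_resid) * (2.0 / n_single)
    # Blocks random: inverse-variance combination of the two estimates.
    v_combined = 1.0 / (1.0 / v_intra + 1.0 / v_inter)
    relative_efficiency = v_intra / v_combined
    return v_intra, v_combined, relative_efficiency


# The gain shrinks as the block variance grows relative to the residual.
for sigma2_block in (0.5, 1.0, 9.0):
    v_fixed, v_random, re = treatment_difference_variances(
        1.0, sigma2_block, n_complete=6, n_single=3)
    print(sigma2_block, round(v_fixed, 4), round(v_random, 4), round(re, 4))
```

In this setup the relative efficiency reduces to 1 + v_intra / v_inter, so it depends directly on the ratio of block to residual variance. In practice that ratio has to be estimated, which is the catch described above.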

In practice, when analyzing field or lab experiments, I tend to specify 
the block as a random effect. Always.
In some cases there are very few levels, though. In those cases, if 
someone asks "how can you reliably estimate a variance component for a 
(blocking) factor with only (say) 4 or 5 levels?", I just shrug. 8^D

JP
Prew, Paul wrote:

  
    
4 days later
#
"In a complete randomized block design (CRBD), treating blocks as  
fixed or random should yield identical results."

It depends what you mean by "results".  SEs of effects, for treatments  
that are estimated "within blocks", will be the same.  The between  
block variance does not contribute to this SE.

Estimates of SEs of treatment means may be very different.  The  
between block variance does contribute to this SE.  This is where it  
does matter if there are very few blocks.  The SE will be estimated  
with very poor accuracy (low df).
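
As a rough illustration of the two kinds of SE, here is a sketch for a complete block design with one plot per treatment per block. It uses ANOVA (method-of-moments) variance estimates, not a fitted mixed model, and the data are made up for the example. The function name and the layout `y[block][treatment]` are assumptions.

```python
from math import sqrt


def rcbd_standard_errors(y):
    """y[j][i] is the response for treatment i in block j (one plot each)."""
    b, t = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (b * t)
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(y[j][i] for j in range(b)) / b for i in range(t)]

    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_resid = sum(
        (y[j][i] - block_means[j] - trt_means[i] + grand) ** 2
        for j in range(b) for i in range(t))
    ms_block = ss_block / (b - 1)
    ms_resid = ss_resid / ((b - 1) * (t - 1))

    sigma2 = ms_resid
    # Method-of-moments estimate of the block variance, truncated at zero.
    sigma2_block = max(0.0, (ms_block - ms_resid) / t)
    return {
        # SE of a difference between two treatment means: no block variance.
        "se_diff": sqrt(2.0 * sigma2 / b),
        # SE of a single treatment mean: block variance contributes.
        "se_mean": sqrt((sigma2_block + sigma2) / b),
        # The block variance rests on only b - 1 degrees of freedom.
        "df_block": b - 1,
        "df_resid": (b - 1) * (t - 1),
    }


# Hypothetical data: 4 blocks (rows) by 3 treatments (columns).
yields = [[10, 12, 11], [14, 15, 17], [9, 12, 10], [20, 21, 23]]
print(rcbd_standard_errors(yields))
```

Shifting every plot in one block by a constant leaves `se_diff` unchanged but changes `se_mean`, which is the distinction drawn above.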

Of course, the SEs of effects assume that there is no systematic  
change in treatment effect from one block to another.  Unless there  
are super-blocks (sites?), there is no way to estimate the SE of any  
block-treatment interaction.  Look at the kiwishade data in the DAAG  
package for an example where there might well be differences between  
blocks that are affected by the direction in which the blocks face.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 27/01/2009, at 8:51 AM, Juan Pedro Steibel wrote:

            
#
Thanks for the comment John,
I should have written that better. I had in mind a very simple CRBD with 
one grouping factor (treatment) and complete blocks with only one plot 
per treatment. You are perfectly right, when there are between and 
within block (or plot) treatments (example: split-plot, strip-plot, 
split-block designs), the way to go is to consider the blocks and plots 
as random effects.

I meant to say that treating the block as random produces the same 
inferences (SE and all) only in (very) simple designs, while in more 
complex designs, the random block effect leads to better inferences. 
That is the reason I treat block as random by default.
Thanks again.
JP
John Maindonald wrote:

  
    
#
On Mon, Jan 26, 2009 at 2:42 PM, Prew, Paul <Paul.Prew at ecolab.com> wrote:
Thanks for bringing up the topic, Paul.  As you and I know, you
originally sent your question to me and I encouraged you to send it to
this list.

As I wrote in my initial response to you,  "My off-the-cuff reaction
is that in these situations the effects of blocking factors are
regarded as nuisance parameters whereas in many mixed-model situations
the variances and sometimes the values of the random effects are
themselves of interest.  When the effects are
nuisance parameters the simplest approach is to model them as fixed effects."

On thinking about it more, I can imagine several different approaches
to this question.  If you just ask, "Are the levels of this blocking
factor a fixed set of levels or a random selection from a population
of possible levels?" then in most cases I imagine you would say they
are a random selection and should be modeled using random effects.
This would especially be true of what Taguchi called "environmental
factors" which, by definition, are not under the control of the
experimenter.

If you say that blocking factors are not of interest per se and that
your purpose is simply to control for them, it is simpler to model
them as fixed effects.  There are two aspects to "simpler":
computationally simpler and conceptually simpler.  Of these I think
that conceptually is more important.  The computational burden for
fitting a mixed model versus a fixed-effects model is really a
software problem, not a hardware problem.  Commercial statistical
software like Minitab or JMP with a simple, convenient interface has
limited flexibility, in part because it is designed to have a simple,
convenient interface - the "what you see is all you get" problem.  (I
googled that phrase and got a laugh from the article at
www.computer-dictionary-online.org which referred to "point-and-drool
interfaces".)  The actual calculations involved in fitting mixed
models are not that formidable but designing the interface can be.
(One of the underappreciated aspects of the model-fitting software in
R, and in the S language in general, is the structure of the
model.frame, model.matrix sequence for transforming a formula into a
numerical representation.  This makes designing an interface much,
much easier as long as you count on the user to input a formula.)

Conceptually fixed-effects models are simpler than mixed models but
they may over-simplify the analysis.  If your purpose is estimation of
fixed-effects parameters, including assessing precision of the
estimates, then you need to ask if you want to estimate those
parameters conditional on the particular levels of the blocking factor
that you observed or with respect to the possible values of the
blocking factors that are represented by the sample you observed.  If
you are willing to condition on the particular levels you observed
then use fixed-effects for the blocking factor.  For all possible
levels of the blocking factor you could use random effects.  For a
designed experiment the estimates of the fixed effects will probably
not be affected much by using random effects for the blocking factor
instead of fixed effects.  However the precision of the estimates may
be different.  Perhaps more importantly, the precision of predictions
of future responses would be different.  I'm not even sure how one
would formulate such a prediction from a model with fixed effects
for the blocking factor if the factor was something like "batch" and
the batches from the experiment were already used up.
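
To make the difference in prediction precision concrete, here is a small sketch under assumptions that are not in the original message: a balanced complete block design with one observation per block-treatment cell, an additive model, and variance components treated as known. The function name is hypothetical. It compares the prediction variance for a new observation in an unobserved block (the random-effects view) with that for a new observation in one of the observed blocks (the conditional, fixed-effects view).

```python
def prediction_variances(sigma2_resid, sigma2_block, n_blocks, n_treatments):
    """Prediction variance for one future observation on a given treatment.

    Assumes a balanced additive design with known variance components
    (an illustrative simplification).
    """
    # New, unobserved block: predict with the treatment mean; the future
    # observation carries its own block effect and residual.
    new_block = (sigma2_block + sigma2_resid) * (1.0 + 1.0 / n_blocks)

    # Observed block, blocks fixed: the fitted cell value has variance
    # sigma2 * p / n, where p counts the mean-model parameters.
    p = n_blocks + n_treatments - 1
    n = n_blocks * n_treatments
    same_block = sigma2_resid * (1.0 + p / n)
    return new_block, same_block


new_block, same_block = prediction_variances(1.0, 4.0, n_blocks=4,
                                             n_treatments=3)
print(new_block, same_block)
```

The second quantity is only defined for blocks that were in the experiment, which is the difficulty with used-up batches noted above.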

Having said that the precision of the estimates of the fixed-effects
parameters would be different if random effects are used for the
blocking factor, I should admit that this is exactly the problem to
which I don't have a good general solution.

It appears that the question of fixed or random for a blocking factor
is like many others in statistics - the choice of the model depends on
what you want to do with it.