Split-plot Design

On Thu, Mar 20, 2008 at 7:04 PM, John Maindonald
I do not think it quite true that the aov model that has an
 Error() term is a fixed effects model.  The use of the word
 "stratum" implies that a mixed effects model is lurking
 somewhere.  The F-tests surely assume such a model.
 Some little time ago, Doug Bates invested me, along with
 Peter Dalgaard, a member of the degrees of freedom police.
 Problem is, I am unsure of the responsibilities, but maybe
 they include commenting on a case such as this.
Indeed, I should have been more explicit regarding the duties of a
member of the degrees of freedom police when I appointed you :-)  They
are exactly what you are doing - to weigh in on discussions of degrees
of freedom.

Perhaps I can expand a bit on why I think that the degrees of freedom
issue is not a good basis for approaching mixed models in general.

Duncan Temple Lang used to have a quote about "Language shapes the way
we think." in his email signature and I think similar ideas apply to
how we think about models.  Certainly the concept of error strata and
the associated degrees of freedom are one way of thinking about
mixed-effects models and they are effective in the case of completely
balanced designs.  If one has a completely balanced design and the
number of units at various strata is small then decomposition of the
variability into error strata is possible and the calculation of
degrees of freedom is important (because the degrees of freedom are
small - exact values of degrees of freedom are unimportant when they
are large).  Such data occur frequently - in text books.  In fact,
they are the only type of data that occur in most text books and hence
they have shaped the way we think about the models.  Generations of
statisticians have looked at techniques for small, balanced data sets
and decided that these define "the" approach to the model.  So when
confronted with large observational (and, hence, unbalanced or
"messy") data sets they approach the modeling with this mind set.

Certainly such data did occur in some agricultural trials (I'm not
sure if this is the state of the art in current agricultural trials)
and this form of analysis was important but it depends strongly on
balanced data and balance is a very fragile characteristic in most
data.  The one exception is data in text books - most often data for
examples in older texts were added as an afterthought and, for space
reasons and to illustrate the calculations, were small data sets which
just happened to be completely balanced so that all the calculations
worked out.   Fortunately this is no longer the case in books like
your book with Braun where the data sets are real data and available
separately from the book in R packages but it will take many years to
flush the way of thinking about models induced by those small,
balanced data sets from statistical "knowledge".  (There is an
enormous amount of inertia built into the system.  Most introductory
statistics textbooks published today, and probably those published for
the next couple of decades, will include "statistical tables" as
appendices because, as everyone knows, all of the statistical analysis
we do in practice involves doing a few calculations on a hand
calculator and looking up tabulated values of distributions.)

Getting back to the "language shapes the way we think" issue, I would
compare it to strategy versus tactics.  I spent a sabbatical in
Australia visiting Bill Venables.  I needed to adjust to driving on
the left hand side of the road and was able to do that after a short
initial period of confusion and discomfort.  If I think of places in
Adelaide now it is natural for me to think of myself sitting on the
right hand side of the front seat and driving on the left hand side of
the road.  That's tactics.  However, even after living in Adelaide for
several months I remember leaving the parking lot of a shopping mall
and choosing an indirect route out the back instead of the direct
route ahead because the direct route would involve a left hand turn
onto a busy road.  Instead I chose to make two right hand turns to get
to an intersection with traffic lights where I could make the left
turn.  At the level of strategy I still "knew" that left hand turns
are difficult and right hand turns are easy - exactly the wrong
approach in Australia.

This semester I am attending a seminar on hierarchical linear models
organized in our social sciences school by a statistician whose
background is in agricultural statistics.  I make myself unpopular
because I keep challenging the ideas of the social scientists and of
the organizer.  The typical kinds of studies we discuss are
longitudinal studies where the "observational units" (i.e. people) are
grouped within some social contexts.  These are usually large, highly
unbalanced data  sets. Because they are longitudinal they inevitably
involve some migration of people from one group to another.

The migration means that random effects for person and for the various
types of groups are not nested.  The models are not hierarchical or at
least they should not be.  Trying to jam the analysis of such data
into the hierarchical/multilevel framework of software like HLM or
MLwin doesn't work well but that certainly won't stop people from
trying.  The hierarchical language in which they learned the model
affects their strategy.

The agricultural statistician, by contrast, doesn't worry about the
nesting etc.,  For him the sole issue of important is to determine the
number of degrees of freedom associated with the various error strata.
 These studies involve tens of thousands of subjects and are nowhere
close to being balanced but his strategy is still based on having a
certain number of blocks and plots within the blocks and subplots
within the plots and everything is exactly balanced.

So I am not opposed to approaches based on error strata and computing
degrees of freedom where appropriate and where important.  But try not
to let the particular case of completely balanced small data sets
determine the strategy of the approach to all models involving random
effects.  It's fragile and does not generalize well, just as assuming
a strict hierarchy of factors associated with random effects is
fragile.  Balance is the special case.  Nesting is the special case.
It is a bad idea to base strategy on what happens in very special
cases.

they are always balanced and small data sets

Another very fragile
 lme() makes a stab at an appropriate choice of degrees
 of freedom, but does not always get it right, to the extent
 that there is a right answer.  [lmer() has for the time being
 given up on giving degrees of freedom and p-values for
 fixed effects estimates.]  This part of the output from lme()
 should, accordingly, be used with discretion.  In case of
 doubt, check against a likelihood ratio test.  In a simple
 enough experimental design, users who understand how
 to calculate degrees of freedom will reason them out for
 themselves.
 John Maindonald.

 John Maindonald             email: john.maindonald at anu.edu.au
 phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
 Centre for Mathematics & Its Applications, Room 1194,
 John Dedman Mathematical Sciences Building (Building 27)
 Australian National University, Canberra ACT 0200.

 On 21 Mar 2008, at 9:23 AM, Kevin Wright wrote:

 > Your question is not very clear, but if you are trying to match the
 > results in Kuehl, you need a fixed-effects model:
 >
 > dat <- read.table("expl14-1.txt", header=TRUE)
 > dat$block <- factor(dat$block)
 > dat$nitro <- factor(dat$nitro)
 > dat$thatch <- factor(dat$thatch)
 >
 > colnames(dat) <- c("block","nitro","thatch","chlor")
 > m1 <- aov(chlor~nitro*thatch+Error(block/nitro), data=dat)
 > summary(m1)
 >
 > Mixed-effects models and degrees of freedom have been discussed many
 > times on this list....search the archives.
 >
 > K Wright
 >
 >
 > On Thu, Mar 20, 2008 at 12:39 PM,  <marcioestat at pop.com.br> wrote:
 >>
 >>
 >>
 >> Hi listers,
 >>
 >> I've been studying anova and at the book of Kuehl at the chapter
 >> about split-plot there is a experiment with the results... I am
 >> trying to
 >> understand the experiments and make the code in order to obtain the
 >> results... But there is something that I didn't understand yet...
 >> I have a split-plot design (2 blocks) with two facteurs, one
 >> facteur has 4 treatments and the other facteur is a measure
 >> taken in three years...
 >> I organize my data set as:
 >>
 >> Nitro Bloc Year Measure
 >> a
 >> x
 >> 1         3.8
 >> a
 >>  x
 >> 2         3.9
 >> a         x         3         2.0
 >> a         y         1         3.7
 >> a         y         2
 >> 2.4
 >> a         y         3
 >> 1.2
 >> b         x
 >>  1         4.0
 >> b         x
 >> 2         2.5
 >> and so on...
 >>
 >>
 >> So, I am trying this code, because I want to test each factor and the
 >> interaction...
 >> lme=lme(measure ~ bloc + nitro + bloc*nitro, random= ~ 1|year,
 >> data=lme)
 >> summary(lme)
 >> The results that I am obtaining are not correct, because
 >> I calculated the degrees of fredom and they are not
 >> correct... According to this design I will get two errors one for the
 >> whole plot and other for the subplot....
 >>
 >> Well, as I told you, I am still learning... Any suggestions...
 >>
 >> Thanks in advance,
 >>
 >> Ribeiro
 >>
 >>
 >>        [[alternative HTML version deleted]]
 >>
 >>
 >> _______________________________________________
 >> R-sig-mixed-models at r-project.org mailing list
 >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
 >>
 >>
 >
 > _______________________________________________
 > R-sig-mixed-models at r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

 _______________________________________________
 R-sig-mixed-models at r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Split-plot Design

Thread (12 messages)