Unbalanced Anova: What is the best approach?

Dear Spencer,
-----Original Message-----
From: Spencer Graves [mailto:spencer.graves at prodsyse.com]
Sent: April-03-11 11:07 AM
To: Krishna Kirti Das
Cc: John Fox; r-help at r-project.org
Subject: Re: [R] Unbalanced Anova: What is the best approach?

Hi, Krishna:

<in line>

On 4/3/2011 7:35 AM, Krishna Kirti Das wrote:
Thank you, John.

Yes, your answers do help. For me it's mainly about getting familiar
with the "R" way of doing things.

Thus your response also confirms what I suspected, that there is no
explicit user-interface (at least one that is widely used) in terms of
functions/packages that represents an unbalanced design in the same
way that aov would represent a balanced one. Analyzing both balanced and
unbalanced data is obviously possible, but what has to be done is
intuitive within the language for balanced designs via aov and
unintuitive for unbalanced designs.
       Intuition is subject to one's background and expectations.  If you
think in terms of a series of nested hypotheses, then the standard R anova
is very intuitive.  I never use aov, because it's not intuitive to me and
not very general.  'aov' is only useful for a balanced design with normal
independent errors with constant variance.  The real world is rarely so
simple.  The 'aov' algorithm was wonderful over half a century ago, when
all computations were done by hand or using a mechanical calculator (e.g.,
an abacus or a calculator with gears).
Unbalanced designs were largely impractical because of computational
difficulties.  There were many procedures for imputing missing values for
a design that was "almost balanced".

       I encourage you to think in terms of alternative sequences of
nested hypotheses, including the implications of A being significant by
itself but not with B already present, and of whether the A:B interaction
is or is not significant.
So-called type-II tests do exactly that -- that is, obey the principle of
marginality; they are maximally powerful if the higher-order term(s) to
which a particular term is marginal are 0.
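
A minimal sketch of the nested comparisons discussed above, using base R
only. The data, the cell counts, and the object names are made up for
illustration. The test of A that respects marginality compares y ~ B with
y ~ A + B, ignoring the A:B interaction.

```r
# Made-up unbalanced 2 x 3 layout: unequal numbers of observations per cell.
set.seed(2)
cells  <- expand.grid(A = factor(c("a1", "a2")),
                      B = factor(c("b1", "b2", "b3")))
counts <- c(2, 5, 4, 6, 3, 7)                 # hypothetical cell sizes
d <- cells[rep(seq_len(nrow(cells)), times = counts), ]
d$y <- rnorm(nrow(d), mean = 5 + (d$A == "a2") + 0.5 * as.integer(d$B))

# A sequence of nested models.
m_A    <- lm(y ~ A,     data = d)
m_B    <- lm(y ~ B,     data = d)
m_add  <- lm(y ~ A + B, data = d)
m_full <- lm(y ~ A * B, data = d)

# A by itself (against the intercept-only model).
a_alone   <- anova(lm(y ~ 1, data = d), m_A)
# A with B already present, and B with A already present.
a_given_b <- anova(m_B, m_add)
b_given_a <- anova(m_A, m_add)
# The A:B interaction, with both main effects present.
ab_inter  <- anova(m_add, m_full)

print(a_alone)
print(a_given_b)
print(b_given_a)
print(ab_inter)
```

With unbalanced data the sum of squares for A by itself and for A given B
will generally differ, which is what the alternative sequences are meant
to expose.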

Best,
 John

I did notice that this question gets asked several times and in
slightly different ways, and I think the lack of an interface that
represents an unbalanced design in the same way aov represents
balanced designs is why the question will probably keep getting asked
again.
I had mentioned nlme and lme4 because I saw in some of the discussions
that using those was recommended for working with unbalanced designs.
And specifying random effects with zero variance, for example, would
probably serve my purposes.
       I'd be surprised if nlme or lme4 changes what I wrote above.

       Hope this helps.
       Spencer

Thank you for your help.

Sincerely,

Krishna

On Sun, Apr 3, 2011 at 7:28 AM, John Fox <jfox at mcmaster.ca> wrote:

Dear Krishna,

Although it's difficult to explain briefly, I'd argue that balanced
and unbalanced ANOVA are not fundamentally different, in that the
focus should be on the hypotheses that are tested, and these are
naturally expressed as functions of cell means and marginal means.
For example, in a two-way ANOVA, the null hypothesis of no
interaction is equivalent to parallel profiles of cell means for one
factor across levels of the other. What is different, though, is that
in a balanced ANOVA all common approaches to constructing an ANOVA
table coincide.
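
A small sketch of that last point, on made-up data with hypothetical
object names: in a balanced layout the sequential sum of squares for A is
the same whether A enters the model first or second, while after deleting
a few rows unevenly it is not.

```r
# Made-up balanced 2 x 3 layout with 4 observations per cell.
set.seed(1)
balanced <- expand.grid(A = factor(c("a1", "a2")),
                        B = factor(c("b1", "b2", "b3")),
                        rep = 1:4)
balanced$y <- rnorm(nrow(balanced),
                    mean = 10 + as.integer(balanced$A) +
                           0.5 * as.integer(balanced$B))

# Cell means (rows: levels of A, columns: levels of B) and the
# unweighted marginal means of A derived from them.
cell_means <- with(balanced, tapply(y, list(A, B), mean))
print(cell_means)
print(rowMeans(cell_means))

# Balanced case: sequential SS for A, entered first and entered second.
bal_A_first  <- anova(lm(y ~ A * B, data = balanced))["A", "Sum Sq"]
bal_A_second <- anova(lm(y ~ B * A, data = balanced))["A", "Sum Sq"]

# Unbalance the data by dropping rows unevenly (no cell becomes empty).
unbalanced <- balanced[-c(1, 3, 5, 7, 8, 13), ]
print(table(unbalanced$A, unbalanced$B))

# Unbalanced case: the same two orderings.
unb_A_first  <- anova(lm(y ~ A * B, data = unbalanced))["A", "Sum Sq"]
unb_A_second <- anova(lm(y ~ B * A, data = unbalanced))["A", "Sum Sq"]

print(c(balanced_first = bal_A_first, balanced_second = bal_A_second))
print(c(unbalanced_first = unb_A_first, unbalanced_second = unb_A_second))
```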

Without getting into the explanation in detail (which you can find in
a text like my Applied Regression Analysis and Generalized Linear
Models), so-called type-I (or sequential) tests, such as those
performed by the standard anova() function in R, test hypotheses that
are rarely of substantive interest, and, even when they are, are of
interest only by accident. So-called type-II tests, such as those
performed by default by the Anova() function in the car package, test
hypotheses that are almost
always of interest. Type-III tests, which the Anova() function in car
can perform optionally, require careful formulation of the model for
the hypotheses tested to be sensible, and even then have less power
than corresponding type-II tests in the circumstances in which a test
would be of interest.
Since you're addressing fixed-effects models, I'm not sure why you
introduced nlme and lme4 into the discussion, but I note that Anova()
in the car package has methods that can produce type-II and -III Wald
tests for the fixed effects in mixed models fit by lme() and lmer().
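
As a rough illustration of the three kinds of tests for a three-way
fixed-effects model, here is a sketch on made-up data. The data frame,
the cell counts, and the object names are hypothetical, and the car calls
run only if that package is installed. Using sum-to-zero contrasts for
the type-III fit is one common reading of "careful formulation", not
necessarily the only one.

```r
# Made-up unbalanced 2 x 2 x 2 layout with unequal cell sizes.
set.seed(3)
cells  <- expand.grid(A = factor(c("a1", "a2")),
                      B = factor(c("b1", "b2")),
                      C = factor(c("c1", "c2")))
counts <- c(3, 5, 2, 6, 4, 3, 7, 2)           # hypothetical cell sizes
d <- cells[rep(seq_len(nrow(cells)), times = counts), ]
d$y <- rnorm(nrow(d), mean = 10 + (d$A == "a2") + 2 * (d$B == "b2"))

fit <- lm(y ~ A * B * C, data = d)

# Type-I (sequential) tests: the result depends on the order of terms.
seq_table <- anova(fit)
print(seq_table)

if (requireNamespace("car", quietly = TRUE)) {
  # Type-II tests, the default for car::Anova().
  print(car::Anova(fit))

  # Type-III tests, refitting with sum-to-zero contrasts first.
  fit_sum <- lm(y ~ A * B * C, data = d,
                contrasts = list(A = "contr.sum",
                                 B = "contr.sum",
                                 C = "contr.sum"))
  print(car::Anova(fit_sum, type = "III"))
}
```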

Your question has been asked several times before on the r-help list.
For example, if you enter terms like "type-II" or "unbalanced ANOVA"
in the RSeek search engine and look under the "Support Lists" tab,
you'll see many hits -- e.g.,
<https://stat.ethz.ch/pipermail/r-help/2006-August/111927.html>.

I hope this helps,
  John

--------------------------------
John Fox
Senator William McMaster
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org]
On Behalf Of Krishna Kirti Das
Sent: April-03-11 3:25 AM
To: r-help at r-project.org
Subject: [R] Unbalanced Anova: What is the best approach?

I have a three-way unbalanced ANOVA that I need to calculate (fixed
effects plus interactions, no random effects). But word has it that
aov() is good only for balanced designs. I have seen a number of
different recommendations for working with unbalanced designs, but
they seem to differ widely (car, nlme, lme4, etc.). So I would like
to know what is the best or most usual way to go about working with
unbalanced designs and extracting a reliable ANOVA table from them in R?


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
