
question about anova() output

4 messages · Robert Burrows, Thomas Lumley, Peter Dalgaard

#
Hello,

I am getting output from anova() and summary(aov()) that depends on the
order of the factors in the fitted model object, and this has me baffled. I
see this dependency with the data.frame below but not with an example (table
6.4) from Montgomery's DOE book. This is with R 1.3.0 on Debian GNU-Linux.

Where have I gone wrong?
    run sample CH50mg
1  day1 dev126   0.56
2  day1 dev126   0.70
3  day1 dev126   0.82
4  day1 dev126   0.72
5  day2 dev126   0.57
6  day2 dev126   0.60
7  day3 dev126   0.61
8  day3 dev126   0.64
9  day3 dev126   0.68
10 day3 dev126   0.68
11 day1 dev118   0.77
12 day1 dev118   0.80
13 day1 dev118   0.86
14 day2 dev118   0.71
15 day2 dev118   0.70
16 day3 dev118   0.77
17 day3 dev118   0.77
18 day3 dev118   0.77
19 day3 dev118   0.80
20 day1 rgf108   0.77
21 day1 rgf108   0.86
22 day1 rgf108   0.82
23 day2 rgf108   0.62
24 day2 rgf108   0.63
25 day3 rgf108   0.66
26 day3 rgf108   0.71
27 day3 rgf108   0.69
28 day3 rgf108   0.69

Analysis of Variance Table

Response: CH50mg
           Df   Sum Sq  Mean Sq F value    Pr(>F)
run         2 0.064308 0.032154 12.5597 0.0003343
sample      2 0.068649 0.034324 13.4075 0.0002337
run:sample  4 0.010444 0.002611  1.0199 0.4221699
Residuals  19 0.048642 0.002560

Analysis of Variance Table

Response: CH50mg
           Df   Sum Sq  Mean Sq F value    Pr(>F)
sample      2 0.061927 0.030964 12.0948 0.0004093
run         2 0.071029 0.035515 13.8725 0.0001931
sample:run  4 0.010444 0.002611  1.0199 0.4221699
Residuals  19 0.048642 0.002560

TIA,
#
On Fri, 26 Oct 2001, Robert Burrows wrote:

> Where have I gone wrong?

In worrying about it?

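In a non-orthogonal design (i.e., most unbalanced designs) the sums of
squares do depend on the order. In an orthogonal design they don't. This
is because R reports sequential sums of squares: each term's sum of
squares is the reduction in residual sum of squares when that term is
added to a model containing the terms listed before it, so the fits form
a nested sequence of models.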
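
The nested-sequence idea can be checked directly on the posted data. The sketch below is illustrative only: the data frame name dat and the helper rss() are assumptions, not names from the original posts. It rebuilds each main-effect sum of squares as a difference of residual sums of squares, once with run entered first and once with sample entered first, for comparison with the two tables in the original question.

```r
# Data as posted in the thread (28 observations); 'dat' is an assumed name.
dat <- data.frame(
  run    = factor(rep(rep(c("day1", "day2", "day3"), 3),
                      times = c(4, 2, 4, 3, 2, 4, 3, 2, 4))),
  sample = factor(rep(c("dev126", "dev118", "rgf108"), times = c(10, 9, 9))),
  CH50mg = c(0.56, 0.70, 0.82, 0.72, 0.57, 0.60, 0.61, 0.64, 0.68, 0.68,
             0.77, 0.80, 0.86, 0.71, 0.70, 0.77, 0.77, 0.77, 0.80,
             0.77, 0.86, 0.82, 0.62, 0.63, 0.66, 0.71, 0.69, 0.69)
)

# Hypothetical helper: residual sum of squares of a fitted linear model.
rss <- function(formula) deviance(lm(formula, data = dat))

rss0 <- rss(CH50mg ~ 1)   # intercept-only model

# Sequence 1: intercept -> + run -> + sample
ss_run_first    <- rss0 - rss(CH50mg ~ run)
ss_sample_after <- rss(CH50mg ~ run) - rss(CH50mg ~ run + sample)

# Sequence 2: intercept -> + sample -> + run
ss_sample_first <- rss0 - rss(CH50mg ~ sample)
ss_run_after    <- rss(CH50mg ~ sample) - rss(CH50mg ~ sample + run)

print(c(run = ss_run_first, sample = ss_sample_after))
print(c(sample = ss_sample_first, run = ss_run_after))
```

Both sequences end at the same additive model, so the two pairs add up to the same total; only the split between run and sample changes.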

Some packages report sums of squares that are based on comparing the full
model to the models with each factor removed one at a time.  The question
of which set of sums of squares is the Right Thing provokes low-level holy
wars on r-help from time to time.

You can compute sums of squares comparing any two models you feel like by
using
  anova(model1,model2)
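
As a minimal sketch of such a two-model comparison on the posted data (the object names dat, m_add, m_full and cmp are assumptions for illustration), the following compares the additive model with the model that also includes the interaction. The "Sum of Sq" entry in the second row should correspond to the run:sample line of the tables in the original question.

```r
# Data as posted in the thread (28 observations); 'dat' is an assumed name.
dat <- data.frame(
  run    = factor(rep(rep(c("day1", "day2", "day3"), 3),
                      times = c(4, 2, 4, 3, 2, 4, 3, 2, 4))),
  sample = factor(rep(c("dev126", "dev118", "rgf108"), times = c(10, 9, 9))),
  CH50mg = c(0.56, 0.70, 0.82, 0.72, 0.57, 0.60, 0.61, 0.64, 0.68, 0.68,
             0.77, 0.80, 0.86, 0.71, 0.70, 0.77, 0.77, 0.77, 0.80,
             0.77, 0.86, 0.82, 0.62, 0.63, 0.66, 0.71, 0.69, 0.69)
)

m_add  <- lm(CH50mg ~ run + sample, data = dat)   # main effects only
m_full <- lm(CH50mg ~ run * sample, data = dat)   # main effects + interaction

# Row 2 of the result tests the extra terms in m_full (the interaction).
cmp <- anova(m_add, m_full)
print(cmp)
```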


This probably should be a FAQ.


	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

#
Robert Burrows <rbb at nebiometrics.com> writes:

> Where have I gone wrong?

In assuming that the order should not matter. anova() gives the
incremental SS, and in a non-orthogonal design the order *does*
matter. You might want to try 

drop1(lm(CH50mg~run+sample))

and also

anova(lm(CH50mg~run))
anova(lm(CH50mg~sample))
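
The calls above assume the variables are visible on the search path. A self-contained sketch, with the posted data rebuilt under the assumed name dat, might look like this. drop1() reports each factor's sum of squares adjusted for the other, so neither factor is favoured by being listed first, while the two single-factor fits give each factor's unadjusted sum of squares.

```r
# Data as posted in the thread (28 observations); 'dat' is an assumed name.
dat <- data.frame(
  run    = factor(rep(rep(c("day1", "day2", "day3"), 3),
                      times = c(4, 2, 4, 3, 2, 4, 3, 2, 4))),
  sample = factor(rep(c("dev126", "dev118", "rgf108"), times = c(10, 9, 9))),
  CH50mg = c(0.56, 0.70, 0.82, 0.72, 0.57, 0.60, 0.61, 0.64, 0.68, 0.68,
             0.77, 0.80, 0.86, 0.71, 0.70, 0.77, 0.77, 0.77, 0.80,
             0.77, 0.86, 0.82, 0.62, 0.63, 0.66, 0.71, 0.69, 0.69)
)

# Each factor dropped in turn from the additive model.
d <- drop1(lm(CH50mg ~ run + sample, data = dat))
print(d)

# Each factor on its own, ignoring the other.
a_run    <- anova(lm(CH50mg ~ run, data = dat))
a_sample <- anova(lm(CH50mg ~ sample, data = dat))
print(a_run)
print(a_sample)
```

The drop1() values should match the second line of each posted table (the factor entered last), and the single-factor values should match the first line (the factor entered first).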

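Also, if you remove one of the first four observations, every sample has
the same number of readings on each day (3, 2 and 4). With proportional
cell counts like this the main effects are orthogonal, and the
order-dependence should disappear.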
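
A quick way to try this, sketched with assumed object names (dat, dat_bal, a1, a2), is to drop the first row of the posted data and fit the model in both orders. The cell-count table shows the design after the deletion.

```r
# Data as posted in the thread (28 observations); 'dat' is an assumed name.
dat <- data.frame(
  run    = factor(rep(rep(c("day1", "day2", "day3"), 3),
                      times = c(4, 2, 4, 3, 2, 4, 3, 2, 4))),
  sample = factor(rep(c("dev126", "dev118", "rgf108"), times = c(10, 9, 9))),
  CH50mg = c(0.56, 0.70, 0.82, 0.72, 0.57, 0.60, 0.61, 0.64, 0.68, 0.68,
             0.77, 0.80, 0.86, 0.71, 0.70, 0.77, 0.77, 0.77, 0.80,
             0.77, 0.86, 0.82, 0.62, 0.63, 0.66, 0.71, 0.69, 0.69)
)

# Remove one of the first four observations (here the first one).
dat_bal <- dat[-1, ]

# Cell counts: each sample now has the same number of readings per day.
print(table(dat_bal$sample, dat_bal$run))

# Fit with the factors in both orders and compare the tables.
a1 <- anova(lm(CH50mg ~ run * sample, data = dat_bal))
a2 <- anova(lm(CH50mg ~ sample * run, data = dat_bal))
print(a1)
print(a2)
```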

BTW: The interaction operator is ":", and ~run*sample expands to
~run+sample+run:sample
#
Many thanks to TL and PD for your replies. I clearly need to learn a bit
more about anova().