aov for unbalanced design (PR#7144)

3 messages · tlogvinenko@partners.org, Thomas Lumley, Brian Ripley

#
Full_Name: Tanya Logvinenko
Version: 1.7.0
OS: Windows 2000
Submission from: (NULL) (132.183.156.125)


For an unbalanced design, I ran into a problem with ANOVA (the aov function). The
sums of squares for the second factor and for the total are computed correctly,
but the sum of squares for the first factor is computed incorrectly. Changing the
order of the factors in the formula changes the ANOVA table. For a balanced
design, there is no such problem.

            Df  Sum Sq Mean Sq F value    Pr(>F)    
factor1      5 1524420  304884  6.4529 0.0003229 ***
factor2      7 1447830  206833  4.3776 0.0017808 ** 
Residuals   31 1464674   47248                      
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

            Df  Sum Sq Mean Sq F value    Pr(>F)    
factor2      7 1648225  235461  4.9836 0.0007295 ***
factor1      5 1324025  264805  5.6046 0.0008612 ***
Residuals   31 1464674   47248                      
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
#
It is not a bug. It is supposed to be that way. It is even a FAQ.

	-thomas
On Thu, 29 Jul 2004 tlogvinenko@partners.org wrote:

            
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle
#
What do you think is the correct answer and on what authority?
(These are explicitly sequential aka Type 1 anova tables.)

That the SSqs depend on the order of fitting is a feature of an unbalanced 
design.  I believe that R is correct and your understanding is not.
On Thu, 29 Jul 2004 tlogvinenko@partners.org wrote:

            
Oh, please!  Don't send in bug reports from very old versions -- there 
have been 5 releases since then.
The FAQ has a section on BUGS asking for a *reproducible* example.  This 
is not.
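
The order dependence discussed in this thread can be reproduced on a tiny made-up data set. The following is only a sketch in plain Python rather than R: the data, the helper names `rss` and `sequential_ss`, and the 2x2 layout are all invented for illustration and have nothing to do with the submitter's data. It computes sequential (Type I) sums of squares as successive drops in the residual sum of squares while terms are added one at a time.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rss(columns, y):
    """Residual sum of squares of y after least-squares projection onto
    the span of `columns` (Gram-Schmidt; fine for tiny examples only)."""
    resid = list(y)
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:                      # remove what earlier columns explain
            c = dot(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm2 = dot(v, v)
        if norm2 > 1e-12:                    # skip linearly dependent columns
            q = [vi / sqrt(norm2) for vi in v]
            basis.append(q)
            c = dot(resid, q)
            resid = [ri - c * qi for ri, qi in zip(resid, q)]
    return dot(resid, resid)

def sequential_ss(terms, y):
    """terms: list of (name, columns) in fitting order.
    Returns the SS each term adds given the terms fitted before it."""
    cols = [[1.0] * len(y)]                  # intercept
    prev = rss(cols, y)
    table = {}
    for name, term_cols in terms:
        cols = cols + term_cols
        cur = rss(cols, y)
        table[name] = prev - cur
        prev = cur
    table["Residuals"] = prev
    return table

# Hypothetical unbalanced 2x2 layout: cell counts are 3, 1, 1, 3.
A = [0, 0, 0, 0, 1, 1, 1, 1]                 # indicator for level 2 of factor A
B = [0, 0, 0, 1, 0, 1, 1, 1]                 # indicator for level 2 of factor B
y = [1, 2, 3, 6, 4, 7, 8, 9]

ab = sequential_ss([("A", [A]), ("B", [B])], y)   # A first, then B
ba = sequential_ss([("B", [B]), ("A", [A])], y)   # B first, then A

print({k: round(v, 6) for k, v in ab.items()})    # A: 32, B: 24, Residuals: 4
print({k: round(v, 6) for k, v in ba.items()})    # B: 50, A: 6,  Residuals: 4
```

In both orders the residual sum of squares and the combined sum for the two factors (56 here) are the same; only the split between the factors changes, because the term fitted first is not adjusted for the other. The two tables in the original report show the same pattern: 1524420 + 1447830 and 1648225 + 1324025 both equal 2972250, with identical residuals. In R itself the comparison corresponds to summary(aov(y ~ A + B)) versus summary(aov(y ~ B + A)).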