ANOVA Output

5 messages · Katrina W. Chu, ONKELINX, Thierry, Tyler Smith +2 more

#
I have a question about my R output when I run a three-way ANOVA.  I just plugged the
interaction term into the formula and presto!  ANOVA!  But I noticed that if I change
the order of the terms in the formula (or in the interaction term), I get slightly
different ANOVA outputs.  I've pasted the output at the bottom of this message.  I didn't
think that this should happen, so I would appreciate any feedback on this problem.

Thanks in advance, Kat.

                               Df  Sum Sq Mean Sq F value    Pr(>F)
Treatment                       3   356.5   118.8  4.2878  0.005276 **
SamplingPeriod                  3   374.7   124.9  4.5069  0.003911 **
Site                            1  1016.5  1016.5 36.6791 2.629e-09 ***
Treatment:SamplingPeriod        9   467.6    52.0  1.8747  0.053284 .
Treatment:Site                  3   167.8    55.9  2.0176  0.110424
SamplingPeriod:Site             3  1670.2   556.7 20.0884 2.383e-12 ***
Treatment:SamplingPeriod:Site   9   277.2    30.8  1.1115  0.352455
Residuals                     534 14799.5    27.7
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                               Df  Sum Sq Mean Sq F value    Pr(>F)
SamplingPeriod                  3   369.5   123.2  4.4437  0.004264 **
Treatment                       3   361.8   120.6  4.3510  0.004840 **
Site                            1  1016.5  1016.5 36.6791 2.629e-09 ***
SamplingPeriod:Treatment        9   467.6    52.0  1.8747  0.053284 .
SamplingPeriod:Site             3  1662.0   554.0 19.9894 2.718e-12 ***
Treatment:Site                  3   176.0    58.7  2.1166  0.097111 .
SamplingPeriod:Treatment:Site   9   277.2    30.8  1.1115  0.352455
Residuals                     534 14799.5    27.7
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                               Df  Sum Sq Mean Sq F value    Pr(>F)
Site                            1  1008.9  1008.9 36.4050 2.998e-09 ***
SamplingPeriod                  3   374.1   124.7  4.4990  0.003953 **
Treatment                       3   364.8   121.6  4.3871  0.004607 **
Site:SamplingPeriod             3  1654.8   551.6 19.9026 3.050e-12 ***
Site:Treatment                  3   172.6    57.5  2.0761  0.102364
SamplingPeriod:Treatment        9   478.2    53.1  1.9172  0.047282 *
Site:SamplingPeriod:Treatment   9   277.2    30.8  1.1115  0.352455
Residuals                     534 14799.5    27.7
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
Dear Katrina,

The F-values differ because you are testing different hypotheses: anova
yields Type I (sequential) SS, which depend on the order of the terms.
It looks like you expected Type III SS.
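
To illustrate the point about Type I SS, here is a minimal sketch on simulated, deliberately unbalanced data. The factors A and B, the response y, and the cell sizes are all made up for illustration; this is not Katrina's data. It fits the same two-way model with the terms in both orders and compares the sequential sums of squares:

```r
# Hypothetical, deliberately unbalanced two-factor design (not the poster's data).
set.seed(42)
cells  <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"))
n_cell <- c(10, 2, 3, 9, 6, 6)                 # unequal cell sizes
d <- cells[rep(seq_len(nrow(cells)), times = n_cell), ]
d$y <- rnorm(nrow(d), mean = as.integer(d$A) + as.integer(d$B))

# Same model, terms entered in a different order.
tab_AB <- anova(lm(y ~ A * B, data = d))       # A first, then B given A
tab_BA <- anova(lm(y ~ B * A, data = d))       # B first, then A given B

# The SS for A depends on whether A is entered before or after B.
print(c(A_first = tab_AB["A", "Sum Sq"], A_second = tab_BA["A", "Sum Sq"]))

# The last term (the interaction) and the residuals do not change.
print(c(AB = tab_AB["A:B", "Sum Sq"], BA = tab_BA["B:A", "Sum Sq"]))
print(c(AB = tab_AB["Residuals", "Sum Sq"], BA = tab_BA["Residuals", "Sum Sq"]))
```

With unequal cell sizes the main-effect rows shift between the two orders, while the highest-order interaction and the residual row stay the same, which is the pattern in the three tables above.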

HTH,

Thierry


------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original message-----
From: r-sig-ecology-bounces at r-project.org
[mailto:r-sig-ecology-bounces at r-project.org] On behalf of Katrina W. Chu
Sent: Wednesday 12 November 2008 22:27
To: r-sig-ecology at r-project.org
Subject: [R-sig-eco] ANOVA Output

#
Apologies if I'm beating a dead horse here, but this is exactly the
problem I raised in the thread on classical statistics in R. If Katrina
is using a textbook like Sokal and Rohlf, it is indeed completely
unexpected to find that changing the order of explanatory variables in
an anova will produce different results. Thierry points out that this is
because R produces Type I SS by default. Unfortunately, nowhere in S&R
is this distinction explained, so for this problem a book widely
regarded as a comprehensive reference for biologists provides absolutely
no help.

These questions come up all the time on the r-help list, and I think
it's a sign of a real disconnect between the presentation of classical
statistics in many undergrad programs and the way the tests are actually
implemented in R.

Anyways, that's a bigger issue. It may be helpful to know that the 'car'
package includes a function Anova (not to be confused with the anova
function) that allows you to calculate type II or type III sums of
squares. 
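
As a rough sketch of how that might look, the example below calls car::Anova on simulated, unbalanced data. The data, the factor names A and B, and the response y are hypothetical; the sketch assumes the car package is installed and skips the call otherwise. Type III tests are usually only meaningful with sum-to-zero contrasts, so the last model sets them explicitly:

```r
# Hypothetical, deliberately unbalanced two-factor design (not the poster's data).
set.seed(42)
cells  <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"))
n_cell <- c(10, 2, 3, 9, 6, 6)                 # unequal cell sizes
d <- cells[rep(seq_len(nrow(cells)), times = n_cell), ]
d$y <- rnorm(nrow(d), mean = as.integer(d$A) + as.integer(d$B))

if (requireNamespace("car", quietly = TRUE)) {
  # Type II SS: each main effect is adjusted for the other main effect,
  # so the order of the terms in the formula does not matter.
  t2_AB <- car::Anova(lm(y ~ A * B, data = d), type = 2)
  t2_BA <- car::Anova(lm(y ~ B * A, data = d), type = 2)
  print(t2_AB)
  print(t2_BA)

  # Type III SS: refit with sum-to-zero contrasts before calling Anova.
  fit3 <- lm(y ~ A * B, data = d,
             contrasts = list(A = "contr.sum", B = "contr.sum"))
  print(car::Anova(fit3, type = 3))
} else {
  message("Package 'car' is not installed; skipping the Anova() example.")
}
```

Note the capital A: anova() from base R gives the sequential table, while car::Anova() takes the type argument.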

Cheers,

Tyler
#
That's a great point Tyler.  It raises the question of what IS a good reference for statistics that treats them the way R does.  There has been some discussion of that already, but one book that hasn't been mentioned is that of John Fox, the author of the car package.

Fox, John.  1997.  Applied regression analysis, linear models, and related methods.  Sage Publications.

http://books.google.com/books?id=pr2mKvAxXeYC&printsec=frontcover&lr=

Although mainly aimed at the social sciences, I found this to be pretty readable, and much more detailed than Crawley's books (admittedly aimed at a higher level).  And as for R code, Fox also has "An R and S-Plus Companion to Applied Regression". http://books.google.com/books?id=xWS8kgRjGcAC&printsec=frontcover&lr=

 If you want to get a detailed understanding of Anova and regression the way R sees them, I think this pair of books is nearly as good as it gets.

Matt

-----Original Message-----
From: r-sig-ecology-bounces at r-project.org [mailto:r-sig-ecology-bounces at r-project.org] On Behalf Of tyler
Sent: Thursday, November 13, 2008 8:52 AM
To: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] ANOVA Output

--
What is wanted is not the will to believe, but the will to find out, which is
the exact opposite.                    --Bertrand Russell

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
#
The types-of-sums-of-squares issue is FAQ 7.18 and you can find a
great deal of discussion in the R help lists.  You only need to choose
a 'type' if for some reason you need to efficiently produce a table
with the results of multiple hypothesis tests.  In general I think it
is better to think hard about exactly which hypotheses are of interest
and then compare appropriately nested models to conduct the test
(via e.g., a LRT).  This is covered in many stats books, including
many with an R focus.  See the books by Venables and Ripley, Harrell,
Faraway, Maindonald, etc.
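
A minimal sketch of the nested-model approach, again on simulated data with hypothetical names (A, B, y). For ordinary linear models, anova() on two nested fits gives an F test, which is used here in place of a likelihood ratio test:

```r
# Hypothetical, deliberately unbalanced two-factor design (not the poster's data).
set.seed(42)
cells  <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"))
n_cell <- c(10, 2, 3, 9, 6, 6)                 # unequal cell sizes
d <- cells[rep(seq_len(nrow(cells)), times = n_cell), ]
d$y <- rnorm(nrow(d), mean = as.integer(d$A) + as.integer(d$B))

# Hypothesis 1: is there an A-by-B interaction?
# Compare the additive model against the model with the interaction.
additive <- lm(y ~ A + B, data = d)
full     <- lm(y ~ A * B, data = d)
cmp_int  <- anova(additive, full)
print(cmp_int)

# Hypothesis 2: does A matter once B is accounted for (no interaction)?
# Compare the B-only model against the additive model.
b_only <- lm(y ~ B, data = d)
cmp_A  <- anova(b_only, additive)
print(cmp_A)
```

Each comparison states its own null and alternative model, so the result does not depend on the order in which terms are typed into a formula.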

Kingsford Jones

On Thu, Nov 13, 2008 at 9:33 AM, Landis, R Matthew
<rlandis at middlebury.edu> wrote: