
Anova - adjusted or sequential sums of squares?

7 messages · michael watson (IAH-C), John Fox, Douglas Bates +4 more

#
I guess the real problem is this:

As I have a different number of observations in each of the groups, the
results *change* depending on which order I specify the factors in the
model.  This unnerves me.  With a completely balanced design, this
doesn't happen - the results are the same no matter which order I
specify the factors.  

This is the reason I have been given for using the so-called type III
adjusted sums of squares...

Mick

-----Original Message-----
From: Douglas Bates [mailto:bates at stat.wisc.edu] 
Sent: 20 April 2005 15:07
To: michael watson (IAH-C)
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Anova - adjusted or sequential sums of squares?
michael watson (IAH-C) wrote:
and lm() in R.
Install the fortunes package and try
 > fortune("Venables")

I'm really curious to know why the "two types" of sum of squares are
called "Type I" and "Type III"!

This is a very common misconception, particularly among SAS users who
have been fed this nonsense quite often for all their professional
lives. Fortunately the reality is much simpler. There is, by any
sensible reckoning, only ONE type of sum of squares, and it always
represents an improvement sum of squares of the outer (or alternative)
model over the inner (or null hypothesis) model. What the SAS highly
dubious classification of sums of squares does is to encourage users to
concentrate on the null hypothesis model and to forget about the
alternative. This is always a very bad idea and not surprisingly it can
lead to nonsensical tests, as in the test it provides for main effects
"even in the presence of interactions", something which beggars
definition, let alone belief.
    -- Bill Venables
       R-help (November 2000)

In the words of the master, "there is ... only one type of sum of 
squares", which is the one that R reports.  The others are awkward 
fictions created for times when one could only afford to fit one or two 
linear models per week and therefore wanted the output to give results 
for all possible tests one could conceive, even if the models being 
tested didn't make sense.
#
Dear Mick,

The Anova() function in the car package will compute what are often called
"type-II" and "-III" sums of squares. 

Without wishing to reinvigorate the sums-of-squares controversy, I'll just
add that the various "types" of sums of squares correspond to different
hypotheses. The hypotheses tested by "type-I" sums of squares are rarely
sensible; that the results vary by the order of the factors is a symptom of
this, but sums of squares that are invariant with respect to ordering of the
factors don't necessarily correspond to sensible hypotheses. 

If you do decide to use "type-III" sums of squares, be careful to use a
contrast type (such as contr.sum) that produces an orthogonal row basis for
the terms in the model.
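As a hedged sketch of that precaution (the data frame `d`, its factors,
and the group sizes are all invented for illustration):

```r
# Invented unbalanced two-factor data: all four cells occupied, unequal counts.
set.seed(1)
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(5, 7))),
  b = factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 2, 2, 5)))
)
d$y <- rnorm(12)

# contr.sum (rather than the default contr.treatment) gives the orthogonal
# row basis that "type-III" tests require.
options(contrasts = c("contr.sum", "contr.poly"))
fit <- lm(y ~ a * b, data = d)

# Anova() is in the car package; guard the call in case it is not installed.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit, type = "III"))
}
```

With the default contr.treatment contrasts the same call would still
print a "type-III" table, but one that tests different, and generally
uninteresting, hypotheses.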

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------
#
Michael Watson (IAH-C) wrote:
Yes, but how do you know that the test represented by the type III sums 
of squares makes sense in the context of your data?

The point that Bill is trying to make is that hypothesis tests always 
involve comparing the fits of two nested models to some data.  If you 
can't describe what the models being compared are, how can you interpret 
the results of the test?

There are many concepts introduced in statistics that apply to certain, 
specific cases but have managed to outgrow the original context so that 
people think they have general application.  The area of linear models 
and the analysis of variance is overrun with such concepts.  The list 
includes "significance of main effects", "overall mean", "R^2", 
"expected mean square", ...   These concepts do not stand on their own - 
they apply to specific models.

You are looking for *the answer* to the question "Is this main effect 
significant?"   What Bill is saying is that the question doesn't make 
sense.  You can ask "Does the model that incorporates this term and all 
other first-order terms provide a significantly better fit than the same 
model without this one term?"  That's a well-phrased question.  You 
could even ask the same question about a model with first-order and 
higher-order terms versus the same model without this one term and you 
can get an answer to that question.  Whether or not that answer makes 
sense depends on whether or not the model with all the terms except the 
one being considered makes sense.  In most cases it doesn't, so why 
insist on a p-value for a nonsensical test?  That number does *not* 
characterize a test of the "significance of the main effect".

How does R come into this?  Well, with R you can fit models of great 
complexity to reasonably large data sets quickly and easily so, if you 
can formulate the hypothesis of interest to you by comparing the fit of 
an inner model to an outer model, then you simply fit them and compare 
them, usually with anova(fm1, fm2).  That's it.  That's all that the 
analysis of variance is about.  The complicated formulas and convoluted 
reasoning that we were all taught are there solely for the purpose of 
trying to "simplify" the calculations for this comparison.   They're 
unnecessary.  With a tool like R you simply fit model 1, then fit model 
2, and compare the fits.  The only kicker is that you have to be able to 
describe your hypothesis in terms of the difference between two models.
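A minimal sketch of that workflow (the data and variable names are
simulated purely for illustration):

```r
# Simulated data, invented for illustration.
set.seed(42)
dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
dat$y <- 1 + 2 * dat$x1 + rnorm(30)

fm1 <- lm(y ~ x1,      data = dat)  # inner (null hypothesis) model
fm2 <- lm(y ~ x1 + x2, data = dat)  # outer (alternative) model

# One F test of the improvement of the outer model over the inner one.
anova(fm1, fm2)
```

That single comparison is the whole of the hypothesis test; no "type" of
sum of squares ever needs to be named.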

With tools like R we have the potential to change the way statistics is 
viewed as a discipline and especially the way that it is taught.  
Statistics is not about formulas - statistics is about models.  R allows 
you to think about the models and not the grubby details of the 
calculations.
#
"michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk> writes:
...and that is completely wrong!

If there ever is a reason for using Type III SSDs, it should be that
the results do not really depend "very much" on the order. This is
conceivably the case in "nearly balanced" designs. (I.e. it can be
viewed as an attempt to regain the nice property of balanced designs
where you can read everything off of the ANOVA table.)

If the imbalance is severe, then results simply *are* dependent on
which other factors are in the model - the effect of weight diminishes
when height is added to a model, etc. It's a fact of life and you just
have to deal with it.
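A minimal sketch of that dependence (the data are invented; any
non-orthogonal layout would do):

```r
# Invented unbalanced data: the two factors are non-orthogonal, so the
# sequential sum of squares for each factor depends on its position.
set.seed(7)
ub <- data.frame(
  f1 = factor(rep(c("A", "B"), times = c(5, 7))),
  f2 = factor(rep(c("C", "D", "C", "D"), times = c(3, 2, 2, 5)))
)
ub$y <- rnorm(12)

anova(lm(y ~ f1 + f2, data = ub))  # f1 unadjusted; f2 adjusted for f1
anova(lm(y ~ f2 + f1, data = ub))  # order reversed: different tables
```

The two tables decompose the same total sum of squares, but line by line
they differ; neither is wrong, they simply answer different questions.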
#
On Wed, 20 Apr 2005, michael watson (IAH-C) wrote:

            
This is one of many examples of an attempt to provide a mathematical 
answer to something that isn't a mathematical question.

As people have already pointed out, in any practical testing situation you 
have two models you want to compare.  If you are working in an interactive 
statistical environment, or even in a modern batch-mode system, you can 
fit the two models and compare them.  If you want to compare two other 
models, you can fit them and compare them.

However, in the Bad Old Days this was inconvenient (or so I'm told).  If 
you had half a dozen tests, and one of the models was the same in each 
test, it was a substantial saving of time and effort to fit this model 
just once.

This led to a system where you specify a model and a set of tests: e.g. I'm 
going to fit y~a+b+c+d and I want to test (some of) y~a vs y~a+b, y~a+b vs 
y~a+b+c and so on. Or, I want to test (some of) y~a+b+c vs y~a+b+c+d, 
y~a+b+d vs y~a+b+c+d and so on. This gives the "Types" of sums of squares, 
which are ways of specifying sets of tests. You could pick the "Type" so 
that the total number of linear models you had to fit was minimized. As 
these are merely a computational optimization, they don't have to make any 
real sense. Unfortunately, as with many optimizations, they have gained a 
life of their own.
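One can verify directly in R that a sequential ("Type I") table is
nothing but such a set of nested comparisons (data simulated for
illustration):

```r
# Simulated data, invented for illustration.
set.seed(3)
dd <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
dd$y <- rnorm(20)

seq_tab <- anova(lm(y ~ a + b + c, data = dd))  # sequential ("Type I") table

# The line for b is exactly the y ~ a versus y ~ a + b comparison:
step_b <- anova(lm(y ~ a, data = dd), lm(y ~ a + b, data = dd))
all.equal(seq_tab["b", "Sum Sq"], step_b[["Sum of Sq"]][2])  # TRUE
```

(The F statistics differ slightly, because the sequential table uses the
full model's residual mean square as the denominator, but the
improvement sums of squares are identical.)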

The "Type III" sums of squares are the same regardless of order, but this 
is a bad property, not a good one. The question you are asking when 
you test "for" a term X really does depend on what other terms are in the 
model, so order really does matter.  However, since you can do anything 
just by specifying two models and comparing them, you don't actually need 
to worry about any of this.

 	-thomas
#
On 20-Apr-05 michael watson \(IAH-C\) wrote:
This is inevitable. It's not for nothing that unbalanced "designs"
are called "non-orthogonal".

What are the "E" and "NE" effects corresponding to the
observations plotted at "+" in the following diagram?


      NE
      /
     /
    /
   /
  /   +
 /
o------------------>E

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Apr-05                                       Time: 16:54:21
------------------------------ XFMail ------------------------------
#
Folks:

At the great risk of having my ignorance publicly displayed, let me say:

1) I enjoyed and learned from the discussion;

2) I especially enjoyed WNV's paper linked by BDR -- I enjoyed both the
wisdom of the content and the elegance and humor of style. Good writing is a
rare gift.

Anyway, I would like to add what I hope is just a bit of useful perspective
on WNV's comment in his paper (p. 14) that "... there are some very special
occasions where some clearly defined estimable function of the parameters
that would qualify as a definition of a main effect to be tested, even when
there is an interaction in place, but like the regression through the origin
case, such instances are extremely rare and special."

Well, maybe not so rare and special: Consider a two-factor model with one
factor representing, say, process type and the other, say, type of raw
materials. The situation is that we have several different process types
each of which can use one of the several sources of raw materials. We are
interested in seeing whether the sources of raw materials can be used
interchangeably for the different processes. We are interested both in the
issue of whether the sources of raw materials are associated with some
consistent effect over **all ** processes and also in the more likely issue
of whether only some processes might be sensitive and others not. This
latter issue can be explored -- with the caveats expressed in this
discussion -- by testing for interactions in a simple 2-way model. However,
it seems to me to be both reasonable and of interest to test for the main
effect term given the interactions. This expresses the view that the
interactions are in fact more likely than main effects; i.e. one expects
perhaps a few of the processes to be sensitive in different ways, but not
most of them and not in a consistent direction. I think that this is, in
fact, not so uncommon a situation in many different contexts.

Of course, whether under imbalance one can actually test a hypothesis that
meaningfully expresses this notion is another story ...

As always, I would appreciate other perspectives and corrections, either on
list or privately.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box