-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tal Galili
Sent: Sunday, February 15, 2009 3:16 AM
To: John Fox
Cc: r-help at r-project.org; Michael Friendly; Nils Skotara; Peter Dalgaard
Subject: Re: [R] Anova and unbalanced designs
Dear John - thank you for your detailed answer and help.
Your answer encourages me to ask further: by choosing different contrasts,
what are the different hypotheses being tested? (Or, put differently: should I
prefer contr.sum over contr.poly or contr.helmert, or does this make no
difference?) How should this question be approached/answered?
I see in ?contrasts in R that the referenced reading is:
"Chambers, J. M. and Hastie, T. J. (1992) *Statistical models.* Chapter 2
of *Statistical Models in S*, eds J. M. Chambers and T. J. Hastie,
Wadsworth & Brooks/Cole."
Yet I must admit I don't have this book readily available (not on the web,
nor in my local library), so other recommended sources would be of great help.
For future reference, I add here some tinkering with the code to show how
implementing different contrasts will result in different type III SS
analysis results:
library(car)  # provides Anova() and the OBrienKaiser data
phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
                levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)
# refit the same model four times, changing only the coding of treatment
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.treatment")
mod.ok.contr.treatment <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                                   post.1, post.2, post.3, post.4, post.5,
                                   fup.1, fup.2, fup.3, fup.4, fup.5) ~
                             contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.helmert")
mod.ok.contr.helmert <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                                 post.1, post.2, post.3, post.4, post.5,
                                 fup.1, fup.2, fup.3, fup.4, fup.5) ~
                           contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.poly")
mod.ok.contr.poly <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                              post.1, post.2, post.3, post.4, post.5,
                              fup.1, fup.2, fup.3, fup.4, fup.5) ~
                        contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.sum")
mod.ok.contr.sum <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                             post.1, post.2, post.3, post.4, post.5,
                             fup.1, fup.2, fup.3, fup.4, fup.5) ~
                       contrasted.treatment*gender, data=OBrienKaiser)
# this is one result:
(Anova(mod.ok.contr.treatment, idata=idata, idesign=~phase*hour, type = "III"))
# all of the other contrasts will now give the same outcome (does that mean
# there shouldn't be a preference for using one over the other?):
(Anova(mod.ok.contr.helmert, idata=idata, idesign=~phase*hour, type = "III"))
(Anova(mod.ok.contr.poly, idata=idata, idesign=~phase*hour, type = "III"))
(Anova(mod.ok.contr.sum, idata=idata, idesign=~phase*hour, type = "III"))
With regards,
Tal
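On the question of how the built-in codings differ, here is a minimal sketch in base R. It prints the four coding matrices for a three-level factor and checks which of them have columns that sum to zero, i.e. columns orthogonal to the constant term. The object names (codings, sums_to_zero) are invented for this illustration, and the check only shows how the codings differ; it does not by itself say which hypothesis each one tests.

```r
# Built-in contrast codings for a three-level factor (stats package only).
codings <- list(
  treatment = contr.treatment(3),
  sum       = contr.sum(3),
  helmert   = contr.helmert(3),
  poly      = contr.poly(3)
)
print(codings)

# A column that sums to zero is orthogonal to the constant (intercept) column.
sums_to_zero <- vapply(
  codings,
  function(m) all(abs(colSums(m)) < 1e-10),
  logical(1)
)
print(sums_to_zero)
```

Only contr.treatment fails the check, which is consistent with it being the one coding that gave a different type-III result in the code above.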
On Sat, Feb 14, 2009 at 7:09 PM, John Fox <jfox at mcmaster.ca> wrote:
-----Original Message-----
From: Tal Galili [mailto:tal.galili at gmail.com]
Sent: February-14-09 10:23 AM
To: John Fox
Cc: Peter Dalgaard; Nils Skotara; r-help at r-project.org; Michael Friendly
Subject: Re: [R] Anova and unbalanced designs
Hello John and other R mailing list members.
I've been following your discussions regarding the Anova command
type 2/3 repeated measures Anova, and I have a question:
I found that when I go from using type II to using type III, an
"intercept" term is suddenly added to the model (example in the
e-mail). So my question is
1) why is this "intercept" term added (in SS type "III" vs the type
"II")?
The computational approach taken in Anova() makes it simpler to include the
intercept in the "type-III" tests and not to include it in the "type-II" tests.
2) Can/should this "intercept" term be removed ? (or how should it
The test for the intercept is rarely of interest. A "type-II" test for the
intercept would test that the unconditional mean of the response is 0; a
"type-III" test for the intercept would test that the constant term in the
full model fit to the data is 0. The latter depends upon the parametrization
of the model (in the case of an ANOVA model, what kind of "contrasts" are
used). You state that the example that you give is taken from ?Anova, but
there's a crucial detail that's omitted: The help file only gives the
"type-II" tests; the "type-III" tests are also reasonable here, but they
depend upon having used "contr.sum" (or another set of contrasts that are
orthogonal in the row basis of the model matrix) for the between-subject
factors, treatment and gender. This detail is in the data set:
 [1] M M M F F M M F F M M M F F F F
attr(,"contrasts")
[1] contr.sum
Levels: F M
 [1] control control control control control A       A       A       A       B       B
[12] B       B       B       B       B
attr(,"contrasts")
        [,1] [,2]
control   -2    0
A          1   -1
B          1    1
Levels: control A B
With proper contrast coding, the "type-III" test for the intercept tests
that the mean of the cell means (the "grand mean") is 0.
Had the default dummy-coded contrasts (from contr.treatment) been used, the
tests would not have tested reasonable hypotheses. My advice, from the help
file: "Be very careful in formulating the model for type-III tests, or the
hypotheses tested will not make sense."
I hope this helps,
John
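To make the point about the intercept concrete, here is a small sketch in base R. The data frame d is hypothetical and unbalanced, invented purely for illustration (it is not the OBrienKaiser data), and the names fit_sum and fit_trt are likewise made up. With contr.sum the constant term equals the unweighted mean of the cell means; with contr.treatment it equals the mean of the reference cell.

```r
# Hypothetical unbalanced 2 x 2 layout, invented for this illustration.
d <- data.frame(
  a = factor(c("a1", "a1", "a1", "a2", "a2", "a1", "a2", "a2", "a2")),
  b = factor(c("b1", "b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2")),
  y = c(3, 4, 5, 6, 8, 2, 7, 9, 11)
)

# Cell means, rows indexed by a and columns by b.
cell_means <- tapply(d$y, list(d$a, d$b), mean)
print(cell_means)

# Same saturated model, two different codings of the factors.
fit_sum <- lm(y ~ a * b, data = d,
              contrasts = list(a = "contr.sum", b = "contr.sum"))
fit_trt <- lm(y ~ a * b, data = d,
              contrasts = list(a = "contr.treatment", b = "contr.treatment"))

int_sum <- coef(fit_sum)[["(Intercept)"]]  # unweighted mean of the cell means
int_trt <- coef(fit_trt)[["(Intercept)"]]  # mean of the reference cell (a1, b1)

print(c(sum_coding = int_sum, grand_mean = mean(cell_means)))
print(c(treatment_coding = int_trt, reference_cell = cell_means["a1", "b1"]))
```

Neither value is the raw mean of y here, because the cells have unequal counts.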
My purpose is to be able to use the Anova for analyzing an
2 between and 3 within factors, where the between factors are not
and the within factors are (that is why I can't use the aov
#---code start
# (taken from the ?Anova help file)
library(car)  # provides Anova() and the OBrienKaiser data
phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
                levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)
idata
mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                   post.1, post.2, post.3, post.4, post.5,
                   fup.1, fup.2, fup.3, fup.4, fup.5) ~
             treatment*gender, data=OBrienKaiser)
# now we have two options
# option one is to use type II:
(av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "II"))
#output:
Type II Repeated Measures MANOVA Tests: Pillai test statistic
                            Df test stat approx F num Df den Df
treatment                    2    0.4809   4.6323      2     10
gender                       1    0.2036   2.5558      1     10
treatment:gender             2    0.3635   2.8555      2     10
phase                        1    0.8505  25.6053      2      9
treatment:phase              2    0.6852   2.6056      4     20
gender:phase                 1    0.0431   0.2029      2      9
treatment:gender:phase       2    0.3106   0.9193      4     20
hour                         1    0.9347  25.0401      4      7
treatment:hour               2    0.3014   0.3549      8     16
gender:hour                  1    0.2927   0.7243      4      7
treatment:gender:hour        2    0.5702   0.7976      8     16
phase:hour                   1    0.5496   0.4576      8      3
treatment:phase:hour         2    0.6637   0.2483     16      8
gender:phase:hour            1    0.6950   0.8547      8      3
treatment:gender:phase:hour  2    0.7928   0.3283     16      8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# option two is to use type III, and then get an added intercept
(av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "III"))
# here is the output:
Type III Repeated Measures MANOVA Tests: Pillai test statistic
                            Df test stat approx F num Df den Df
(Intercept)                  1     0.967  296.389      1     10
treatment                    2     0.441    3.940      2     10
gender                       1     0.268    3.659      1     10
treatment:gender             2     0.364    2.855      2     10
treatment:phase              2     0.696    2.670      4     20
gender:phase                 1     0.066    0.319      2      9
treatment:gender:phase       2     0.311    0.919      4     20
treatment:hour               2     0.316    0.376      8     16
gender:hour                  1     0.339    0.898      4      7
treatment:gender:hour        2     0.570    0.798      8     16
phase:hour                   1     0.560    0.478      8      3
treatment:phase:hour         2     0.662    0.248     16      8
gender:phase:hour            1     0.712    0.925      8      3
treatment:gender:phase:hour  2     0.793    0.328     16      8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#---code end
Thanks in advance for your help!
Tal Galili
On Sun, Jan 25, 2009 at 3:08 AM, John Fox <jfox at mcmaster.ca> wrote:
Dear Peter and Nils,
In my initial message, I stated misleadingly that the contrast coding didn't
matter for the "type-III" tests here since there is just one
between-subjects factor, but that's not right: The between SS is
correct using contr.treatment(), but the within SS is not. As is generally
the case, to get reasonable type-III tests (i.e., tests of reasonable
hypotheses), it's necessary to have contrasts that are orthogonal in the
row-basis of the design, such as contr.sum(), contr.helmert(), or
contr.poly(). The "type-II" tests, however, are insensitive to the contrast
parametrization. Anova() always uses an orthogonal parametrization for the
within-subjects design.
The general advice in ?Anova is, "Be very careful in formulating the model
for type-III tests, or the hypotheses tested will not make sense."
Thanks, Peter, for pointing this out.
John
------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
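The sensitivity of "type-III"-style tests to the coding can be sketched in base R without Anova(). This is an illustration under stated assumptions, not the computation Anova() actually performs: the data frame d is hypothetical, and drop_term_F() is a made-up helper that drops the model-matrix columns of one term at a time and compares residual sums of squares with the full model.

```r
# Hypothetical unbalanced 2 x 2 layout, invented for this illustration.
d <- data.frame(
  a = factor(c("a1", "a1", "a1", "a2", "a2", "a1", "a2", "a2", "a2")),
  b = factor(c("b1", "b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2")),
  y = c(3, 4, 5, 6, 8, 2, 7, 9, 11)
)

# F statistic for each term, obtained by deleting that term's columns from the
# full model matrix while keeping all other columns (interaction included).
drop_term_F <- function(contr) {
  X <- model.matrix(~ a * b, d, contrasts.arg = list(a = contr, b = contr))
  term <- attr(X, "assign")  # 0 = intercept, 1 = a, 2 = b, 3 = a:b
  rss <- function(M) sum(lm.fit(M, d$y)$residuals^2)
  rss_full <- rss(X)
  df_res <- nrow(X) - ncol(X)
  sapply(c(a = 1, b = 2, "a:b" = 3), function(k) {
    reduced <- X[, term != k, drop = FALSE]
    ((rss(reduced) - rss_full) / sum(term == k)) / (rss_full / df_res)
  })
}

F_trt <- drop_term_F("contr.treatment")
F_sum <- drop_term_F("contr.sum")
print(rbind(contr.treatment = F_trt, contr.sum = F_sum))
```

In this sketch the F for the a:b interaction is the same under both codings, while the F values for the main effects change. With dummy coding, dropping the columns for a tests the effect of a at the reference level of b only, which is usually not the hypothesis of interest.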
> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> Sent: January-24-09 6:31 PM
> To: Nils Skotara
> Cc: John Fox; r-help at r-project.org; 'Michael Friendly'
> Subject: Re: [R] Anova and unbalanced designs
>
> > Dear John,
> >
> > thank you again! You replicated the type III result I got
> > calculate Anova() type II:
> >
> > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> >
> >                    SS num Df Error SS den Df      F
> > between        4.8000      1   9.0000      8 4.2667
> > within         0.2000      1  10.6667      8 0.1500
> > between:within 2.1333      1  10.6667      8 1.6000
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > I see the exact same values as you had written.
> > However, and now I am really lost, type III (I did not
> > leads to the following:
> >
> > Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> >
> >                               SS num Df Error SS den Df
> > (Intercept)               72.000      1    9.000      8
> > between                    4.800      1    9.000      8
> > as.factor(within)          2.000      1   10.667      8
> > between:as.factor(within)  2.133      1   10.667      8
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > How is this possible?
>
> This looks like a contrast parametrization issue: If we look at the
> per-group mean within-differences and their SE, we get
>
> > summary(lm(within1-within2~between - 1))
> ..
> Coefficients:
>          Estimate Std. Error t value Pr(>|t|)
> between1  -1.0000     0.8165  -1.225    0.256
> between2   0.3333     0.6667   0.500    0.631
> ..
> between
> 1 2
> 4 6
>
> Now, the type II F test is based on weighting the two means
> after testing for no interaction
>
> > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
> [1] 0.1500205
>
> and type III is to weight them as if there had been equal counts
> > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
> [1] 0.400022
>
> However, the result above corresponds to looking at group1 only
> [1] 1.499987
>
> It helps if you choose orthogonal contrast parametrizations:
> > options(contrasts=c("contr.sum","contr.helmert"))
> > betweenanova <- lm(values ~ between)>
> idesign= ~as.factor(within), type = "III" )
>
> Type III Repeated Measures MANOVA Tests: Pillai test statistic
>                           Df test stat approx F num Df den
> (Intercept)                1     0.963  209.067      1
> between                    1     0.348    4.267      1
> as.factor(within)          1     0.048    0.400      1
> between:as.factor(within)  1     0.167    1.600      1
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014
> (*) \(*) -- University of Copenhagen Denmark Ph:
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX:
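Peter Dalgaard's weighting arithmetic quoted above can be replayed in a few lines of base R. The estimates, standard errors and group counts are copied from his message; the helper name weighted_F is made up for this sketch. The expression behind the third value did not survive in the quoted text, so weighting group 1 alone with weights (1, 0) is an assumption, chosen because it reproduces the printed 1.499987.

```r
# Per-group mean within-differences, their standard errors, and group sizes,
# as printed in the quoted message.
est <- c(-1.0000, 0.3333)
se  <- c(0.8165, 0.6667)
n   <- c(4, 6)

# F statistic for a weighted combination of the two group means.
weighted_F <- function(w) sum(w * est)^2 / sum(w^2 * se^2)

F_type2  <- weighted_F(n)         # weights proportional to the group sizes
F_type3  <- weighted_F(c(5, 5))   # weights as if the groups had equal counts
F_group1 <- weighted_F(c(1, 0))   # assumption: group 1 alone

print(c(type_II = F_type2, type_III = F_type3, group1_only = F_group1))
```

The first two values match the 0.1500 and 0.400 reported for the within effect in the Type II table and the orthogonal-contrast Type III table. The third is 2.000 / (10.667 / 8) = 1.5, i.e. the within row of the Type III table that had prompted the question.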