An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090214/6e166c33/attachment-0001.pl>
Anova and unbalanced designs
5 messages · Tal Galili, John Fox, Greg Snow
Dear Tal,
-----Original Message----- From: Tal Galili [mailto:tal.galili at gmail.com] Sent: February-14-09 10:23 AM To: John Fox Cc: Peter Dalgaard; Nils Skotara; r-help at r-project.org; Michael Friendly Subject: Re: [R] Anova and unbalanced designs Hello John and other R mailing list members. I've been following your discussions regarding the Anova command for the
SS
type 2/3 repeated measures Anova, and I have a question: I found that when I go from using type II to using type III, the summary model is suddenly added with an "intercept" term (example in the end of
the
e-mail). So my question is 1) why is this "intercept" term added (in SS type "III" vs the type "II")?
The computational approach taken in Anova() makes it simpler to include the intercept in the "type-III" tests and not to include it in the "type-II" tests.
2) Can/should this "intercept" term be removed ? (or how should it be interpreted ?)
The test for the intercept is rarely of interest. A "type-II" test for the intercept would test that the unconditional mean of the response is 0; a "type-III" test for the intercept would test that the constant term in the full model fit to the data is 0. The latter depends upon the parametrization of the model (in the case of an ANOVA model, what kind of "contrasts" are used). You state that the example that you give is taken from ?Anova but there's a crucial detail that's omitted: The help file only gives the "type-II" tests; the "type-III" tests are also reasonable here, but they depend upon having used "contr.sum" (or another set of contrasts that's orthogonal in the row basis of the model matrix) for the between-subject factors, treatment and gender. This detail is in the data set:
OBrienKaiser$gender
[1] M M M F F M M F F M M M F F F F attr(,"contrasts") [1] contr.sum Levels: F M
OBrienKaiser$treatment
[1] control control control control control A A A A
B B
[12] B B B B B
attr(,"contrasts")
[,1] [,2]
control -2 0
A 1 -1
B 1 1
Levels: control A B
With proper contrast coding, the "type-III" test for the intercept tests
that the mean of the cell means (the "grand mean") is 0.
Had the default dummy-coded contrasts (from contr.treatment) been used, the
tests would not have tested reasonable hypotheses. My advice, from the help
file: "Be very careful in formulating the model for type-III tests, or the
hypotheses tested will not make sense."
I hope this helps,
John
My purpose is to be able to use the Anova for analyzing an experiment with
a
2 between and 3 within factors, where the between factors are not
balanced,
and the within factors are (that is why I can't use the aov command).
#---code start
#---code start
#---code start
# (taken from the ?Anova help file)
phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)
idata
mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
treatment*gender,
data=OBrienKaiser)
# now we have two options
# option one is to use type II:
(av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "II"))
#output:
Type II Repeated Measures MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
treatment 2 0.4809 4.6323 2 10 0.0376868
*
gender 1 0.2036 2.5558 1 10 0.1409735 treatment:gender 2 0.3635 2.8555 2 10 0.1044692 phase 1 0.8505 25.6053 2 9 0.0001930
***
treatment:phase 2 0.6852 2.6056 4 20 0.0667354
.
gender:phase 1 0.0431 0.2029 2 9 0.8199968 treatment:gender:phase 2 0.3106 0.9193 4 20 0.4721498 hour 1 0.9347 25.0401 4 7 0.0003043
***
treatment:hour 2 0.3014 0.3549 8 16 0.9295212
gender:hour 1 0.2927 0.7243 4 7 0.6023742
treatment:gender:hour 2 0.5702 0.7976 8 16 0.6131884
phase:hour 1 0.5496 0.4576 8 3 0.8324517
treatment:phase:hour 2 0.6637 0.2483 16 8 0.9914415
gender:phase:hour 1 0.6950 0.8547 8 3 0.6202076
treatment:gender:phase:hour 2 0.7928 0.3283 16 8 0.9723693
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# option two is to use type III, and then get an added intercept term:
(av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "III"))
# here is the output:
Type III Repeated Measures MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
(Intercept) 1 0.967 296.389 1 10 9.241e-09
***
treatment 2 0.441 3.940 2 10 0.0547069
.
gender 1 0.268 3.659 1 10 0.0848003
.
treatment:gender 2 0.364 2.855 2 10 0.1044692 phase 1 0.814 19.645 2 9 0.0005208
***
treatment:phase 2 0.696 2.670 4 20 0.0621085
.
gender:phase 1 0.066 0.319 2 9 0.7349696 treatment:gender:phase 2 0.311 0.919 4 20 0.4721498 hour 1 0.933 24.315 4 7 0.0003345
***
treatment:hour 2 0.316 0.376 8 16 0.9183275 gender:hour 1 0.339 0.898 4 7 0.5129764 treatment:gender:hour 2 0.570 0.798 8 16 0.6131884 phase:hour 1 0.560 0.478 8 3 0.8202673 treatment:phase:hour 2 0.662 0.248 16 8 0.9915531 gender:phase:hour 1 0.712 0.925 8 3 0.5894907 treatment:gender:phase:hour 2 0.793 0.328 16 8 0.9723693 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #---code end #---code end #---code end Thanks in advance for your help! Tal Galili On Sun, Jan 25, 2009 at 3:08 AM, John Fox <jfox at mcmaster.ca> wrote: Dear Peter and Nils, In my initial message, I stated misleadingly that the contrast
coding
didn't matter for the "type-III" tests here since there is just one between-subjects factor, but that's not right: The between type-III
SS
is correct using contr.treatment(), but the within SS is not. As is generally the case, to get reasonable type-III tests (i.e., tests of
reasonable
hypotheses), it's necessary to have contrasts that are orthogonal in the row-basis of the design, such as contr.sum(), contr.helmert(), or contr.poly(). The "type-II" tests, however, are insensitive to the contrast parametrization. Anova() always uses an orthogonal parametrization
for
the within-subjects design. The general advice in ?Anova is, "Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense." Thanks, Peter, for pointing this out. John ------------------------------ John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox
> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] > Sent: January-24-09 6:31 PM > To: Nils Skotara > Cc: John Fox; r-help at r-project.org; 'Michael Friendly' > Subject: Re: [R] Anova and unbalanced designs >
> Nils Skotara wrote:
> > Dear John, > > > > thank you again! You replicated the type III result I got in
SPSS!
When I
> > calculate Anova() type II: > > > > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity > > > > SS num Df Error SS den Df F Pr(>F) > > between 4.8000 1 9.0000 8 4.2667 0.07273 . > > within 0.2000 1 10.6667 8 0.1500 0.70864 > > between:within 2.1333 1 10.6667 8 1.6000 0.24150 > > --- > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > > I see the exact same values as you had written. > > However, and now I am really lost, type III (I did not change
anything
> else)
> > leads to the following: > > > > Univariate Type III Repeated-Measures ANOVA Assuming Sphericity > > > > SS num Df Error SS den Df F
Pr(>F)
> > (Intercept) 72.000 1 9.000 8 64.0000
4.367e-05
> ***
> > between 4.800 1 9.000 8 4.2667
0.07273 .
> > as.factor(within) 2.000 1 10.667 8 1.5000
0.25551
> > between:as.factor(within) 2.133 1 10.667 8 1.6000
0.24150
> > --- > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > > How is this possible?
> > This looks like a contrast parametrization issue: If we look at
the
> per-group mean within-differences and their SE, we get >
> > summary(lm(within1-within2~between - 1))
> .. > Coefficients: > Estimate Std. Error t value Pr(>|t|) > between1 -1.0000 0.8165 -1.225 0.256 > between2 0.3333 0.6667 0.500 0.631 > ..
> > table(between)
> between > 1 2 > 4 6 > > Now, the type II F test is based on weighting the two means as you
would
> after testing for no interaction >
> > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
> [1] 0.1500205 > > and type III is to weight them as if there had been equal counts >
> > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
> [1] 0.400022 > > However, the result above corresponds to looking at group1 only >
> > (-1)^2/(0.8165^2)
> [1] 1.499987 > > It helps if you choose orhtogonal contrast parametrizations: >
> > options(contrasts=c("contr.sum","contr.helmert"))
> > betweenanova <- lm(values ~ between)> Anova(betweenanova,
idata=with,
> idesign= ~as.factor(within), type = "III" ) > > Type III Repeated Measures MANOVA Tests: Pillai test statistic > Df test stat approx F num Df den Df
Pr(>F)
> (Intercept) 1 0.963 209.067 1 8
5.121e-
07 ***
> between 1 0.348 4.267 1 8
0.07273 .
> as.factor(within) 1 0.048 0.400 1 8
0.54474
> between:as.factor(within) 1 0.167 1.600 1 8
0.24150
> --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > > > -- > O__ ---- Peter Dalgaard ?ster Farimagsgade 5,
Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K > (*) \(*) -- University of Copenhagen Denmark Ph: (+45)
35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45)
35327907
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. -- ---------------------------------------------- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: www.talgalili.com www.biostatistics.co.il
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090215/bc138025/attachment-0001.pl>
Dear Tal, A complete explanation of this issue is too long for an email; I do address it in my Applied Regression Analysis and Generalized Linear Models text. The question seems to come up so frequently that Georges Monette and I are writing a paper on it (and related issues) for the upcoming useR conference. Briefly, any set of "contrasts" that are orthogonal in the row basis of the model matrix (essentially, composed from what you see when you use the contrasts() function in R) will produce the same sums of squares (or, in the multivariate case, sums of squares and products). This include Hermert (contr.helmert), sigma-constrained (contr.sum), and orthogonal-polynomial (contr.poly) contrasts, but not dummy-coded "contrasts" (contr.treatment). (Actually, if you look carefully, you'll see that the contrasts defined for treatment in the OBrienKaiser data are custom orthogonal contrasts similar to Helmert contrasts.) Consequently, if all you're concerned with is the ANOVA table, it doesn't matter which of these you use. If, however, you're interested in the individual contrasts, it does of course matter which you use, and in particular the orthogonal polynomial contrasts are not sensible if the levels of the factor aren't ordered. Regards, John ------------------------------ John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox
-----Original Message----- From: Tal Galili [mailto:tal.galili at gmail.com] Sent: February-15-09 5:16 AM To: John Fox Cc: Peter Dalgaard; Nils Skotara; r-help at r-project.org; Michael Friendly Subject: Re: [R] Anova and unbalanced designs Dear John - thank you for your detailed answer and help. Your answer encourages me to ask further: by choosing different contrasts, what are the different hypothesis which are being tested? (or put
differently
- should I prefer contr.sum over contr.poly or contr.helmert, or does
this
makes no difference ?)
How should this question be approached/answered ?
I see in the ?contrasts in R that the referenced reading is:
"Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of
Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth &
Brooks/Cole."
Yet I must admit I don't have this book readily available (not on the web,
nor in my local library), so other recommended sources would be of great
help.
For future reference I add here a some tinkering of the code to show how
implementing different contrasts will resort in different SS type III
analysis results:
phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.treatment")
mod.ok.contr.treatment <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.helmert")
mod.ok.contr.helmert <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.poly")
mod.ok.contr.poly <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.sum")
mod.ok.contr.sum <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender, data=OBrienKaiser)
# this is one result:
(Anova(mod.ok.contr.treatment, idata=idata, idesign=~phase*hour, type =
"III"))
# all of the other contrasts will now give the same outcome: (does that
mean
there shouldn't be a preference of using one over the other ?) (Anova(mod.ok.contr.helmert, idata=idata, idesign=~phase*hour, type =
"III"))
(Anova(mod.ok.contr.poly, idata=idata, idesign=~phase*hour, type = "III")) (Anova(mod.ok.contr.sum, idata=idata, idesign=~phase*hour, type = "III")) With regards, Tal On Sat, Feb 14, 2009 at 7:09 PM, John Fox <jfox at mcmaster.ca> wrote: Dear Tal,
> -----Original Message----- > From: Tal Galili [mailto:tal.galili at gmail.com] > Sent: February-14-09 10:23 AM > To: John Fox
> Cc: Peter Dalgaard; Nils Skotara; r-help at r-project.org; Michael
Friendly
> Subject: Re: [R] Anova and unbalanced designs >
> Hello John and other R mailing list members. > > I've been following your discussions regarding the Anova command
for
the SS
> type 2/3 repeated measures Anova, and I have a question: > > I found that when I go from using type II to using type III, the
summary
> model is suddenly added with an "intercept" term (example in the
end
of the
> e-mail). So my question is > 1) why is this "intercept" term added (in SS type "III" vs the
type
"II")? The computational approach taken in Anova() makes it simpler to
include
the intercept in the "type-III" tests and not to include it in the
"type-
II" tests.
> 2) Can/should this "intercept" term be removed ? (or how should it
be
> interpreted ?)
The test for the intercept is rarely of interest. A "type-II" test
for
the intercept would test that the unconditional mean of the response is
0;
a "type-III" test for the intercept would test that the constant term
in
the full model fit to the data is 0. The latter depends upon the parametrization of the model (in the case of an ANOVA model, what kind of
"contrasts"
are used). You state that the example that you give is taken from ?Anova but there's a crucial detail that's omitted: The help file only gives
the
"type-II" tests; the "type-III" tests are also reasonable here, but they depend upon having used "contr.sum" (or another set of contrasts
that's
orthogonal in the row basis of the model matrix) for the between- subject factors, treatment and gender. This detail is in the data set:
> OBrienKaiser$gender
[1] M M M F F M M F F M M M F F F F attr(,"contrasts") [1] contr.sum Levels: F M
> OBrienKaiser$treatment
[1] control control control control control A A A
A
B B [12] B B B B B attr(,"contrasts") [,1] [,2] control -2 0 A 1 -1 B 1 1 Levels: control A B With proper contrast coding, the "type-III" test for the intercept tests that the mean of the cell means (the "grand mean") is 0. Had the default dummy-coded contrasts (from contr.treatment) been
used,
the tests would not have tested reasonable hypotheses. My advice, from
the
help file: "Be very careful in formulating the model for type-III tests,
or
the hypotheses tested will not make sense." I hope this helps, John
>
> My purpose is to be able to use the Anova for analyzing an
experiment
with a
> 2 between and 3 within factors, where the between factors are not
balanced,
> and the within factors are (that is why I can't use the aov
command).
>
>
> #---code start
>
> #---code start
>
> #---code start
>
> # (taken from the ?Anova help file)
>
> phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5,
5)),
> levels=c("pretest", "posttest", "followup"))
> hour <- ordered(rep(1:5, 3))
> idata <- data.frame(phase, hour)
> idata
> mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
> post.1, post.2, post.3, post.4, post.5,
> fup.1, fup.2, fup.3, fup.4, fup.5) ~
treatment*gender,
> data=OBrienKaiser) > > # now we have two options > # option one is to use type II: > > (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type =
"II"))
> > > #output: > Type II Repeated Measures MANOVA Tests: Pillai test statistic > Df test stat approx F num Df den Df
Pr(>F)
> treatment 2 0.4809 4.6323 2 10
0.0376868 *
> gender 1 0.2036 2.5558 1 10
0.1409735
> treatment:gender 2 0.3635 2.8555 2 10
0.1044692
> phase 1 0.8505 25.6053 2 9
0.0001930 ***
> treatment:phase 2 0.6852 2.6056 4 20
0.0667354 .
> gender:phase 1 0.0431 0.2029 2 9
0.8199968
> treatment:gender:phase 2 0.3106 0.9193 4 20
0.4721498
> hour 1 0.9347 25.0401 4 7
0.0003043 ***
> treatment:hour 2 0.3014 0.3549 8 16
0.9295212
> gender:hour 1 0.2927 0.7243 4 7
0.6023742
> treatment:gender:hour 2 0.5702 0.7976 8 16
0.6131884
> phase:hour 1 0.5496 0.4576 8 3
0.8324517
> treatment:phase:hour 2 0.6637 0.2483 16 8
0.9914415
> gender:phase:hour 1 0.6950 0.8547 8 3
0.6202076
> treatment:gender:phase:hour 2 0.7928 0.3283 16 8
0.9723693
> --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > # option two is to use type III, and then get an added intercept
term:
> (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type =
"III"))
> > > # here is the output: > Type III Repeated Measures MANOVA Tests: Pillai test statistic > Df test stat approx F num Df den Df
Pr(>F)
> (Intercept) 1 0.967 296.389 1 10
9.241e-09 ***
> treatment 2 0.441 3.940 2 10
0.0547069 .
> gender 1 0.268 3.659 1 10
0.0848003 .
> treatment:gender 2 0.364 2.855 2 10
0.1044692
> phase 1 0.814 19.645 2 9
0.0005208 ***
> treatment:phase 2 0.696 2.670 4 20
0.0621085 .
> gender:phase 1 0.066 0.319 2 9
0.7349696
> treatment:gender:phase 2 0.311 0.919 4 20
0.4721498
> hour 1 0.933 24.315 4 7
0.0003345 ***
> treatment:hour 2 0.316 0.376 8 16
0.9183275
> gender:hour 1 0.339 0.898 4 7
0.5129764
> treatment:gender:hour 2 0.570 0.798 8 16
0.6131884
> phase:hour 1 0.560 0.478 8 3
0.8202673
> treatment:phase:hour 2 0.662 0.248 16 8
0.9915531
> gender:phase:hour 1 0.712 0.925 8 3
0.5894907
> treatment:gender:phase:hour 2 0.793 0.328 16 8
0.9723693
> --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > > #---code end > > #---code end > > #---code end > > > > Thanks in advance for your help! > Tal Galili > > > > > > > > > > > > > > > > > On Sun, Jan 25, 2009 at 3:08 AM, John Fox <jfox at mcmaster.ca>
wrote:
> > > Dear Peter and Nils, > > In my initial message, I stated misleadingly that the
contrast
coding
> didn't > matter for the "type-III" tests here since there is just one > between-subjects factor, but that's not right: The between
type-III SS
> is > correct using contr.treatment(), but the within SS is not.
As
is
> generally > the case, to get reasonable type-III tests (i.e., tests of
reasonable
> hypotheses), it's necessary to have contrasts that are
orthogonal in
> the > row-basis of the design, such as contr.sum(),
contr.helmert(),
or
> contr.poly(). The "type-II" tests, however, are insensitive
to
the
> contrast > parametrization. Anova() always uses an orthogonal
parametrization for
> the > within-subjects design. > > The general advice in ?Anova is, "Be very careful in
formulating the
> model > for type-III tests, or the hypotheses tested will not make
sense."
> > Thanks, Peter, for pointing this out. > > > John > > ------------------------------ > John Fox, Professor > Department of Sociology > McMaster University > Hamilton, Ontario, Canada > web: socserv.mcmaster.ca/jfox > >
> > -----Original Message-----
>
> > From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] > > Sent: January-24-09 6:31 PM > > To: Nils Skotara > > Cc: John Fox; r-help at r-project.org; 'Michael Friendly' > > Subject: Re: [R] Anova and unbalanced designs > >
>
> > Nils Skotara wrote:
> > > Dear John, > > > > > > thank you again! You replicated the type III result I
got
in SPSS!
> When > I
> > > calculate Anova() type II: > > > > > > Univariate Type II Repeated-Measures ANOVA Assuming
Sphericity
> > > > > > SS num Df Error SS den Df F
Pr(>F)
> > > between 4.8000 1 9.0000 8 4.2667
0.07273 .
> > > within 0.2000 1 10.6667 8 0.1500
0.70864
> > > between:within 2.1333 1 10.6667 8 1.6000
0.24150
> > > --- > > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1
'
' 1
> > > > > > I see the exact same values as you had written. > > > However, and now I am really lost, type III (I did not
change
> anything
> > else)
> > > leads to the following: > > > > > > Univariate Type III Repeated-Measures ANOVA Assuming
Sphericity
> > > > > > SS num Df Error SS den Df
F
> Pr(>F)
> > > (Intercept) 72.000 1 9.000 8
64.0000
> 4.367e-05
> > ***
> > > between 4.800 1 9.000 8
4.2667
> 0.07273 .
> > > as.factor(within) 2.000 1 10.667 8
1.5000
> 0.25551
> > > between:as.factor(within) 2.133 1 10.667 8
1.6000
> 0.24150
> > > --- > > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1
'
' 1
> > > > > > How is this possible?
> > > > This looks like a contrast parametrization issue: If we
look
at the
> > per-group mean within-differences and their SE, we get > >
> > > summary(lm(within1-within2~between - 1))
> > .. > > Coefficients: > > Estimate Std. Error t value Pr(>|t|) > > between1 -1.0000 0.8165 -1.225 0.256 > > between2 0.3333 0.6667 0.500 0.631 > > ..
> > > table(between)
> > between > > 1 2 > > 4 6 > > > > Now, the type II F test is based on weighting the two
means
as you
> would
> > after testing for no interaction > >
> > > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
> > [1] 0.1500205 > > > > and type III is to weight them as if there had been equal
counts
> >
> > > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
> > [1] 0.400022 > > > > However, the result above corresponds to looking at group1
only
> >
> > > (-1)^2/(0.8165^2)
> > [1] 1.499987 > > > > It helps if you choose orhtogonal contrast
parametrizations:
> >
> > > options(contrasts=c("contr.sum","contr.helmert"))
> > > betweenanova <- lm(values ~ between)>
Anova(betweenanova,
> idata=with,
> > idesign= ~as.factor(within), type = "III" ) > > > > Type III Repeated Measures MANOVA Tests: Pillai test
statistic
> > Df test stat approx F num Df
den
Df
> Pr(>F)
> > (Intercept) 1 0.963 209.067 1
8
5.121e-
> 07 > ***
> > between 1 0.348 4.267 1
8
> 0.07273 .
> > as.factor(within) 1 0.048 0.400 1
8
> 0.54474
> > between:as.factor(within) 1 0.167 1.600 1
8
> 0.24150
> > --- > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
'
1
> > > > > > > > > > -- > > O__ ---- Peter Dalgaard ?ster
Farimagsgade
5, Entr.B
> > c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014
Cph. K
> > (*) \(*) -- University of Copenhagen Denmark Ph:
(+45)
> 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX:
(+45)
> 35327907 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-
project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible
code.
> > > > > > -- > ---------------------------------------------- > > > My contact information: > Tal Galili > Phone number: 972-50-3373767 > FaceBook: Tal Galili > My Blogs: > www.talgalili.com > www.biostatistics.co.il > >
-- ---------------------------------------------- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: www.talgalili.com www.biostatistics.co.il
1 day later
Tal,
The reparametirized cell means model can help with the understanding of what the individual terms mean in an analysis with contrasts. The cell means model is y=W*mu + e, where mu is the vector of the cell means (the mean for each group) and W is just a stretched identity matrix, this model just fits the means of each cell without any comparisons/contrasts. The reparameterized cell means model is y=W*Ai * A*mu + e, where Ai and A are inverses of each other and determine the set of contrasts (I tend to think of A as the contrast matrix and Ai as the dummy variable encoding matrix, but in some cases Ai is called the contrast matrix). Basically this leads to x=W*Ai and beta= A*mu for the standard regression model of y=x*beta+e, so R is creating x for you and we just need to find A (the inverse of Ai) to see what beta really means.
We can start by creating some labels (assuming 5 groups for example) and loading the MASS package for some prettiness later:
library(MASS)
betas <- paste('b',0:4, sep='' )
mus <- paste('mu',1:5, sep='' )
Now, let's look at what Helmert contrasts give us:
Ai1 <- cbind(1, contr.helmert(5))
A1 <- solve(Ai1)
A1txt <- as.character(fractions(A1))
paste( betas, '=', apply(A1txt, 1, paste, mus, sep='*', collapse=' + '))
So beta0 (the intercept) is just the mean of the 5 groups, beta1 compares the first group to the second (actually half the first to half the second, this matters for interpreting the beta or confidence intervals, but not hypothesis tests).
Then beta2 compares the average of the first 2 groups to the 3rd group (with an extra 1/3rd in there, this makes the original Ai matrix and x matrix prettier). Beta3 and beta4 compare groups 4 and 5 to the mean of the previous ones respectively.
Now look at summation contrasts (the contrasts sum to 0)
Ai2 <- cbind(1, contr.sum(5))
A2 <- solve(Ai2)
dimnames(A2) <- list(betas,mus)
fractions(A2)
The beta0 coefficient is still the overall mean and with a little algebra it can be seen that the other rows/betas measure the difference between the cell means (except the last) and the overall mean (just replace 4/5 with 1-1/5).
Now for the non-orthogonal treatment contrasts:
Ai3 <- cbind(1, contr.treatment(5))
A3 <- solve(Ai3)
dimnames(A3) <- list(betas, mus)
fractions(A3)
Now beta0 is not a mean of all the groups, but the mean of the first (reference) group. The other betas are then the differences between the other groups and the reference group.
Polynomial contrasts are a bit more difficult to interpret:
Ai4 <- cbind(1, contr.poly(5))
A4 <- solve(Ai4)
dimnames(A4) <- list(betas, mus)
zapsmall(A4)
matplot(1:5, t(A4), type='b')
The graph is probably the easiest to interpret, the intercept is still the overall mean, beta1 shows a linear relationship, beta2 follows a quadratic, etc. These are only meaningful if the groups are ordered and the same distance apart.
We can use the same idea in reverse to create our own contrasts, suppose we want to compare group 1 (control) to the mean of the rest, then compare groups 2 and 3, compare groups 4 and 5, then compare the mean of groups 2 and 3 to the mean of groups 4 and 5, we can do either of the following (depending on what we want beta0 to mean):
A6 <- rbind(1/5,
c(-4, 1, 1, 1, 1)/4,
c( 0, -1, 1, 0, 0),
c( 0, 0, 0, -1, 1),
c( 0, -1, -1, 1, 1)/2 )
fractions(A6)
zapsmall(solve(A6))
A7 <- rbind(c(1, 0, 0, 0, 0),
c(-4, 1, 1, 1, 1)/4,
c( 0, -1, 1, 0, 0),
c( 0, 0, 0, -1, 1),
c( 0, -1, -1, 1, 1)/2 )
fractions(A7)
zapsmall(solve(A7))
Just use the above Ai matricies to create x (use the C (note capital) or contrasts functions) and the individual terms will have the desired interpretations.
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
project.org] On Behalf Of Tal Galili
Sent: Sunday, February 15, 2009 3:16 AM
To: John Fox
Cc: r-help at r-project.org; Michael Friendly; Nils Skotara; Peter
Dalgaard
Subject: Re: [R] Anova and unbalanced designs
Dear John - thank you for your detailed answer and help.
Your answer encourages me to ask further: by choosing different
contrasts,
what are the different hypothesis which are being tested? (or put
differently - should I prefer contr.sum over contr.poly or
contr.helmert,
or does this makes no difference ?)
How should this question be approached/answered ?
I see in the ?contrasts in R that the referenced reading is:
"Chambers, J. M. and Hastie, T. J. (1992) *Statistical models.* Chapter
2
of *Statistical Models in S* eds J. M. Chambers and T. J. Hastie,
Wadsworth
& Brooks/Cole."
Yet I must admit I don't have this book readily available (not on the
web,
nor in my local library), so other recommended sources would be of
great
help.
For future reference I add here a some tinkering of the code to show
how
implementing different contrasts will resort in different SS type III
analysis results:
phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.treatment")
mod.ok.contr.treatment <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.helmert")
mod.ok.contr.helmert <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.poly")
mod.ok.contr.poly <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.sum")
mod.ok.contr.sum <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender, data=OBrienKaiser)
# this is one result:
(Anova(mod.ok.contr.treatment, idata=idata, idesign=~phase*hour, type =
"III"))
# all of the other contrasts will now give the same outcome: (does
that
mean there shouldn't be a preference of using one over the other ?)
(Anova(mod.ok.contr.helmert, idata=idata, idesign=~phase*hour, type =
"III"))
(Anova(mod.ok.contr.poly, idata=idata, idesign=~phase*hour, type =
"III"))
(Anova(mod.ok.contr.sum, idata=idata, idesign=~phase*hour, type =
"III"))
With regards,
Tal
On Sat, Feb 14, 2009 at 7:09 PM, John Fox <jfox at mcmaster.ca> wrote:
Dear Tal,
-----Original Message----- From: Tal Galili [mailto:tal.galili at gmail.com] Sent: February-14-09 10:23 AM To: John Fox Cc: Peter Dalgaard; Nils Skotara; r-help at r-project.org; Michael
Friendly
Subject: Re: [R] Anova and unbalanced designs Hello John and other R mailing list members. I've been following your discussions regarding the Anova command
for the
SS
type 2/3 repeated measures Anova, and I have a question: I found that when I go from using type II to using type III, the
summary
model is suddenly added with an "intercept" term (example in the
end of
the
e-mail). So my question is 1) why is this "intercept" term added (in SS type "III" vs the type
"II")? The computational approach taken in Anova() makes it simpler to
include the
intercept in the "type-III" tests and not to include it in the "type-
II"
tests.
2) Can/should this "intercept" term be removed ? (or how should it
be
interpreted ?)
The test for the intercept is rarely of interest. A "type-II" test
for the
intercept would test that the unconditional mean of the response is
0; a
"type-III" test for the intercept would test that the constant term
in the
full model fit to the data is 0. The latter depends upon the parametrization of the model (in the case of an ANOVA model, what kind of "contrasts"
are
used). You state that the example that you give is taken from ?Anova
but
there's a crucial detail that's omitted: The help file only gives the "type-II" tests; the "type-III" tests are also reasonable here, but
they
depend upon having used "contr.sum" (or another set of contrasts
that's
orthogonal in the row basis of the model matrix) for the between-
subject
factors, treatment and gender. This detail is in the data set:
OBrienKaiser$gender
[1] M M M F F M M F F M M M F F F F attr(,"contrasts") [1] contr.sum Levels: F M
OBrienKaiser$treatment
[1] control control control control control A A A
A
B B
[12] B B B B B
attr(,"contrasts")
[,1] [,2]
control -2 0
A 1 -1
B 1 1
Levels: control A B
With proper contrast coding, the "type-III" test for the intercept
tests
that the mean of the cell means (the "grand mean") is 0. Had the default dummy-coded contrasts (from contr.treatment) been
used, the
tests would not have tested reasonable hypotheses. My advice, from
the help
file: "Be very careful in formulating the model for type-III tests,
or the
hypotheses tested will not make sense." I hope this helps, John
My purpose is to be able to use the Anova for analyzing an
experiment
with a
2 between and 3 within factors, where the between factors are not
balanced,
and the within factors are (that is why I can't use the aov
command).
#---code start
#---code start
#---code start
# (taken from the ?Anova help file)
phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5,
5)),
levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)
idata
mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
post.1, post.2, post.3, post.4, post.5,
fup.1, fup.2, fup.3, fup.4, fup.5) ~
treatment*gender,
data=OBrienKaiser) # now we have two options # option one is to use type II: (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type =
"II"))
#output:
Type II Repeated Measures MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df
Pr(>F)
treatment 2 0.4809 4.6323 2 10
0.0376868
*
gender 1 0.2036 2.5558 1 10
0.1409735
treatment:gender 2 0.3635 2.8555 2 10
0.1044692
phase 1 0.8505 25.6053 2 9
0.0001930
***
treatment:phase 2 0.6852 2.6056 4 20
0.0667354
.
gender:phase 1 0.0431 0.2029 2 9
0.8199968
treatment:gender:phase 2 0.3106 0.9193 4 20
0.4721498
hour 1 0.9347 25.0401 4 7
0.0003043
***
treatment:hour 2 0.3014 0.3549 8 16
0.9295212
gender:hour 1 0.2927 0.7243 4 7
0.6023742
treatment:gender:hour 2 0.5702 0.7976 8 16
0.6131884
phase:hour 1 0.5496 0.4576 8 3
0.8324517
treatment:phase:hour 2 0.6637 0.2483 16 8
0.9914415
gender:phase:hour 1 0.6950 0.8547 8 3
0.6202076
treatment:gender:phase:hour 2 0.7928 0.3283 16 8
0.9723693
--- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # option two is to use type III, and then get an added intercept
term:
(av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type =
"III"))
# here is the output:
Type III Repeated Measures MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df
Pr(>F)
(Intercept) 1 0.967 296.389 1 10
9.241e-09
***
treatment 2 0.441 3.940 2 10
0.0547069
.
gender 1 0.268 3.659 1 10
0.0848003
.
treatment:gender 2 0.364 2.855 2 10
0.1044692
phase 1 0.814 19.645 2 9
0.0005208
***
treatment:phase 2 0.696 2.670 4 20
0.0621085
.
gender:phase 1 0.066 0.319 2 9
0.7349696
treatment:gender:phase 2 0.311 0.919 4 20
0.4721498
hour 1 0.933 24.315 4 7
0.0003345
***
treatment:hour 2 0.316 0.376 8 16
0.9183275
gender:hour 1 0.339 0.898 4 7
0.5129764
treatment:gender:hour 2 0.570 0.798 8 16
0.6131884
phase:hour 1 0.560 0.478 8 3
0.8202673
treatment:phase:hour 2 0.662 0.248 16 8
0.9915531
gender:phase:hour 1 0.712 0.925 8 3
0.5894907
treatment:gender:phase:hour 2 0.793 0.328 16 8
0.9723693
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#---code end
#---code end
#---code end
Thanks in advance for your help!
Tal Galili
On Sun, Jan 25, 2009 at 3:08 AM, John Fox <jfox at mcmaster.ca> wrote:
Dear Peter and Nils,
In my initial message, I stated misleadingly that the
contrast
coding
didn't
matter for the "type-III" tests here since there is just one
between-subjects factor, but that's not right: The between
type-III
SS
is
correct using contr.treatment(), but the within SS is not. As
is
generally
the case, to get reasonable type-III tests (i.e., tests of
reasonable
hypotheses), it's necessary to have contrasts that are
orthogonal
in
the
row-basis of the design, such as contr.sum(),
contr.helmert(), or
contr.poly(). The "type-II" tests, however, are insensitive
to the
contrast
parametrization. Anova() always uses an orthogonal
parametrization
for
the
within-subjects design.
The general advice in ?Anova is, "Be very careful in
formulating
the
model
for type-III tests, or the hypotheses tested will not make
sense."
Thanks, Peter, for pointing this out.
John
------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> Sent: January-24-09 6:31 PM
> To: Nils Skotara
> Cc: John Fox; r-help at r-project.org; 'Michael Friendly'
> Subject: Re: [R] Anova and unbalanced designs
>
> Nils Skotara wrote:
> > Dear John,
> >
> > thank you again! You replicated the type III result I got
in
SPSS!
When
I
> > calculate Anova() type II:
> >
> > Univariate Type II Repeated-Measures ANOVA Assuming
Sphericity
> >
> > SS num Df Error SS den Df F
Pr(>F)
> > between 4.8000 1 9.0000 8 4.2667
0.07273 .
> > within 0.2000 1 10.6667 8 0.1500
0.70864
> > between:within 2.1333 1 10.6667 8 1.6000
0.24150
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1
' ' 1
> >
> > I see the exact same values as you had written.
> > However, and now I am really lost, type III (I did not
change
anything
> else)
> > leads to the following:
> >
> > Univariate Type III Repeated-Measures ANOVA Assuming
Sphericity
> >
> > SS num Df Error SS den Df
F
Pr(>F)
> > (Intercept) 72.000 1 9.000 8
64.0000
4.367e-05
> ***
> > between 4.800 1 9.000 8
4.2667
0.07273 .
> > as.factor(within) 2.000 1 10.667 8
1.5000
0.25551
> > between:as.factor(within) 2.133 1 10.667 8
1.6000
0.24150
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1
' ' 1
> >
> > How is this possible?
>
> This looks like a contrast parametrization issue: If we
look at
the
> per-group mean within-differences and their SE, we get
>
> > summary(lm(within1-within2~between - 1))
> ..
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> between1 -1.0000 0.8165 -1.225 0.256
> between2 0.3333 0.6667 0.500 0.631
> ..
> > table(between)
> between
> 1 2
> 4 6
>
> Now, the type II F test is based on weighting the two means
as
you
would
> after testing for no interaction
>
> > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
> [1] 0.1500205
>
> and type III is to weight them as if there had been equal
counts
>
> > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
> [1] 0.400022
>
> However, the result above corresponds to looking at group1
only
>
> > (-1)^2/(0.8165^2)
> [1] 1.499987
>
> It helps if you choose orhtogonal contrast
parametrizations:
>
> > options(contrasts=c("contr.sum","contr.helmert"))
> > betweenanova <- lm(values ~ between)>
Anova(betweenanova,
idata=with,
> idesign= ~as.factor(within), type = "III" )
>
> Type III Repeated Measures MANOVA Tests: Pillai test
statistic
> Df test stat approx F num Df den
Df
Pr(>F)
> (Intercept) 1 0.963 209.067 1
8
5.121e-
07
***
> between 1 0.348 4.267 1
8
0.07273 .
> as.factor(within) 1 0.048 0.400 1
8
0.54474
> between:as.factor(within) 1 0.167 1.600 1
8
0.24150
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
' 1
>
>
>
>
> --
> O__ ---- Peter Dalgaard ?ster Farimagsgade
5,
Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014
Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph:
(+45)
35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX:
(+45)
35327907
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-
project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible
code.
-- ---------------------------------------------- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: www.talgalili.com www.biostatistics.co.il
--
----------------------------------------------
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il
[[alternative HTML version deleted]]