Dear R-list!

My question is related to an Anova including within- and between-subject factors and unequal group sizes. Here is a minimal example of what I did:

  library(car)
  within1 <- c(1,2,3,4,5,6,4,5,3,2)
  within2 <- c(3,4,3,4,3,4,3,4,5,4)
  values <- data.frame(w1 = within1, w2 = within2)
  values <- as.matrix(values)
  between <- factor(c(rep(1,4), rep(2,6)))
  betweenanova <- lm(values ~ between)
  with <- expand.grid(within = factor(1:2))
  withinanova <- Anova(betweenanova, idata=with, idesign= ~as.factor(within), type = "III")

I do not know if this is the appropriate method to deal with unbalanced designs. I observed that SPSS calculates everything identically except the main effect of the within factor; here, the sum of squares and the F-value are very different. When I select the option "show means", the means for the levels of the within factor in SPSS are the same as

  mean(c(mean(values[1:4, "w1"]), mean(values[5:10, "w1"])))

and

  mean(c(mean(values[1:4, "w2"]), mean(values[5:10, "w2"])))

In other words, they are calculated as if both groups had the same size. I wonder if this is a good solution and, if so, how I could do the same thing in R. However, if SPSS treats the main effect as if the group sizes were identical, why does it not do the same for the interaction, which gives the same result as Anova()?

Many thanks in advance for your time and help!
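For readers following along, here is a minimal base-R sketch of the two kinds of marginal means the question contrasts. The object names `group_means`, `unweighted`, and `weighted` are illustrative and do not appear in the original post, and `cbind()` is used here only so that the columns can be indexed by name on a matrix.

```r
# Data from the original post
within1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
within2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
values  <- cbind(w1 = within1, w2 = within2)
between <- factor(c(rep(1, 4), rep(2, 6)))

# Mean of each within-level inside each between-group (rows: groups, cols: w1, w2)
group_means <- apply(values, 2, function(col) tapply(col, between, mean))

# Unweighted marginal means: average the group means, ignoring group sizes.
# This is the quantity the post says SPSS reports under "show means".
unweighted <- colMeans(group_means)

# Weighted marginal means: plain column means, so the larger group counts more.
weighted <- colMeans(values)

print(rbind(unweighted, weighted))
```

With unequal group sizes (4 and 6 here) the two rows differ, which is the same distinction that later separates the "type-III" from the "type-II" test of the within factor.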
Anova and unbalanced designs
7 messages · John Fox, Skotara, Nils Skotara +1 more
Dear Nils,

This is a pretty simple design, and I wouldn't have thought that there was much room for getting different results. More generally, but not here (since there's only one between-subject factor), one shouldn't use contr.treatment() with "type-III" tests, as you did. Is it possible that you got "type-II" tests from SPSS:

------ snip ----------
> summary(Anova(betweenanova, idata=with, idesign= ~within, type = "II"))
Type II Repeated Measures MANOVA Tests:

------------------------------------------

Term: between

 Response transformation matrix:
   (Intercept)
w1           1
w2           1

Sum of squares and products for the hypothesis:
            (Intercept)
(Intercept)         9.6

Sum of squares and products for error:
            (Intercept)
(Intercept)          18

Multivariate Tests: between
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1  0.347826 4.266667      1      8 0.072726 .
Wilks             1  0.652174 4.266667      1      8 0.072726 .
Hotelling-Lawley  1  0.533333 4.266667      1      8 0.072726 .
Roy               1  0.533333 4.266667      1      8 0.072726 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------

Term: within

 Response transformation matrix:
   within1
w1       1
w2      -1

Sum of squares and products for the hypothesis:
        within1
within1     0.4

Sum of squares and products for error:
         within1
within1 21.33333

Multivariate Tests: within
                 Df test stat  approx F num Df den Df  Pr(>F)
Pillai            1 0.0184049 0.1500000      1      8 0.70864
Wilks             1 0.9815951 0.1500000      1      8 0.70864
Hotelling-Lawley  1 0.0187500 0.1500000      1      8 0.70864
Roy               1 0.0187500 0.1500000      1      8 0.70864

------------------------------------------

Term: between:within

 Response transformation matrix:
   within1
w1       1
w2      -1

Sum of squares and products for the hypothesis:
         within1
within1 4.266667

Sum of squares and products for error:
         within1
within1 21.33333

Multivariate Tests: between:within
                 Df test stat  approx F num Df den Df  Pr(>F)
Pillai            1 0.1666667 1.6000000      1      8 0.24150
Wilks             1 0.8333333 1.6000000      1      8 0.24150
Hotelling-Lawley  1 0.2000000 1.6000000      1      8 0.24150
Roy               1 0.2000000 1.6000000      1      8 0.24150

Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                   SS num Df Error SS den Df      F  Pr(>F)
between        4.8000      1   9.0000      8 4.2667 0.07273 .
within         0.2000      1  10.6667      8 0.1500 0.70864
between:within 2.1333      1  10.6667      8 1.6000 0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------ snip ----------
I hope this helps,
John
------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
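The multivariate and univariate tables in the output above give different sums of squares for the same term (0.4 versus 0.2 for within, 21.33 versus 10.67 for its error). The following is a hand reconstruction in base R of where those numbers plausibly come from. It assumes that the factor of 2 is the squared length of the (1, -1) response transformation column, and that in this example the type-II hypothesis SSP for within equals n * mean(d)^2. Neither assumption is taken from the source code of car, and the object names are illustrative.

```r
# Data from the original post
within1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
within2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

# Response transformed by the (1, -1) column shown in the output
d <- within1 - within2

# Error SSP: squared deviations of d from its own group mean
ssp_error <- sum((d - ave(d, between))^2)

# Type-II hypothesis SSP for 'within' (assumed form: n * grand mean of d, squared)
ssp_within <- length(d) * mean(d)^2

# Rescale by the squared length of the contrast column (assumption, see lead-in)
k <- sum(c(1, -1)^2)
ss_within <- ssp_within / k
ss_error  <- ssp_error / k

# Univariate F with error df = number of subjects minus number of groups
df_error <- length(d) - nlevels(between)
F_within <- ss_within / (ss_error / df_error)

print(c(ssp_within = ssp_within, ssp_error = ssp_error,
        ss_within = ss_within, ss_error = ss_error, F = F_within))
```

The rescaling cancels in the F ratio, which would explain why the multivariate and univariate tables agree on F = 0.15 despite the differing sums of squares.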
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Skotara
Sent: January-23-09 12:16 PM
To: r-help at r-project.org
Subject: [R] Anova and unbalanced designs
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Dear John,

thank you for your answer. You are right, I also would not have expected a divergent result. I have double-checked it again. No, I got type-III tests. When I use type II, I get the same results in SPSS as in Anova() (also using type-II tests).

My guess was that the means SPSS shows, which are weighted in some way, could be responsible for this difference, or that using Anova() would not be correct for unequal group sizes, which I think is not the case. Do you have any further ideas?

Thank you!
Nils
Dear Nils,
I don't currently have a copy of SAS on my computer, so I asked Michael
Friendly to run the problem in SAS and he kindly supplied the following
results:
----------- snip ------------
The SAS System                         12:32 Saturday, January 24, 2009

The GLM Procedure

Class Level Information
Class      Levels   Values
between         2   1 2

Number of Observations Read    10
Number of Observations Used    10

Repeated Measures Analysis of Variance

Repeated Measures Level Information
Dependent Variable   w1   w2
Level of within       1    2

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no within Effect
H = Type III SSCP Matrix for within
E = Error SSCP Matrix
S=1  M=-0.5  N=3

Statistic                     Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.95238095      0.40        1        8   0.5447
Pillai's Trace           0.04761905      0.40        1        8   0.5447
Hotelling-Lawley Trace   0.05000000      0.40        1        8   0.5447
Roy's Greatest Root      0.05000000      0.40        1        8   0.5447

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no within*between Effect
H = Type III SSCP Matrix for within*between
E = Error SSCP Matrix
S=1  M=-0.5  N=3

Statistic                     Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.83333333      1.60        1        8   0.2415
Pillai's Trace           0.16666667      1.60        1        8   0.2415
Hotelling-Lawley Trace   0.20000000      1.60        1        8   0.2415
Roy's Greatest Root      0.20000000      1.60        1        8   0.2415

Tests of Hypotheses for Between Subjects Effects

Source     DF   Type III SS   Mean Square   F Value   Pr > F
between     1    4.80000000    4.80000000      4.27   0.0727
Error       8    9.00000000    1.12500000

Univariate Tests of Hypotheses for Within Subject Effects

Source            DF   Type III SS   Mean Square   F Value   Pr > F
within             1    0.53333333    0.53333333      0.40   0.5447
within*between     1    2.13333333    2.13333333      1.60   0.2415
Error(within)      8   10.66666667    1.33333333
----------- snip ------------
As you can see, these agree with Anova():
----------- snip ------------
Type III Repeated Measures MANOVA Tests: Pillai test statistic
               Df test stat approx F num Df den Df    Pr(>F)
(Intercept)     1     0.963  209.067      1      8 5.121e-07 ***
between         1     0.348    4.267      1      8   0.07273 .
within          1     0.048    0.400      1      8   0.54474
between:within  1     0.167    1.600      1      8   0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                    SS num Df Error SS den Df        F    Pr(>F)
(Intercept)    235.200      1    9.000      8 209.0667 5.121e-07 ***
between          4.800      1    9.000      8   4.2667   0.07273 .
within           0.533      1   10.667      8   0.4000   0.54474
between:within   2.133      1   10.667      8   1.6000   0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
----------- snip ------------
So, unless Anova() and SAS are making the same error, I guess SPSS is doing
something strange (or perhaps you didn't do what you intended in SPSS). As I
said before, this problem is so simple, that I find it hard to understand
where there's room for error, but I wanted to check against SAS to test my
sanity (a procedure that will likely get a rise out of some list members).
Maybe you should send a message to the SPSS help list.
Regards,
John
------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Skotara
Sent: January-24-09 6:30 AM
To: John Fox
Cc: r-help at r-project.org
Subject: Re: [R] Anova and unbalanced designs
Dear John,
thank you again! You replicated the type III result I got in SPSS! When I
calculate Anova() type II:
Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                   SS num Df Error SS den Df      F  Pr(>F)
between        4.8000      1   9.0000      8 4.2667 0.07273 .
within         0.2000      1  10.6667      8 0.1500 0.70864
between:within 2.1333      1  10.6667      8 1.6000 0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I see the exact same values as you had written.
However, and now I am really lost, type III (I did not change anything else)
leads to the following:

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                              SS num Df Error SS den Df       F    Pr(>F)
(Intercept)               72.000      1    9.000      8 64.0000 4.367e-05 ***
between                    4.800      1    9.000      8  4.2667   0.07273 .
as.factor(within)          2.000      1   10.667      8  1.5000   0.25551
between:as.factor(within)  2.133      1   10.667      8  1.6000   0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How is this possible?
Best regards!
Nils
Nils Skotara wrote:
However, and now I am really lost, type III (I did not change anything else)
leads to the following:

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                              SS num Df Error SS den Df       F    Pr(>F)
(Intercept)               72.000      1    9.000      8 64.0000 4.367e-05 ***
between                    4.800      1    9.000      8  4.2667   0.07273 .
as.factor(within)          2.000      1   10.667      8  1.5000   0.25551
between:as.factor(within)  2.133      1   10.667      8  1.6000   0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How is this possible?

This looks like a contrast parametrization issue: If we look at the
per-group mean within-differences and their SE, we get
> summary(lm(within1-within2~between - 1))
..
Coefficients:
         Estimate Std. Error t value Pr(>|t|)
between1  -1.0000     0.8165  -1.225    0.256
between2   0.3333     0.6667   0.500    0.631
..
> table(between)
between
1 2
4 6
Now, the type II F test is based on weighting the two means as you would
after testing for no interaction
> (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
[1] 0.1500205
and type III is to weight them as if there had been equal counts
> (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
[1] 0.400022
However, the result above corresponds to looking at group1 only
> (-1)^2/(0.8165^2)
[1] 1.499987
It helps if you choose orthogonal contrast parametrizations:
> options(contrasts=c("contr.sum","contr.helmert"))
> betweenanova <- lm(values ~ between)
> Anova(betweenanova, idata=with, idesign= ~as.factor(within), type = "III")
Type III Repeated Measures MANOVA Tests: Pillai test statistic
                          Df test stat approx F num Df den Df    Pr(>F)
(Intercept)                1     0.963  209.067      1      8 5.121e-07 ***
between                    1     0.348    4.267      1      8   0.07273 .
as.factor(within)          1     0.048    0.400      1      8   0.54474
between:as.factor(within)  1     0.167    1.600      1      8   0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
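The three hand calculations in the message above can be collected into one base-R sketch that avoids the rounded values. The helper `contrast_F()` and the other object names are hypothetical and introduced only for illustration. The pooled variance is computed directly from the within-subject differences instead of being read off the `lm()` summary.

```r
# Data from the original post
within1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
within2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

d  <- within1 - within2                      # within-subject differences
n  <- as.vector(table(between))              # group sizes: 4 and 6
m  <- as.vector(tapply(d, between, mean))    # group means of d: -1 and 1/3
s2 <- sum((d - ave(d, between))^2) /
      (length(d) - nlevels(between))         # pooled variance on 8 df

# F statistic (1 numerator df) for a weighted combination sum(w * m) of the
# group means; Var(m[i]) is estimated by s2 / n[i].
contrast_F <- function(w) sum(w * m)^2 / (s2 * sum(w^2 / n))

F_type2 <- contrast_F(n)         # weights proportional to group sizes
F_type3 <- contrast_F(c(1, 1))   # equal weights
F_treat <- contrast_F(c(1, 0))   # group 1 only

print(c(typeII = F_type2, typeIII = F_type3, group1_only = F_treat))
```

The only thing that changes between the three tests is the weight vector, which is the point of the explanation: the F of 1.5 in the puzzling table is a test on group 1 alone, not a flawed version of the other two.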
Dear Peter and Nils,

In my initial message, I stated misleadingly that the contrast coding didn't matter for the "type-III" tests here since there is just one between-subjects factor, but that's not right: The between type-III SS is correct using contr.treatment(), but the within SS is not.

As is generally the case, to get reasonable type-III tests (i.e., tests of reasonable hypotheses), it's necessary to have contrasts that are orthogonal in the row-basis of the design, such as contr.sum(), contr.helmert(), or contr.poly(). The "type-II" tests, however, are insensitive to the contrast parametrization. Anova() always uses an orthogonal parametrization for the within-subjects design. The general advice in ?Anova is, "Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense."

Thanks, Peter, for pointing this out.

John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
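One way to see the point about contrast coding without car is to fit the within-subject difference on the between factor and look at what the intercept means under each coding. This is a base-R sketch under the assumption that the type-III test of the within main effect in this two-level design amounts to a test of that intercept; the object names are illustrative.

```r
# Data from the original post
within1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
within2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))
d <- within1 - within2

# Same model, two codings of the between-subjects factor
fit_treat <- lm(d ~ between, contrasts = list(between = "contr.treatment"))
fit_sum   <- lm(d ~ between, contrasts = list(between = "contr.sum"))

# Treatment coding: the intercept is the mean of d in group 1.
# Sum-to-zero coding: the intercept is the unweighted mean of the group means.
int_treat <- unname(coef(fit_treat)["(Intercept)"])
int_sum   <- unname(coef(fit_sum)["(Intercept)"])

# Squared t statistics of the intercepts, comparable to the F values reported
F_treat <- coef(summary(fit_treat))["(Intercept)", "t value"]^2
F_sum   <- coef(summary(fit_sum))["(Intercept)", "t value"]^2

print(c(intercept_treatment = int_treat, intercept_sum = int_sum,
        F_treatment = F_treat, F_sum = F_sum))
```

Under treatment coding the intercept test asks whether the within effect is zero in group 1 only, which reproduces F = 1.5. Under sum-to-zero coding it asks about the average over groups, which reproduces the F = 0.4 reported by SAS and by Anova() with contr.sum.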
-----Original Message-----
From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
Sent: January-24-09 6:31 PM
To: Nils Skotara
Cc: John Fox; r-help at r-project.org; 'Michael Friendly'
Subject: Re: [R] Anova and unbalanced designs