As Petr Pikal mentioned, the difficulty in interpretation is entirely due
to the set of contrasts you chose. The default treatment contrasts are
not orthogonal and are therefore the most difficult to interpret.
The note in ?aov warns of this difficulty.
Sum contrasts will give you the numbers that are easiest to interpret.
options(contrasts = c("contr.sum", "contr.poly"))
warpbreakssum.aov <- aov(breaks ~ wool * tension, data = warpbreaks)
coef(warpbreakssum.aov)
model.tables(warpbreakssum.aov, type = "effects")
model.tables(warpbreakssum.aov, type = "means")
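As a hedged sketch of the point about sum contrasts (this block is an illustration, not part of the original message): with the balanced warpbreaks data, the contr.sum intercept should equal the grand mean, and the main-effect coefficients should match the first entries of the effects tables. The names fit_sum, b_sum, cell and grand are illustrative. The contrasts are set through the contrasts argument of aov() rather than options(), so the global setting is left alone.

```r
# Fit the two-factor model with sum-to-zero contrasts, set locally
# (assumption: we prefer not to change options("contrasts") globally).
fit_sum <- aov(breaks ~ wool * tension, data = warpbreaks,
               contrasts = list(wool = "contr.sum", tension = "contr.sum"))
b_sum <- coef(fit_sum)

# Observed cell means, laid out as a wool x tension table.
cell  <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))
grand <- mean(cell)  # equals mean(breaks) because the design is balanced

# The intercept compared with the grand mean.
c(intercept = unname(b_sum["(Intercept)"]), grand_mean = grand)

# wool1 is the wool A effect: the wool A marginal mean minus the grand mean.
c(wool1 = unname(b_sum["wool1"]), effect_A = mean(cell["A", ]) - grand)

# tension1 is the tension L effect, defined the same way.
c(tension1 = unname(b_sum["tension1"]), effect_L = mean(cell[, "L"]) - grand)
```

Each pair printed above should agree, which is why the sum-contrast coefficients can be read directly against the output of model.tables(..., type = "effects").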
John Fox showed the algebra using the default treatment contrasts.
For a full understanding you will need to read more in a text about
sets of linear contrasts and their algebra.
I recommend Section 10.3 in mine, of course.
Statistical Analysis and Data Display:
An Intermediate Course with Examples in R
Heiberger, Richard M., Holland, Burt
http://www.springer.com/us/book/9781493921218
On Sat, Dec 3, 2016 at 11:46 PM, Ashim Kapoor <ashimkapoor at gmail.com>
wrote:
On Sun, Dec 4, 2016 at 10:03 AM, Ashim Kapoor <ashimkapoor at gmail.com>
Dear Sir,
Many thanks for the explanation. Prior to your email (with some help from
a friend of mine) I was able to figure this one out. If we look at the
model:
y = intercept + B1.woolB + B2.tensionM + B3.tensionH + B4.woolB.tensionM
+ B5.woolB.tensionH + error
Here woolB, tensionM, tensionH are the dummy indicator variables similar
to how you have defined them.
Now suppose we consider y1,..,yn, all in group A.L (say).
Then y1 + ... + yn = intercept => average(y1,...,yn) = intercept + 0 +
0 + 0 + 0.
This should be : y1 + ... + yn = n . intercept
What was confusing me was how to compute the cell mean in the woolB, tensionH
cell.
If we have y_1,...,y_n all in group B.H then :-
y_1+ ... + y_n = intercept + B1 + 0 + B3 + 0 + B5
This should be : y_1 + ... +y_n = n( intercept + B1 + 0 + B3 + 0 + B5 )
Therefore average of group B.H = intercept + B1 + B3 + B5
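A minimal sketch of this last identity in R, assuming the default treatment contrasts (they are requested explicitly here in case options("contrasts") was changed earlier in the session). The names fit_trt, b, bh_from_coef and bh_observed are illustrative.

```r
# Treatment-contrast fit; coefficient names are (Intercept), woolB,
# tensionM, tensionH, woolB:tensionM and woolB:tensionH.
fit_trt <- aov(breaks ~ wool * tension, data = warpbreaks,
               contrasts = list(wool = "contr.treatment",
                                tension = "contr.treatment"))
b <- coef(fit_trt)

# Add up the coefficients whose dummy variables equal 1 in the B.H cell:
# intercept + B1 (woolB) + B3 (tensionH) + B5 (woolB:tensionH).
bh_from_coef <- unname(b["(Intercept)"] + b["woolB"] +
                       b["tensionH"] + b["woolB:tensionH"])

# Observed mean of the B.H cell, computed directly from the data.
bh_observed <- with(warpbreaks, mean(breaks[wool == "B" & tension == "H"]))

all.equal(bh_from_coef, bh_observed)
```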
Many thanks and Best Regards,
Ashim
On Sat, Dec 3, 2016 at 7:15 PM, Fox, John <jfox at mcmaster.ca> wrote:
Dear Ashim,
Sorry to chime in late, and my apologies if someone has already pointed
this out, but here's the relationship between the cell means and the
coefficients, using the row-basis of the model matrix:
-------------------------- snip ------------------------
means <- with( warpbreaks, tapply( breaks, interaction(wool, tension), mean ))
x.A <- rep(c(0, 1), 3)
x.B1 <- rep(c(0, 1, 0), each=2)
x.B2 <- rep(c(0, 0, 1), each=2)
x.AB1 <- x.A*x.B1
x.AB2 <- x.A*x.B2
X.basis <- cbind(1, x.A, x.B1, x.B2, x.AB1, x.AB2)
X.basis
       x.A x.B1 x.B2 x.AB1 x.AB2
[1,] 1   0    0    0     0     0
[2,] 1   1    0    0     0     0
[3,] 1   0    1    0     0     0
[4,] 1   1    1    0     1     0
[5,] 1   0    0    1     0     0
[6,] 1   1    0    1     0     1
solve(X.basis, means)
                x.A      x.B1      x.B2     x.AB1     x.AB2
 44.55556 -16.33333 -20.55556 -20.00000  21.11111  10.55556
coef(aov(breaks ~ wool * tension, data = warpbreaks))
   (Intercept)          woolB       tensionM       tensionH woolB:tensionM
      44.55556      -16.33333      -20.55556      -20.00000       21.11111
woolB:tensionH
      10.55556
-------------------------- snip ------------------------
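The same relationship can be checked in the other direction. The sketch below is an illustration rather than John's code: it asks model.matrix() for one row per cell instead of building the row basis by hand, then multiplies by the coefficients to recover the cell means. The names ctr, fit, cells, X and cell_means are hypothetical, and treatment contrasts are set explicitly so the design rows and the fit agree.

```r
# Use the same treatment contrasts for the fit and for the row basis.
ctr <- list(wool = "contr.treatment", tension = "contr.treatment")
fit <- aov(breaks ~ wool * tension, data = warpbreaks, contrasts = ctr)

# One row per cell, wool varying fastest: A.L, B.L, A.M, B.M, A.H, B.H.
cells <- expand.grid(wool    = levels(warpbreaks$wool),
                     tension = levels(warpbreaks$tension))
X <- model.matrix(~ wool * tension, data = cells, contrasts.arg = ctr)

# Row basis times coefficients gives the fitted cell means.
cell_means <- drop(X %*% coef(fit))
names(cell_means) <- with(cells, paste(wool, tension, sep = "."))
cell_means

# Compare with the observed cell means, which come out in the same order.
observed <- with(warpbreaks, tapply(breaks, interaction(wool, tension), mean))
all.equal(unname(cell_means), as.numeric(observed))
```

Solving X.basis for the coefficients, as above, and multiplying the basis by the coefficients, as here, are two views of the same six equations in six unknowns.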
I hope this helps,
John
-----------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of
Sent: December 3, 2016 12:19 AM
To: David Winsemius <dwinsemius at comcast.net>
Cc: r-help at r-project.org
Subject: Re: [R] Interpreting summary.lm for a 2 factor anova
Please allow me to rephrase my query.
Tables of means
Grand mean
28.14815
wool
wool
A B
31.037 25.259
tension
tension
L M H
36.39 26.39 21.67
wool:tension
tension
wool L M H
A 44.56 24.00 24.56
B 28.22 28.78 18.78
The above is the same as :
with( warpbreaks, tapply( breaks, interaction(wool, tension), mean ))
A.L B.L A.M B.M A.H B.H
44.55556 28.22222 24.00000 28.77778 24.55556 18.77778
For reference:
model <- aov(breaks ~ wool * tension, data = warpbreaks)
summary.lm(model)
Call:
aov(formula = breaks ~ wool * tension, data = warpbreaks)
Residuals:
Min 1Q Median 3Q Max
-19.5556 -6.8889 -0.6667 7.1944 25.4444
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.556 3.647 12.218 2.43e-16 ***
woolB -16.333 5.157 -3.167 0.002677 **
tensionM -20.556 5.157 -3.986 0.000228 ***
tensionH -20.000 5.157 -3.878 0.000320 ***
woolB:tensionM 21.111 7.294 2.895 0.005698 **
woolB:tensionH 10.556 7.294 1.447 0.154327
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.94 on 48 degrees of freedom
Multiple R-squared: 0.3778, Adjusted R-squared: 0.3129
F-statistic: 5.828 on 5 and 48 DF, p-value: 0.0002772
Now I'll explain what is confusing me in the output of summary.lm.
Coeff of Intercept = 44.556 = cell mean for A.L. This is the base.
Coeff of woolB:L = -16.333 = 28.22222 - 44.556. This is the difference of
the cell mean (B:L) from the base.
Coeff of woolA:tensionM = -20.556 = 24.000 - 44.556. This is the difference of
this cell mean (A:M) from the base.
Coeff of woolA:tensionH = -20.000 = 24.55556 - 44.556. This is the difference
of this cell mean (A:H) from the base.
This is where it stops being the difference from the base.
Coeff of woolB:tensionM = 21.111 should turn out to be 28.77778 - 44.556, but
this is -15.77822.
Coeff of woolB:tensionH = 10.556 should turn out to be 18.77778 - 44.556, but
this is -25.77822.
In the above 2 cases, we can't say that the coefficient = cell mean - base.
Can you tell me what should be the statement to be made?
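One way to phrase the statement, offered as a sketch under the default treatment contrasts: an interaction coefficient is a difference of differences, not a difference from the base. For woolB:tensionM it is the B-minus-A gap at tension M minus the B-minus-A gap at tension L. The names cell, wool_gap, int_M and int_H are illustrative.

```r
# Observed cell means as a wool x tension table.
cell <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))

# Wool B minus wool A at each tension level (named L, M, H).
wool_gap <- cell["B", ] - cell["A", ]

# Interaction terms: how the B - A gap changes relative to tension L.
int_M <- unname(wool_gap["M"] - wool_gap["L"])
int_H <- unname(wool_gap["H"] - wool_gap["L"])

# Compare with the coefficients of an explicit treatment-contrast fit.
fit_trt <- aov(breaks ~ wool * tension, data = warpbreaks,
               contrasts = list(wool = "contr.treatment",
                                tension = "contr.treatment"))
rbind(from_means = c(int_M, int_H),
      from_coef  = unname(coef(fit_trt)[c("woolB:tensionM", "woolB:tensionH")]))
```

With the cell means quoted above, (28.77778 - 24.00000) - (28.22222 - 44.55556) = 21.11111 and (18.77778 - 24.55556) - (28.22222 - 44.55556) = 10.55556.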
Best Regards,
Ashim
PS : My apologies for emailing my query to this list. Can you tell me
of a few (active) statistics help lists?
On Sat, Dec 3, 2016 at 1:33 AM, David Winsemius <
dwinsemius at comcast.net
On Dec 2, 2016, at 9:09 AM, David Winsemius <
dwinsemius at comcast.net
On Dec 2, 2016, at 6:16 AM, Ashim Kapoor <ashimkapoor at gmail.com
Dear Pikal,
All levels except the interactions are compared to the
I'm a little confused as to what's going on in interaction terms
eg. the cell wool B : tension M. Its mean is :
28.78 and 28.78 - 44.56 = -15.78 != 21.111.
It's something like 44.56 (intercept) - 16.333 (wool B) - 20.556 (tension M)
+ 21.111 (woolB:tensionM) = 28.782.
I don't know how to sum up the above line in terms of
The aov estimate will not exactly equal the observed mean (this is
_statistics_ after all). You should be comparing the mean of that
44.556 + (-16.33) +(-20.556) + (21.11)
A respected participant advised me to look at this more closely. In
this case (and I think in most such cases) where there are the same
number of parameters as there are means, the model is "saturated" and
there is no difference:
with( warpbreaks, tapply( breaks, interaction(wool, tension), mean ))
A.L B.L A.M B.M A.H B.H
44.55556 28.22222 24.00000 28.77778 24.55556 18.77778
So the B:M estimate is identical up to rounding with the observed
44.556 + (-16.33) +(-20.556) + (21.11)
[1] 28.78
The difference between the observed mean and the estimated mean
as a 'residual'
I've also been privately but gently chided for this misstatement.
Residuals are the difference between data and estimates.
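To make the corrected statement concrete, here is a small sketch (an illustration, with hypothetical names fit, cell_mean, res and rse): in this saturated model the fitted value for every observation is its own cell mean, so the residuals are the observations minus their cell means.

```r
fit <- aov(breaks ~ wool * tension, data = warpbreaks)

# Cell mean repeated for every observation in that cell.
cell_mean <- with(warpbreaks, ave(breaks, wool, tension))

# Saturated model: fitted values coincide with the observed cell means.
all.equal(unname(fitted(fit)), cell_mean)

# Residuals are data minus estimates.
res <- warpbreaks$breaks - cell_mean
all.equal(unname(residuals(fit)), res)

# Residual standard error rebuilt from the residual sum of squares;
# it should agree with the value reported by summary.lm().
rse <- sqrt(sum(res^2) / df.residual(fit))
rse
```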
and the squared sum of the all residuals is what this being
... over all the cells including the one implicitly associated with
This isn't really on-topic for Rhelp since you are not having
in getting the R program to perform its duties, but are rather in
statistical education. That's not what this mailing list is set up
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf
Kapoor
Sent: Thursday, December 1, 2016 2:48 PM
To: r-help at r-project.org
Subject: [R] Interpreting summary.lm for a 2 factor anova
Dear all,
Here is a small example : -
model <- aov(breaks ~ wool * tension, data = warpbreaks)
summary.lm(model)
Call:
aov(formula = breaks ~ wool * tension, data = warpbreaks)
Residuals:
Min 1Q Median 3Q Max
-19.5556 -6.8889 -0.6667 7.1944 25.4444
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.556 3.647 12.218 2.43e-16 ***
woolB -16.333 5.157 -3.167 0.002677 **
tensionM -20.556 5.157 -3.986 0.000228 ***
tensionH -20.000 5.157 -3.878 0.000320 ***
woolB:tensionM 21.111 7.294 2.895 0.005698 **
woolB:tensionH 10.556 7.294 1.447 0.154327
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.94 on 48 degrees of freedom
Multiple R-squared: 0.3778, Adjusted R-squared: 0.3129
F-statistic: 5.828 on 5 and 48 DF, p-value: 0.0002772
Tables of effects
wool
wool
A B
2.8889 -2.8889
tension
tension
L M H
8.241 -1.759 -6.481
wool:tension
tension
wool L M H
A 5.278 -5.278 0.000
B -5.278 5.278 0.000
Tables of means
Grand mean
28.14815
wool
wool
A B
31.037 25.259
tension
tension
L M H
36.39 26.39 21.67
wool:tension
tension
wool L M H
A 44.56 24.00 24.56
B 28.22 28.78 18.78
I don't follow the output of summary.lm. I understand the
model.tables for effects and means. For instance what does
represent ? Is it the grand average ? The grand mean is
someone help me understand the output of summary.lm ?
Best Regards,
Ashim