Hi,
How would one go about determining if the slope terms from an analysis of
covariance model are different from eachother?
Based on the example from MASS:
library(MASS)
# parallel slope model
l.para <- lm(Temp ~ Gas + Insul, data=whiteside)
# multiple slope model
l.mult <- lm(Temp ~ Insul/Gas -1, data=whiteside)
# compare nested models:
anova(l.para, l.mult)
Analysis of Variance Table
Model 1: Temp ~ Gas + Insul
Model 2: Temp ~ Insul/Gas - 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 53 52.045
2 52 48.704 1 3.341 3.5673 0.06451 .
It seems like this approach would yield insight into whether or not the two
slope terms (InsulBefore:Gas and InsulAfter:Gas) were different. However, is
there a formal test for this sort of question, and can it be generalized to
differences between more than 2 slope terms?
Thanks,
Dylan
Generically, let y be the response, f the factor and x the covariate.
Then
pModel <- lm(y ~ f + x, data) # parallel regressions
sModel <- lm(y ~ f/x, data) # separate regressions (the '-1' is
optional)
anova(pModel, sModel) # tests whether there are differences in
slopes.
In the example you quote below (which seems oddly familiar...) there are
only two classes so you get only 1 degree of freedom. Since the tail
area is 0.064, a strict 5% straight-up-and-down frequentist would say
you retain the null hypothesis of parallel regressions (i.e. equal
slopes) at the 0.05 level, but a cautious fence sitter would keep a
watchful eye on the evidence.
The same logic applies however many classes the factor has, (or for that
matter however many covariates). You just get more degrees of freedom
for the test, and more ways things can depart from parallelism.
What's the problem?
Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Dylan Beaudette
Sent: Thursday, 6 March 2008 9:28 AM
To: r-help at r-project.org
Subject: [R] testing for significantly different slopes
Hi,
How would one go about determining if the slope terms from an analysis
of
covariance model are different from eachother?
Based on the example from MASS:
library(MASS)
# parallel slope model
l.para <- lm(Temp ~ Gas + Insul, data=whiteside)
# multiple slope model
l.mult <- lm(Temp ~ Insul/Gas -1, data=whiteside)
# compare nested models:
anova(l.para, l.mult)
Analysis of Variance Table
Model 1: Temp ~ Gas + Insul
Model 2: Temp ~ Insul/Gas - 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 53 52.045
2 52 48.704 1 3.341 3.5673 0.06451 .
It seems like this approach would yield insight into whether or not the
two
slope terms (InsulBefore:Gas and InsulAfter:Gas) were different.
However, is
there a formal test for this sort of question, and can it be generalized
to
differences between more than 2 slope terms?
Thanks,
Dylan
On Wednesday 05 March 2008, Bill.Venables at csiro.au wrote:
Generically, let y be the response, f the factor and x the covariate.
Then
pModel <- lm(y ~ f + x, data) # parallel regressions
sModel <- lm(y ~ f/x, data) # separate regressions (the '-1' is
optional)
anova(pModel, sModel) # tests whether there are differences in
slopes.
In the example you quote below (which seems oddly familiar...) there are
only two classes so you get only 1 degree of freedom. Since the tail
area is 0.064, a strict 5% straight-up-and-down frequentist would say
you retain the null hypothesis of parallel regressions (i.e. equal
slopes) at the 0.05 level, but a cautious fence sitter would keep a
watchful eye on the evidence.
Thanks for the further clarification. I was hoping to get a response by using
that example-- perhaps I should have included a bit more (to follow).
The same logic applies however many classes the factor has, (or for that
matter however many covariates). You just get more degrees of freedom
for the test, and more ways things can depart from parallelism.
Got it.
What's the problem?
The problem is that I would like to do a pair-wise comparison between the
multiple slopes. For example with this model:
lm1 <- lm (Sepal.Length ~ Species/Sepal.Width -1, data=iris)
# truncated output from summary(lm1)
# just the slope terms
Speciessetosa:Sepal.Width 0.6905 0.1657 4.166 5.31e-05 ***
Speciesversicolor:Sepal.Width 0.8651 0.2002 4.321 2.88e-05 ***
Speciesvirginica:Sepal.Width 0.9015 0.1948 4.628 8.16e-06 ***
If I wanted to test the hypothesis that Speciessetosa:Sepal.Width was
different than Speciesversicolor:Sepal.Width, what course of action should I
take?
I have found an example in the gmodels package, using the estimable()
function, but the documentation is not clear to me. Here is the example:
library(gmodels)
# example from manual
lm1 <- lm (Sepal.Length ~ Sepal.Width + Species + Sepal.Width:Species,
data=iris)
cm <- rbind(
'Setosa vs. Versicolor' = c(0, 0, 1, 0, 1, 0),
'Setosa vs. Virginica' = c(0, 0, 0, 1, 0, 1),
'Versicolor vs. Virginica'= c(0, 0, 1,-1, 1,-1)
)
estimable(lm1, cm)
This *appears* to test what I am after, but I am not clear on how the 'cm'
argument works.
Am I barking up the wrong tree here?
Thanks,
Dylan
Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Dylan Beaudette
Sent: Thursday, 6 March 2008 9:28 AM
To: r-help at r-project.org
Subject: [R] testing for significantly different slopes
Hi,
How would one go about determining if the slope terms from an analysis
of
covariance model are different from eachother?
Based on the example from MASS:
library(MASS)
# parallel slope model
l.para <- lm(Temp ~ Gas + Insul, data=whiteside)
# multiple slope model
l.mult <- lm(Temp ~ Insul/Gas -1, data=whiteside)
# compare nested models:
anova(l.para, l.mult)
Analysis of Variance Table
Model 1: Temp ~ Gas + Insul
Model 2: Temp ~ Insul/Gas - 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 53 52.045
2 52 48.704 1 3.341 3.5673 0.06451 .
It seems like this approach would yield insight into whether or not the
two
slope terms (InsulBefore:Gas and InsulAfter:Gas) were different.
However, is
there a formal test for this sort of question, and can it be generalized
to
differences between more than 2 slope terms?
Thanks,
Dylan
The problem is that I would like to do a pair-wise comparison
between the
multiple slopes. For example with this model:
lm1 <- lm (Sepal.Length ~ Species/Sepal.Width -1, data=iris)
# truncated output from summary(lm1)
# just the slope terms
Speciessetosa:Sepal.Width 0.6905 0.1657 4.166 5.31e-05 ***
Speciesversicolor:Sepal.Width 0.8651 0.2002 4.321 2.88e-05 ***
Speciesvirginica:Sepal.Width 0.9015 0.1948 4.628 8.16e-06 ***
If I wanted to test the hypothesis that Speciessetosa:Sepal.Width was
different than Speciesversicolor:Sepal.Width, what course of action
should I
take?
I have found an example in the gmodels package, using the estimable()
function, but the documentation is not clear to me.
The gmodels::estimable function will allow you to test any hypothesis
constructed from the model coefficients. Each row of the contrast
matrix 'cm' is applied to the vector of coefficients 'beta'
individually and tested (by default) against the null:
cm[i] %*% beta == 0
This is accomplished using t-tests using the appropriate
transformation of the model parameter covariance matrix to determine
the correct standard deviation.
Here is the example:
library(gmodels)
# example from manual
lm1 <- lm (Sepal.Length ~ Sepal.Width + Species + Sepal.Width:Species,
data=iris)
cm <- rbind(
'Setosa vs. Versicolor' = c(0, 0, 1, 0, 1, 0),
'Setosa vs. Virginica' = c(0, 0, 0, 1, 0, 1),
'Versicolor vs. Virginica'= c(0, 0, 1,-1, 1,-1)
)
estimable(lm1, cm)
This *appears* to test what I am after, but I am not clear on how
the 'cm'
argument works.
This example code tests whether the intercept *and* slope are equal
across species, with a little extra complexity because the
species=Setosa is represeted as the 'baseline' group that is included
in the intercept term.
To test whether just the slope Speciesversicolor:Sepal.Width is
equal to the slope Speciesvirginica:Sepal.Width, you can do a single
contrast row:
> estimable(lm1, c(0, 0, 0, 0, 1, -1) )
Estimate Std. Error t value DF Pr(>|t|)
(0 0 0 0 1 -1) -0.03645676 0.2793277 -0.1305161 144 0.8963403
For the one-contrast case, one can use the shortcut:
> estimable(lm1, c("Speciesversicolor:Sepal.Width"=1,
"Speciesvirginica:Sepal.Width"=-1) )
Estimate Std. Error t value DF Pr(>|t|)
(0 0 0 0 1 -1) -0.03645676 0.2793277 -0.1305161 144 0.8963403
Now, to each of the pairwise tests in a single call, construct a
matrix with one row per contrast, and one column per model parameter:
> cm <- rbind(
+ 'Setosa vs. Versicolor Slope' = c(0, 0, 0, 1, -1, 0),
+ 'Setosa vs. Virginica Slope' = c(0, 0, 0, 1, 0, -1),
+ 'Versicolor vs. Virginica Slope'= c(0, 0, 0, 0, 1, -1)
+ )
>
> colnames(cm) <- names(coef(lm1))
>
> #print(cm) # omitted here for brevity, but instructive
>
> estimable(lm1, cm)
Estimate Std. Error t value
DF Pr(>|t|)
Setosa vs. Versicolor Slope -0.17458800 0.2598919 -0.6717717 144
0.5028054
Setosa vs. Virginica Slope -0.21104476 0.2557557 -0.8251811 144
0.4106337
Versicolor vs. Virginica Slope -0.03645676 0.2793277 -0.1305161 144
0.8963403
I hope this helps.
-Greg