Hello, We've got a dataset with several variables, one of which we're using to split the data into 3 smaller subsets (the variable takes 1 of 3 possible values). There are several more variables too, many of which we're using to fit regression models with lm. So I have 3 fitted models (one for each subset, of course), each with slope estimates for the predictor variables.

What we want to find out, though, is whether the overall slopes for the 3 regression lines are significantly different from each other. Is there a way, in R, to calculate the overall slope of each line and test whether there's homogeneity of regression slopes? (Am I using that phrase in the right context -- comparing the slopes of more than one regression line, rather than the slopes of the predictors within the same fit?) I hope that makes sense.

We really wanted to see whether the predicted values at the ends of the 3 regression lines are significantly different... but I'm not sure how to do the Johnson-Neyman procedure in R, so I think testing for slope differences will suffice! Thanks to any who may be able to help! Doug Adams
Homogeneity of regression slopes
Hello Doug, Perhaps it would just be easier to keep your data together and have a single regression with a term for the grouping variable (a factor with 3 levels). If the groups give identical results, the coefficients for the two non-reference levels of the grouping variable will include 0 in their confidence intervals. Michael
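A minimal sketch of what Michael describes, assuming a data frame dat with response y, a predictor x, and the 3-level grouping variable grp (all names hypothetical):

    # make sure the grouping variable is treated as a factor
    dat$grp <- factor(dat$grp)

    # one model for all the data: common slope, group-specific intercepts
    fit <- lm(y ~ x + grp, data = dat)
    summary(fit)   # coefficients for the two non-reference levels of grp
    confint(fit)   # do their confidence intervals include 0?

Note that, as Cliff points out below, this formulation compares intercepts; testing the slopes needs the interaction term.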
Thanks for turning my half-baked suggestion into something that would actually work, Cliff :) Michael
On 14 September 2010 12:27, Clifford Long <gnolffilc at gmail.com> wrote:
If you'll allow me to throw in two cents ... Like Michael said, the dummy variable route is the way to go, but I believe that the coefficients on the dummy variables test for equal intercepts. For equality of slopes, I believe we need the interaction between the dummy variable and the explanatory variable whose slope (coefficient) is of interest. I'll add some detail below.

For only two groups, we could use a single 2-level dummy variable D:
D = 0 is the reference level (group)
D = 1 is the other level (group)

Equality-of-intercepts model: y = b0 + b1*x + b2*D
If D = 0, then y = b0 + b1*x
If D = 1, then y = b0 + b1*x + b2 ...... grouping like terms: y = (b0 + b2) + b1*x
If the coefficient b2 is not significantly different from 0, we fail to reject the null hypothesis that the intercepts are equal; if it is, we reject that hypothesis.

Equality-of-slopes model: y = b0 + b1*x + b2*D + b3*x*D (we added the interaction between x and D)
If D = 0, then y = b0 + b1*x
If D = 1, then y = b0 + b1*x + b2 + b3*x ...... grouping like terms: y = (b0 + b2) + (b1 + b3)*x
If the coefficient b3 is not significantly different from 0, we fail to reject the null hypothesis that the slopes are equal; if it is, we reject that hypothesis.

For a model with three groups (assuming that lm / glm / etc. will really do this for you), the explicit dummy-variable coding might look like:

              D1    D2
    group 1    0     0   (reference level ... can usually choose)
    group 2    1     0
    group 3    0     1

I believe this is called a sigma-restricted model (?), as opposed to an overparameterized model where the three groups would get three dummy variables. You can probably find this info in most books on basic regression. This might be overly simplistic, and I'll happily stand corrected if I've made any mistakes. Otherwise, I hope that this helps. Cliff
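In R you rarely need to build the dummy variables by hand; lm() constructs D1 and D2 automatically from the factor contrasts. A sketch of the equal-slopes test for three groups, reusing the hypothetical dat, y, x, and grp from above:

    # common slope, separate intercepts (no interaction)
    fit0 <- lm(y ~ x + grp, data = dat)

    # separate slopes and intercepts (x:grp interaction added)
    fit1 <- lm(y ~ x * grp, data = dat)

    summary(fit1)      # the x:grp terms are the b3-type slope differences
    anova(fit0, fit1)  # joint F-test: do the three slopes differ?

The anova() comparison tests both interaction coefficients at once, which is the three-group analogue of testing b3 = 0.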
That's good insight, and it gives me some good ideas for what direction to take this. Thanks, everyone! Doug

P.S. - I guess if you have a significant interaction, that implies the slopes of the individual regression lines are significantly different anyway, doesn't it...
On Tue, Sep 14, 2010 at 11:33 AM, Thomas Stewart <tgstewart at gmail.com> wrote:
If you are interested in exploring the "homogeneity of variance" assumption, I would suggest you model the variance explicitly. Doing so allows you to compare the homogeneous-variance model to the heterogeneous-variance model within a nested-model framework. In that framework, you'll have likelihood ratio tests, etc. This is why I suggested the nlme package and the gls function. The gls function allows you to model the variance. -tgs

P.S. WLS is a type of GLS.
P.P.S. It isn't clear to me how a variance-stabilizing transformation would help in this case.
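A sketch of the gls() idea Thomas describes, reusing the hypothetical names from the sketches above. The two fits share the same fixed effects and differ only in the variance model, so the default REML fits can be compared directly:

    library(nlme)

    # one residual variance for all groups
    gls_hom <- gls(y ~ x * grp, data = dat)

    # a separate residual variance per group
    gls_het <- gls(y ~ x * grp, data = dat,
                   weights = varIdent(form = ~ 1 | grp))

    anova(gls_hom, gls_het)  # likelihood ratio test of the variance structures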
On Tue, Sep 14, 2010 at 6:53 AM, Clifford Long <gnolffilc at gmail.com> wrote:

Hi Thomas, Thanks for the additional information. Just wondering, and hoping to learn ... would any lack of homogeneity of variance (which is what I believe you mean by different stddev estimates) show up when performing standard regression diagnostics, such as residual plots, Levene's test (or equivalent), etc.? If so, would a WLS routine or some type of variance-stabilizing transformation be useful? Again, hoping to learn. I'll check out the gls() routine in the nlme package, as you mentioned. Thanks. Cliff
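One quick way to look at what Cliff asks about, as a sketch assuming the hypothetical interaction fit fit1 from earlier and the car package for Levene's test:

    # residuals vs fitted: look for fanning or unequal spread
    plot(fitted(fit1), resid(fit1))
    abline(h = 0, lty = 2)

    library(car)
    leveneTest(resid(fit1), dat$grp)  # equal residual variance across groups?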
On Mon, Sep 13, 2010 at 10:02 PM, Thomas Stewart <tgstewart at gmail.com> wrote:

Allow me to add to Michael's and Clifford's responses. If you fit the same regression model for each group, then you are also fitting a standard deviation parameter for each model. The solution proposed by Michael and Clifford is a good one, but it assumes that the standard deviation parameter is the same for all three models. You may want to consider the degree to which the standard deviation estimates differ across the three separate models. If they differ wildly, the method described by Michael and Clifford may not be the best; rather, you may want to consider gls() in the nlme package to explicitly allow the variance parameters to vary. -tgs