An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20080324/b225f701/attachment.pl
Great difference for piecewise linear function between R and SAS
8 messages · zhijie zhang, Duncan Murdoch, Paul Johnson +1 more
On 24/03/2008 5:23 AM, zhijie zhang wrote:
Dear Rusers, I am now using R and SAS to fit the piecewise linear functions, and what surprised me is that they have a great differrent result. See below.
You're using different bases. Compare the predictions, not the coefficients. To see the bases that R uses, do this: matplot(distance, bs(distance,degree=1,knots=c(16.13,24))) matplot(y, bs(y,degree=1,knots=c(-0.4357,-0.3202))) Duncan Murdoch
#R code--Knots for distance are 16.13 and 24, respectively, and Knots for y
are -0.4357 and -0.3202
m.glm<-glm(mark~x+poly(elevation,2)+bs(distance,degree=1,knots=c(16.13,24))
+bs(y,degree=1,knots=c(-0.4357,-0.3202
)),family=binomial(logit),data=point)
summary(m.glm)
Coefficients:
Estimate Std. Error z
value Pr(>|z|)
(Intercept) 12.104 3.138
3.857 0.000115 ***
x 5.815 1.987
2.926 0.003433 **
poly(elevation, 2)1 6.654 4.457
1.493 0.135444
poly(elevation, 2)2 -6.755 3.441 -
1.963 0.049645 *
bs(distance, degree = 1, knots = c(16.13, 24))1 -1.291 1.139 -
1.133 0.257220
bs(distance, degree = 1, knots = c(16.13, 24))2 -10.348 2.025 -
5.110 3.22e-07 ***
bs(distance, degree = 1, knots = c(16.13, 24))3 -3.530 3.752 -
0.941 0.346850
bs(y, degree = 1, knots = c(-0.4357, -0.3202))1 -6.828 1.836 -
3.719 0.000200 ***
bs(y, degree = 1, knots = c(-0.4357, -0.3202))2 -4.426 1.614 -
2.742 0.006105 **
bs(y, degree = 1, knots = c(-0.4357, -0.3202))3 -11.216 2.861 -
3.920 8.86e-05 ***
#SAS codes
data b;
set a;
if distance > 16.13 then d1=1; else d1=0;
distance2=d1*(distance - 16.13);
if distance > 24 then d2=1; else d2=0;
distance3=d2*(distance - 24);
if y>-0.4357 then dd1=1; else dd1=0;
y2=dd1*(y+0.4357);
if y>-0.3202 then dd2=1; else dd2=0;
y3=dd2*(y+0.3202);
run;
proc logistic descending data=b;
model mark =x elevation elevation*elevation distance distance2 distance3 y
y2 y3;
run;
The LOGISTIC Procedure Analysis of Maximum Likelihood
Estimates
Standard Wald
Parameter DF Estimate Error
Chi-Square Pr > ChiSq
Intercept 1 -2.6148 2.1445
1.4867
0.2227
x 1 5.8146 1.9872
8.5615
0.0034
elevation 1 0.4457 0.1506
8.7545
0.0031
elevation*elevation 1 -0.0279 0.0142
3.8533
0.0496
distance 1 -0.1091 0.0963
1.2836
0.2572
distance2 1 -1.0418 0.2668
15.2424
<.0001
distance3 1 2.8633 0.7555
14.3625
0.0002
y 1 -16.2032 4.3568
13.8314
0.0002
y2 1 36.9974 10.3219
12.8476
0.0003
y3 1 -58.4938 14.0279
17.3875
<.0001
Q: What is the problem? which one is correct for the piecewise linear
function?
Thanks very much.
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20080324/ed50f2d8/attachment.pl
On 24/03/2008 7:06 AM, zhijie zhang wrote:
Dear Murdoch, "Compare the predictions, not the coefficients.", this is the key point. I have checked their predicted probability and their results are the same. What do you mean by "You're using different bases"? Could you please give me a little more info on it, so that i can go to find some materials?
There are many ways to represent a piecewise linear function. R and your SAS code have both chosen linear combinations of basis functions, but you have used different basis functions. R uses the B-spline basis. You have what is sometimes called the truncated power basis, but maybe not commonly in the linear context. I don't know if it has a name here. Duncan Murdoch
Thanks a lot. On 3/24/08, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
On 24/03/2008 5:23 AM, zhijie zhang wrote:
Dear Rusers, I am now using R and SAS to fit the piecewise linear functions, and
what
surprised me is that they have a great differrent result. See below.
You're using different bases. Compare the predictions, not the coefficients. To see the bases that R uses, do this: matplot(distance, bs(distance,degree=1,knots=c(16.13,24))) matplot(y, bs(y,degree=1,knots=c(-0.4357,-0.3202))) Duncan Murdoch
#R code--Knots for distance are 16.13 and 24, respectively, and Knots
for y
are -0.4357 and -0.3202 m.glm<-glm(mark~x+poly(elevation,2)+bs(distance,degree=1,knots=c(16.13
,24))
+bs(y,degree=1,knots=c(-0.4357,-0.3202
)),family=binomial(logit),data=point)
summary(m.glm)
Coefficients:
Estimate Std.
Error z
value Pr(>|z|) (Intercept) 12.104
3.138
3.857 0.000115 *** x 5.815 1.987 2.926 0.003433 ** poly(elevation, 2)1 6.654 4.457 1.493 0.135444 poly(elevation, 2)2 -6.755
3.441 -
1.963 0.049645 * bs(distance, degree = 1, knots = c(16.13, 24))1 -1.291
1.139 -
1.133 0.257220 bs(distance, degree = 1, knots = c(16.13, 24))2 -10.348
2.025 -
5.110 3.22e-07 *** bs(distance, degree = 1, knots = c(16.13, 24))3 -3.530 3.752
-
0.941 0.346850 bs(y, degree = 1, knots = c(-0.4357, -0.3202))1 -6.828 1.836
-
3.719 0.000200 *** bs(y, degree = 1, knots = c(-0.4357, -0.3202))2 -4.426 1.614
-
2.742 0.006105 ** bs(y, degree = 1, knots = c(-0.4357, -0.3202))3 -11.216
2.861 -
3.920 8.86e-05 *** #SAS codes data b; set a; if distance > 16.13 then d1=1; else d1=0; distance2=d1*(distance - 16.13); if distance > 24 then d2=1; else d2=0; distance3=d2*(distance - 24); if y>-0.4357 then dd1=1; else dd1=0; y2=dd1*(y+0.4357); if y>-0.3202 then dd2=1; else dd2=0; y3=dd2*(y+0.3202); run; proc logistic descending data=b; model mark =x elevation elevation*elevation distance distance2
distance3 y
y2 y3;
run;
The LOGISTIC Procedure Analysis of Maximum Likelihood
Estimates
Standard Wald
Parameter DF Estimate Error
Chi-Square Pr > ChiSq
Intercept 1 -2.6148 2.1445
1.4867
0.2227
x 1 5.8146 1.9872
8.5615
0.0034
elevation 1 0.4457 0.1506
8.7545
0.0031
elevation*elevation 1 -0.0279 0.0142
3.8533
0.0496
distance 1 -0.1091 0.0963
1.2836
0.2572
distance2 1 -1.0418 0.2668
15.2424
<.0001
distance3 1 2.8633 0.7555
14.3625
0.0002
y 1 -16.2032 4.3568
13.8314
0.0002
y2 1 36.9974 10.3219
12.8476
0.0003
y3 1 -58.4938 14.0279
17.3875
<.0001
Q: What is the problem? which one is correct for the piecewise linear
function?
Thanks very much.
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20080324/6d533863/attachment.pl
On 3/24/2008 9:03 AM, zhijie zhang wrote:
Dear Murdoch,
Thanks very much for your rapid response. It helps me greatly.
If i want to write the model formula according to the estimets from
R, Is it correct for the below formula? I'm not very sure about it, and
i also hope that you can recommend a good book on this area. I want to
learn it systematically. Thanks a million.
Logit/p/=12.10+5.815*x+6.654*elevation-6.755*elevation^2
-1.291*distance_1 -10.348*distance2-3.53*distance3
_ -6.828*_y1 _ -4.426*_y2 _ -11.216*_y3 _
I would use predict() instead. What you have there doesn't look as though it uses the B-spline basis. The reference given in the ?bs help page is a reasonable starting point, but just about any book that covers splines should handle the B-spline basis and the linear case. Duncan Murdoch
#R code and the estimate results--Knots for distance are 16.13 and 24,
respectively, and Knots for y are -0.4357 and -0.3202
m.glm<-glm(mark~x+poly(elevation,2)+bs(distance,degree=1,knots=c(16.13,24))
+bs(y,degree=1,knots=c(-0.4357,-0.3202)),family=binomial(logit),data=point)
summary(m.glm)
Estimate Std.
Error z value Pr(>|z|)
(Intercept) 12.104
3.138 3.857 0.000115 ***
x 5.815
1.987 2.926 0.003433 **
poly(elevation, 2)1 6.654
4.457 1.493 0.135444
poly(elevation, 2)2 -6.755
3.441 -1.963 0.049645 *
bs(distance, degree = 1, knots = c(16.13, 24))1 -1.291
1.139 -1.133 0.257220
bs(distance, degree = 1, knots = c(16.13, 24))2 -10.348
2.025 -5.110 3.22e-07 ***
bs(distance, degree = 1, knots = c(16.13, 24))3 -3.530
3.752 -0.941 0.346850
bs(y, degree = 1, knots = c(-0.4357, -0.3202))1 -6.828
1.836 -3.719 0.000200 ***
bs(y, degree = 1, knots = c(-0.4357, -0.3202))2 -4.426
1.614 -2.742 0.006105 **
bs(y, degree = 1, knots = c(-0.4357, -0.3202))3 -11.216
2.861 -3.920 8.86e-05 ***
Thanks again.
On Mon, Mar 24, 2008 at 8:08 PM, Duncan Murdoch <murdoch at stats.uwo.ca
<mailto:murdoch at stats.uwo.ca>> wrote:
On 24/03/2008 7:06 AM, zhijie zhang wrote:
> Dear Murdoch,
> "Compare the predictions, not the coefficients.", this is the
key point.
> I have checked their predicted probability and their results
are the
> same.
> What do you mean by "You're using different bases"? Could you
please give
> me a little more info on it, so that i can go to find some materials?
There are many ways to represent a piecewise linear function. R and
your SAS code have both chosen linear combinations of basis functions,
but you have used different basis functions. R uses the B-spline basis.
You have what is sometimes called the truncated power basis, but maybe
not commonly in the linear context. I don't know if it has a name here.
Duncan Murdoch
> Thanks a lot.
>
>
> On 3/24/08, Duncan Murdoch <murdoch at stats.uwo.ca
<mailto:murdoch at stats.uwo.ca>> wrote:
>> On 24/03/2008 5:23 AM, zhijie zhang wrote:
>>> Dear Rusers,
>>> I am now using R and SAS to fit the piecewise linear
functions, and
>> what
>>> surprised me is that they have a great differrent result. See
below.
>> You're using different bases. Compare the predictions, not the
>> coefficients.
>>
>> To see the bases that R uses, do this:
>>
>> matplot(distance, bs(distance,degree=1,knots=c(16.13,24)))
>> matplot(y, bs(y,degree=1,knots=c(-0.4357,-0.3202)))
>>
>> Duncan Murdoch
>>
>>> #R code--Knots for distance are 16.13 and 24, respectively, and
Knots
>> for y
>>> are -0.4357 and -0.3202
>>>
m.glm<-glm(mark~x+poly(elevation,2)+bs(distance,degree=1,knots=c(16.13
>> ,24))
>>> +bs(y,degree=1,knots=c(-0.4357,-0.3202
>>> )),family=binomial(logit),data=point)
>>> summary(m.glm)
>>>
>>> Coefficients:
>>> Estimate Std.
>> Error z
>>> value Pr(>|z|)
>>> (Intercept) 12.104
>> 3.138
>>> 3.857 0.000115 ***
>>> x 5.815
1.987
>>> 2.926 0.003433 **
>>> poly(elevation, 2)1 6.654
4.457
>>> 1.493 0.135444
>>> poly(elevation, 2)2 -6.755
>> 3.441 -
>>> 1.963 0.049645 *
>>> bs(distance, degree = 1, knots = c(16.13, 24))1 -1.291
>> 1.139 -
>>> 1.133 0.257220
>>> bs(distance, degree = 1, knots = c(16.13, 24))2 -10.348
>> 2.025 -
>>> 5.110 3.22e-07 ***
>>> bs(distance, degree = 1, knots = c(16.13, 24))3 -3.530
3.752
>> -
>>> 0.941 0.346850
>>> bs(y, degree = 1, knots = c(-0.4357, -0.3202))1 -6.828
1.836
>> -
>>> 3.719 0.000200 ***
>>> bs(y, degree = 1, knots = c(-0.4357, -0.3202))2 -4.426
1.614
>> -
>>> 2.742 0.006105 **
>>> bs(y, degree = 1, knots = c(-0.4357, -0.3202))3 -11.216
>> 2.861 -
>>> 3.920 8.86e-05 ***
>>>
>>> #SAS codes
>>> data b;
>>> set a;
>>> if distance > 16.13 then d1=1; else d1=0;
>>> distance2=d1*(distance - 16.13);
>>> if distance > 24 then d2=1; else d2=0;
>>> distance3=d2*(distance - 24);
>>> if y>-0.4357 then dd1=1; else dd1=0;
>>> y2=dd1*(y+0.4357);
>>> if y>-0.3202 then dd2=1; else dd2=0;
>>> y3=dd2*(y+0.3202);
>>> run;
>>>
>>> proc logistic descending data=b;
>>> model mark =x elevation elevation*elevation distance distance2
>> distance3 y
>>> y2 y3;
>>> run;
>>>
>>>
>>> The LOGISTIC Procedure Analysis of Maximum
Likelihood
>>> Estimates
>>>
>>> Standard
Wald
>>> Parameter DF Estimate Error
>>> Chi-Square Pr > ChiSq
>>>
>>> Intercept 1 -2.6148 2.1445
>>> 1.4867
>>> 0.2227
>>> x 1 5.8146 1.9872
>>> 8.5615
>>> 0.0034
>>> elevation 1 0.4457 0.1506
>>> 8.7545
>>> 0.0031
>>> elevation*elevation 1 -0.0279 0.0142
>>> 3.8533
>>> 0.0496
>>> distance 1 -0.1091 0.0963
>>> 1.2836
>>> 0.2572
>>> distance2 1 -1.0418 0.2668
>>> 15.2424
>>> <.0001
>>> distance3 1 2.8633 0.7555
>>> 14.3625
>>> 0.0002
>>> y 1 -16.2032 4.3568
>>> 13.8314
>>> 0.0002
>>> y2 1 36.9974 10.3219
>>> 12.8476
>>> 0.0003
>>> y3 1 -58.4938 14.0279
>>> 17.3875
>>> <.0001
>>> Q: What is the problem? which one is correct for the piecewise
linear
>>> function?
>>> Thanks very much.
>>>
>>>
>>
>
>
-- With Kind Regards, oooO::::::::: (..)::::::::: :\.(:::Oooo:: ::\_)::(..):: :::::::)./::: ::::::(_/:::: ::::::::::::: [***********************************************************************] Zhi Jie,Zhang ,PHD Tel:+86-21-54237149 Dept. of Epidemiology,School of Public Health,Fudan University Address:No. 138 Yi Xue Yuan Road,Shanghai,China Postcode:200032 Email:epistat at gmail.com <mailto:Email:epistat at gmail.com> Website: www.statABC.com <http://www.statABC.com> [***********************************************************************] oooO::::::::: (..)::::::::: :\.(:::Oooo:: ::\_)::(..):: :::::::)./::: ::::::(_/:::: :::::::::::::
On Mon, Mar 24, 2008 at 8:11 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
On 3/24/2008 9:03 AM, zhijie zhang wrote:
> Dear Murdoch,
Thanks very
I would use predict() instead. What you have there doesn't look as though it uses the B-spline basis. The reference given in the ?bs help page is a reasonable starting point, but just about any book that covers splines should handle the B-spline basis and the linear case. Duncan Murdoch
Dear Duncan and others: Can you please refer us to an understandable book that explains about b-splines? I've tried a few and the math pretty quickly becomes unintelligible to a non-math major. And if this book explains the "orthogonal polynomials" that are used to represent ordinal factors in R, it would be even better. Pointing attention back to this poster's original question, I would say that using bs() here is like hitting a mosquito with a 1000 pound bomb. There are easier ways to get the estimates for the slope and intercept shifts than with bs. I'd be tempted to take the simplest approach and create a factor to represent the levels of distance, as in myDistance <- ifelse( distance > 24, "hi", ifelse(distance > 16.13, "med", "low")) ### Coerce that to a factor myDistance <- factor(myDistance) That creates a factor with three levels, and in the regression glm (mark~x+poly(elevation,2)+ distance*myDistance ...) will give the change values for the intercepts and slopes for the segments. If you want the coefficients of the 3 separate lines, run it like this instead: glm (mark~-1 + x+poly(elevation,2)+ myDistance / distance ...) Also, I'd mention that the package "segmented" is available, but it has the added feature that it will try to estimate the breakpoints from the data. PJ
Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
On Mon, Mar 24, 2008 at 01:09:35PM -0500, Paul Johnson wrote:
On Mon, Mar 24, 2008 at 8:11 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
On 3/24/2008 9:03 AM, zhijie zhang wrote:
> Dear Murdoch,
Thanks very
I would use predict() instead. What you have there doesn't look as though it uses the B-spline basis. The reference given in the ?bs help page is a reasonable starting point, but just about any book that covers splines should handle the B-spline basis and the linear case. Duncan Murdoch
Dear Duncan and others: Can you please refer us to an understandable book that explains about b-splines? I've tried a few and the math pretty quickly becomes unintelligible to a non-math major.
I think you should try Simon Wood's "Generalized Additive Models: An introduction in R." 2006 Chapman and Hall. I reviewed it recently and I think it's really very good. Cheers Andrew
Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 http://www.ms.unimelb.edu.au/~andrewpr http://blogs.mbs.edu/fishing-in-the-bay/