Skip to content

Great difference for piecewise linear function between R and SAS

8 messages · zhijie zhang, Duncan Murdoch, Paul Johnson +1 more

#
On 24/03/2008 5:23 AM, zhijie zhang wrote:
You're using different bases.  Compare the predictions, not the 
coefficients.

To see the bases that R uses, do this:

matplot(distance, bs(distance,degree=1,knots=c(16.13,24)))
matplot(y, bs(y,degree=1,knots=c(-0.4357,-0.3202)))

Duncan Murdoch
#
On 24/03/2008 7:06 AM, zhijie zhang wrote:
There are many ways to represent a piecewise linear function.  R and 
your SAS code have both chosen linear combinations of basis functions, 
but you have used different basis functions.  R uses the B-spline basis. 
You have what is sometimes called the truncated power basis, but maybe 
not commonly in the linear context.  I don't know if it has a name here.

Duncan Murdoch
#
On 3/24/2008 9:03 AM, zhijie zhang wrote:
I would use predict() instead.  What you have there doesn't look as
though it uses the B-spline basis.

The reference given in the ?bs help page is a reasonable starting point,
but just about any book that covers splines should handle the B-spline
basis and the linear case.

Duncan Murdoch
#
On Mon, Mar 24, 2008 at 8:11 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
Dear Duncan and others:

Can you please refer us to an understandable book that explains about
b-splines?  I've tried a few and the math pretty quickly becomes
unintelligible to a non-math major.

And if this book explains the "orthogonal polynomials" that are used
to represent ordinal factors in R, it would be even better.


Pointing attention back to this poster's original question, I would
say that using bs() here is like hitting a mosquito with a 1000 pound
bomb.  There are easier ways to get the estimates for the slope and
intercept shifts than with bs.

I'd be tempted to take the simplest approach and create a factor to
represent the levels of distance, as in

myDistance <- ifelse( distance > 24, "hi", ifelse(distance > 16.13,
"med", "low"))

### Coerce that to a factor

myDistance <- factor(myDistance)

That creates a factor with three levels, and in the regression

glm (mark~x+poly(elevation,2)+ distance*myDistance ...)

will give the change values for the  intercepts and slopes for the segments.

If you want the coefficients of the 3 separate lines, run it like this instead:


glm (mark~-1 + x+poly(elevation,2)+ myDistance / distance ...)


Also, I'd mention that the package "segmented" is available, but it
has the added feature that it will try to estimate the breakpoints
from the data.


PJ
#
On Mon, Mar 24, 2008 at 01:09:35PM -0500, Paul Johnson wrote:
I think you should try Simon Wood's "Generalized Additive Models: An
introduction in R."  2006 Chapman and Hall. 

I reviewed it recently and I think it's really very good.

Cheers

Andrew